layout: true
name: inverse
class: center, middle, inverse

---

# Analyzing Images via Ruby

## Josh Cutler

### 11/24/14

---

## What do you mean by analyzing?

---
layout: false
.left-column[
  ## What?
]
.right-column[
Image analysis can take many forms, but we will primarily be talking about how to glean information about what is in a picture.

- OCR (Optical Character Recognition)
  - Receipt scanners (Neat, Harvest, Expensify)
  - Menu readers (PickyPint, WineGlass App)
  - ATM check deposit
  - Thanks to Atwood's law: http://antimatter15.com/ocrad.js/demo.html
- Object recognition (e.g. Haar classifiers)
  - Image-based search (slyce.it)
  - Face recognizers (Facebook, Kinect)
- Image de-duping via image hashing
]

---
.left-column[
  ## What?
  ## Why?
]
.right-column[
Images can be a great input to our applications; however, we are limited in what we can do with them unless they are annotated.

- Extracting text
- Recognizing known objects
- Matching against previous images that are already annotated
- Lots of other things that I won't cover
]

---
.left-column[
  ## What?
  ## Why?
  ## How?
]
.right-column[
I will show examples of solutions that I have used for [zistle.com](http://www.zistle.com)

- Solutions are *very* domain dependent
- Every one of these topics is the result of multiple dissertations
- The real heavy lifting is happening in C .red[*]
- We will leverage open source libraries from Ruby

.footnote[.red[*] Haar classifiers can take days to train]
]

---
template: inverse

## Ok, give me some specifics.

---
### If you insist.

[Zistle.com](http://www.zistle.com) is a website for collectors that helps them manage their collectibles (primarily cards). It has some interesting data problems:

- There are no authoritative lists of what has been manufactured
  - Many of the manufacturers don't have this information. Many have gone out of business.
- There are millions of unique collectible cards
  - We have addressed this via crowdsourcing
- Many of the items themselves are not clearly labeled, so it isn't clear which item you have
- Collecting this data is very labor intensive
  - We have users that spend 8 hours a day entering data into the site
- There are lots of mistakes and debates about the "truth"

---
template: inverse

## Labor intensive you say?

---
### Scanners. These horrible yet useful machines.

.center.zistle1[![](./scanner.jpg)]

---
### Scanning material

.center.zistle1[![](./card-room.jpg)]

---
### Now enter that information into Zistle.

.center.zistle1[![](./zistle-add-cards.jpg)]

---
### A few million iterations later...

.center.zistle1[![](./zistle-collection.jpg)]

---
template: inverse

## Got it. So what did you do?

---
## Improving Data Collection

It is a common occurrence for our users to have already scanned their cards. However, entering them into the site and annotating them can be a lot of work. What if we could use the images to ease this burden?

.example-image[
![](./shivan dragon.hq.jpg)
![](./test.jpg)
![](./teddy1.jpg)
![](./joe-mauer.jpg)
]

---
## Improving Data Collection - OCR

.single-image[![example](./shivan dragon.hq.jpg)]

### We want to get:

- “Shivan Dragon”
- “Summon Dragon”
- “Flying +1/+0”
- “While it’s true most Dragons are cruel, the Shivan Dragon seems to take particular glee in the misery of others, often…”

---
## OCR Example

We will use the [rtesseract](https://github.com/dannnylo/rtesseract) gem, which is a wrapper for the [Tesseract](https://code.google.com/p/tesseract-ocr/) library.
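rtesseract shells out to the `tesseract` binary, so the native library needs to be installed alongside the gems. A minimal setup sketch; the package names are assumptions and vary by platform:

```bash
# Assumed package names -- adjust for your platform
brew install tesseract imagemagick                  # OS X
# apt-get install tesseract-ocr libmagickwand-dev   # Debian/Ubuntu
gem install rtesseract rmagick
```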
Here is the simple, naive approach:

```Ruby
# Relies on the rtesseract and rmagick gems
require 'rtesseract'

image_name = 'shivan dragon.hq.jpg'

# Naive approach: OCR the entire card image
source = RTesseract.new(image_name)
source.to_s
* >> " \n\nIlluq. © Melissa Bcmnn\n\n"
```

---
## OCR - Targeting

Luckily we have some domain knowledge. In particular, Magic cards have a regular layout. Let's exploit the structure of the card:

```Ruby
require 'rmagick' # older RMagick versions use require 'RMagick'

# Crop out the body text box; the offsets come from the card's fixed layout
source = Magick::Image.read(image_name).first
c = source.crop(50, 425, 390, 180)
c.write "jpeg:ocr-body-cropped.jpg"

RTesseract.new('ocr-body-cropped.jpg', debug: true).to_s
* >> "Flying, 7): +1/+0\n\nW/'hile it’: true most Dragons are\ncruel, the Shivan Dragon seem: to\ntake particular glee in the misery of\nothers, often tormenting its victims\nmuch like a cat plays: with a mouxe\nbefore delivering the final blow.\n\n"
```

.col[*Flying, 7): +1/+0 While it’: true most Dragons are cruel, the Shivan Dragon seem: to take particular glee in the misery of others, often tormenting its victims much like a cat plays: with a mouse before delivering the final blow.*]
.col[![](./ocr-body-cropped.jpg)]

---
## OCR - Transforming

Can we do this for the title? It is trickier: there is less contrast, and the text has a stroke around it.

.col75[
```Ruby
source = Magick::Image.read(image_name).first
c = source.crop(10, 10, 200, 40)
c.write "jpeg:ocr-head-cropped.jpg"
```]
.col25[![](./ocr-head-cropped.jpg)]
.col75[
```Ruby
t = c.white_threshold 29000
```]
.col25[![](./ocr-head-white.jpg)]
.col75[
```Ruby
q = t.quantize 256, Magick::GRAYColorspace
```]
.col25[![](./ocr-head-quantized.jpg)]
.col75[
```Ruby
n = q.negate
```]
.col25[![](./ocr-head-negated.jpg)]
.col75[
```Ruby
w = n.white_threshold 28000
```]
.col25[![](./ocr-head-white2.jpg)]

---
## Improving Data Collection - OCR

Now let's try running OCR on this image:

```Ruby
RTesseract.new('ocr-head-white2.jpg', debug: true).to_s
* >> "Shivan Dragon\n\n"
```

---
## OCR - SWT

.single-image[![example](./test.jpg)]

What if the structure changes? The previous approach is not robust to variable structures.

There are algorithms for detecting where text is, notably the [Stroke Width Transform](http://www.cs.cornell.edu/courses/cs4670/2010fa/projects/final/results/group_of_arp86_sk2357/Writeup.pdf). They have issues (i.e. they haven't worked well for me yet). There is an open source implementation called [DetectText](https://github.com/aperrau/DetectText).

---
## OCR - SWT Decomposition

.col[.swt[![](./canny.png)]]
.col[.swt[Canny edge detection]]
.clear[ ]
.col[.swt[![](./SWT.png)]]
.col[.swt[Stroke Width Detection]]
.col[.swt[![](./components.png)]]
.col[.swt[Connected Components]]
.col[.swt[![](./detect_text_bottom_b.jpg)]]
.col[.swt[Final Output]]

---
## OCR - SWT Code

In this case we just shell out to the command line tool to create the outputs:

```Ruby
out_path = 'ocr-general-swt.jpg' # hypothetical output file name

%x(./DetectText '#{Dir.pwd}/ocr-general-top.jpg' '#{out_path}' 1)

rtess = RTesseract.new(out_path)
rtess.to_s
```

Note that this algorithm can detect white or black text, but only one at a time. Run it twice if you do not know the color of the text (see the sketch on the next slide).
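---
## OCR - SWT Code Ctd.

A minimal sketch of that two-pass approach. The "keep the longer result" heuristic is my assumption, not part of DetectText:

```Ruby
require 'rtesseract'

input = "#{Dir.pwd}/ocr-general-top.jpg"

# DetectText's final argument picks the text polarity (1 or 0);
# run both passes when the color is unknown, then OCR each result
results = [1, 0].map do |polarity|
  out_path = "swt-output-#{polarity}.jpg"
  %x(./DetectText '#{input}' '#{out_path}' #{polarity})
  RTesseract.new(out_path).to_s.strip
end

# Keep whichever pass produced more text -- crude, but serviceable
puts results.max_by(&:length)
```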
---
template: inverse

## Ok. But what if the information we need isn't text?

---
## Object Detection

There are lots of ways to do object detection. For Zistle we wanted to automatically label cards by team.

- We have a finite set of relatively static objects: logos
- We will use [Cascade Classifiers](http://docs.opencv.org/doc/user_guide/ug_traincascade.html)
  - A supervised learning technique
- There is an implementation in the [OpenCV](http://docs.opencv.org/) library

.center[![](http://upload.wikimedia.org/wikipedia/commons/thumb/3/32/OpenCV_Logo_with_text_svg_version.svg/97px-OpenCV_Logo_with_text_svg_version.svg.png)]

---
## Cascade Classifiers

Find Haar-like features (similar to Haar wavelets) .red[*]

We reduce the feature space by comparing rectangles:

- A simple rectangular Haar-like feature can be defined as the difference of the sums of the pixels inside adjacent rectangles (see the sketch on the next slide)
- Use [AdaBoost](http://en.wikipedia.org/wiki/AdaBoost) to select the best features

.center[![](haar.png)]

.footnote[.red[*] Viola and Jones, "Rapid object detection using a boosted cascade of simple features", Computer Vision and Pattern Recognition, 2001]
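---
## Cascade Classifiers Ctd.

To make "difference of sums" concrete, here is a toy sketch of a two-rectangle Haar-like feature computed with an integral image (the image data and coordinates are made up; OpenCV does this in C, far faster):

```Ruby
# Toy 8x8 grayscale image as a 2D array of intensities
pixels = Array.new(8) { Array.new(8) { rand(256) } }

# Integral image: ii[y][x] holds the sum of all pixels above and left of (x, y)
ii = Array.new(9) { Array.new(9, 0) }
8.times do |y|
  8.times do |x|
    ii[y + 1][x + 1] = pixels[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
  end
end

# Sum of any rectangle in four lookups, regardless of its size
rect_sum = ->(x, y, w, h) { ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x] }

# A two-rectangle "edge" feature: left half minus right half
feature = rect_sum.call(0, 0, 4, 8) - rect_sum.call(4, 0, 4, 8)
puts feature
```

AdaBoost then picks, out of the huge space of rectangle positions and sizes, the handful of features that best separate logo from non-logo.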
---
## Classifier - Training

There are two stages to their use: training and detection.

We need positive and negative training samples:

- The positive example will just be the Twins logo
- Negative examples are anything else

We will use _opencv_createsamples_ to generate more positives by randomly rotating the logo, varying its intensity, and placing it on arbitrary backgrounds.

.center.twins-logo[![](twins-logo2.jpg)]

---
## Classifier - Training Ctd

Create a folder of negative examples and a folder of positive examples:

```bash
opencv_createsamples -vec positive-samples.vec -img positive/twins-logo.jpeg \
  -bg negative.txt -num 200 -bgcolor 34
```

```bash
opencv_traincascade -data twins-classifier -vec positive-samples.vec \
  -bg negative.txt -precalcValBufSize 2048 -precalcIdxBufSize 2048 \
  -numPos 200 -numNeg 900 -numStages 10 -minHitRate 0.999 -maxFalseAlarmRate 0.5 \
  -w 50 -h 50 -baseFormatSave
```

Now wait 1-5 days. Hopefully you selected the right parameters and haven't over- or undertrained. .red[*]

.footnote[.red[*] There is a lot of literature on how to deal with this. But, it is a large topic and domain dependent.]

---
## Classifier - Detection

Once we have a classifier that is well calibrated, detection is easy:

```Ruby
# Relies on the ruby-opencv gem
require 'opencv'
include OpenCV

# Load the trained cascade produced by opencv_traincascade
cascade_file = "#{Dir.pwd}/training/twins-classifier2/cascade.xml"
detector = CvHaarClassifierCascade.load(cascade_file)

examples = ['test1.jpg', 'test2.jpg', 'test3.jpg']
examples.each do |file_name|
  image = CvMat.load(file_name)

  # Draw a green box around each detected logo
  detector.detect_objects(image).each do |region|
    color = CvColor::Green
    image.rectangle! region.top_left, region.bottom_right, color: color
  end

  image.save_image("processed_#{file_name}")
end
```

---
## Classifier - Detection

.col33[![](./processed_test1.jpg)]
.col33[![](./processed_test2.jpg)]
.col33[![](./processed_test3.jpg)]

---
template: inverse

## A quick recap

---
## A few random things

There are a ton of other things you can do in this space. At Zistle we are working on a system that lets users snap a photo of a card and tells them which one it is.

- It isn't fully baked yet
- It is a hefty enough topic that it is probably a separate talk

Check out these resources for more info:

- http://www.computervisionmodels.com/
- http://szeliski.org/Book/
- http://docs.opencv.org/

Sometimes it is cheaper / easier to just send things to Mechanical Turk!

---
## What did I just see?

I showed you a few techniques to automate the process of labeling images.

- You can use OCR to detect text
  - You might need to clean it up
  - An accuracy rate of 95% still neans that you get a Iot of characters wrng.
- If you have a finite set of objects that you wish to detect, you can train classifiers to find them
  - You will need training data
  - There is a non-trivial amount of up-front work

---
template: inverse

## Questions / Comments?

### josh@cutl3r.com
### @josh_cutler

---
## Tesseract Algorithm

READING INPUT
- Lines are read in from the scanned image (part of the edge detection stage)

EDGE DETECTION/OUTLINES
- Black pixels are split into blobs (this is the edge detection)
- Blobs are processed to extract outlines

LINES/SKEW
- Lines are derived from strings of blobs with outlines
- The gradient/rotation of the page is calculated
- Lines are adjusted for skew

---
## Tesseract Algorithm Ctd.

WORDS/SEGMENTER
- A higher-level procedure orders blobs into words
- Blobs in lines are segmented into words

CLASSIFICATION
- Classification of the features in the letters of all words is performed
- Words are checked against a dictionary and permuter to improve them
- The x-height (height of the letter 'x') is adjusted for words
- Words are fitted to lines and assigned to the rows that fit them best
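---
## Tesseract - Inspecting the Output

The line and word structure this pipeline produces can be inspected directly. A sketch, assuming a reasonably recent `tesseract` CLI; hOCR output is HTML annotated with per-line and per-word bounding boxes and confidences:

```bash
# Writes ocr-head-result.hocr; look for the bbox coordinates and
# x_wconf word confidences produced by the stages described above
tesseract ocr-head-white2.jpg ocr-head-result hocr
```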