Pc Seminar Jordi

Visual Object Recognition. Perceptual Computing Seminar. Sergio Escalera, Xavier Baró, Jordi Vitrià, Petia Radeva, Oriol Pujol. BCN Perceptual Computing Lab.


Transcript of Pc Seminar Jordi

Page 1: Pc Seminar Jordi

Visual Object Recognition
Perceptual Computing Seminar

Sergio Escalera, Xavier Baró, Jordi Vitrià, Petia Radeva, Oriol Pujol
BCN Perceptual Computing Lab

Page 2: Pc Seminar Jordi

Index

1. Introduction

2. Recognition with Local Features: Basics

3. Invariant representations: SIFT

4. Recognition as a Classification Problem: FERNS

5. Very large databases: Hashing

Visual Object Recognition                 Perceptual Computing Seminar                        Page 2

Page 3: Pc Seminar Jordi

Introduction

The recognition of object categories in images is one of the most challenging problems in computer vision, especially when the number of categories is large.

Humans are able to recognize thousands of object types, whereas most of the existing object recognition systems are trained to recognize only a few.

Page 4: Pc Seminar Jordi

Introduction


Invariance to viewpoint, illumination, “shape”, color, scale, texture, etc.

Page 5: Pc Seminar Jordi

Introduction

Why do we care about recognition? (theoretical question)

Perception of function: We can perceive the 3D shape, texture, and material properties without knowing about objects. But the concept of category also encapsulates information about what we can do with those objects.

Li Fei-Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories: Year 2009, ICCV 2009 Kyoto, Short Course, September 24.

Page 6: Pc Seminar Jordi

Introduction

Why is it hard?

Find the chair in this image. Output of correlation.

This is a chair

Li Fei-Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories: Year 2009, ICCV 2009 Kyoto, Short Course, September 24.

Page 7: Pc Seminar Jordi

Introduction

Why is it hard?

Find the chair in this image. Pretty much garbage; simple template matching is not going to make it.

Li Fei‐Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories:Year 2009, ICCV 2009 Kyoto, Short Course, September 24.

Page 8: Pc Seminar Jordi

Introduction

Why do we care about recognition? (practical question)

Page 9: Pc Seminar Jordi

Introduction

Why do we care about recognition? (practical question)

Page 10: Pc Seminar Jordi

Introduction

Why do we care about recognition? (practical question)

Query results from 5k Flickr images (demo available for 100k set)

James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman: Object retrieval with large vocabularies and fast spatial matching. CVPR 2007

Page 11: Pc Seminar Jordi

Recognition with Local Features

It is known that the visual system can use local, informative image «fragments» of a given object, rather than the whole object, to classify it into a familiar category.

This approach has some advantages over holistic methods...

Page 12: Pc Seminar Jordi

Recognition with Local Features

[Figure panels: Holistic; Fragment-based]

Page 13: Pc Seminar Jordi

Recognition with Local Features

Jay Hegde, Evgeniy Bart, and Daniel Kersten, "Fragment-based learning of visual object categories", Current Biology, 2008.

Page 14: Pc Seminar Jordi

Recognition with Local Features

The most basic approach is called the “bag of words” approach (it was inspired by techniques used by the natural language processing community).

Page 15: Pc Seminar Jordi

Recognition with Local Features

Assumptions:

• Independent features.

• Histogram representation.

Fragments vocabulary (generic, class-based, etc.)

Image = fragments histogram

Li Fei-Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories: Year 2009, ICCV 2009 Kyoto, Short Course, September 24.
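The histogram representation can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it assumes descriptors and vocabulary entries are plain float vectors compared with squared Euclidean distance, and the names `Descriptor`, `nearestWord` and `bagOfWords` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Index of the vocabulary entry closest to d (squared Euclidean distance).
std::size_t nearestWord(const Descriptor& d, const std::vector<Descriptor>& vocabulary) {
    assert(!vocabulary.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t w = 0; w < vocabulary.size(); ++w) {
        assert(vocabulary[w].size() == d.size());
        float dist = 0.f;
        for (std::size_t i = 0; i < d.size(); ++i) {
            const float diff = d[i] - vocabulary[w][i];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = w; }
    }
    return best;
}

// Image = histogram of fragment occurrences, normalized to sum to 1.
// Fragment positions are ignored, which is the independence assumption.
std::vector<float> bagOfWords(const std::vector<Descriptor>& fragments,
                              const std::vector<Descriptor>& vocabulary) {
    std::vector<float> hist(vocabulary.size(), 0.f);
    for (const Descriptor& f : fragments) hist[nearestWord(f, vocabulary)] += 1.f;
    if (!fragments.empty())
        for (float& h : hist) h /= static_cast<float>(fragments.size());
    return hist;
}
```

Two images can then be compared through their histograms alone, whatever the spatial layout of the fragments.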

Page 16: Pc Seminar Jordi

Recognition with Local Features

A more advanced approach involves several steps:

• Stage 0: Find image locations where we can reliably find correspondences with other images.

• Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).

• Stage 2: Verify if they belong to a consistent configuration.

Slide credit: David Lowe

Page 17: Pc Seminar Jordi

SIFT

A wonderful example of these stages can be found in David Lowe’s (2004) “Distinctive image features from scale-invariant keypoints” paper, which describes the development and refinement of his Scale Invariant Feature Transform (SIFT).

Local Features, e.g. SIFT

Page 18: Pc Seminar Jordi

Recognition with Local Features

Which local features?

Slide credit: A. Efros

Page 19: Pc Seminar Jordi

SIFT

Stage 0: How can we find image locations where we can reliably find correspondences with other images?

A “good” location has one stable sharp extremum.

[Figure: three plots of f versus x, one labelled “Good!” and two labelled “bad”]

Page 20: Pc Seminar Jordi

SIFT


Page 21: Pc Seminar Jordi

SIFT

Stage 0: How can we find image locations where we can reliably find correspondences with other images?

How to compute extrema at a given scale:

1) We apply a Gaussian filter.

2) We compute a difference-of-Gaussians.

3) We look for 3D extrema in the resulting structure.
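In Lowe's paper the smoothed image is L(x, y, σ) = G(x, y, σ) * I(x, y) and the difference-of-Gaussians is D(x, y, σ) = L(x, y, kσ) − L(x, y, σ). The third step, the 3D extremum test, can be sketched as below. The sketch assumes the three adjacent difference-of-Gaussians layers have already been computed and stored row-major; the `Layer` type and `isExtremum3D` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One layer of the difference-of-Gaussians stack, stored row-major.
struct Layer {
    int width, height;
    std::vector<float> data;
    float at(int x, int y) const { return data[y * width + x]; }
};

// True if (x, y) in `mid` is strictly greater, or strictly smaller, than all
// 26 neighbors in the 3x3x3 block spanning the layers below, mid, and above.
bool isExtremum3D(const Layer& below, const Layer& mid, const Layer& above, int x, int y) {
    assert(x >= 1 && y >= 1 && x < mid.width - 1 && y < mid.height - 1);
    const float v = mid.at(x, y);
    bool isMax = true, isMin = true;
    const Layer* layers[3] = {&below, &mid, &above};
    for (int s = 0; s < 3; ++s)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (s == 1 && dx == 0 && dy == 0) continue;  // skip the center itself
                const float n = layers[s]->at(x + dx, y + dy);
                if (n >= v) isMax = false;
                if (n <= v) isMin = false;
            }
    return isMax || isMin;
}
```

Comparing against the layers above and below is what makes the detected location an extremum in scale as well as in position.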

Page 22: Pc Seminar Jordi

SIFT


Page 23: Pc Seminar Jordi

SIFT

These features are invariant to location and scale

Page 24: Pc Seminar Jordi

SIFT

Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).

In addition to dealing with scale changes, we need to deal with (at least) in-plane image rotation.

One way to deal with this problem is to design descriptors that are rotationally invariant, but such descriptors have poor discriminability, i.e. they map different looking patches to the same descriptor.

Page 25: Pc Seminar Jordi

SIFT

A better method is to estimate a dominant orientation at each detected keypoint.

1. Calculate histogram of local gradients in the window

2. Take the dominant orientation gradient as “up”

3. Rotate local area for computing descriptor

Page 26: Pc Seminar Jordi

SIFT

Lowe:

• computes a 36-bin histogram of edge orientations weighted by both gradient magnitude and Gaussian distance to the center,

• finds all peaks within 80% of the global maximum, and then

• computes a more accurate orientation estimate using a 3-bin parabolic fit.
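The three bullets above can be sketched as one function. This is an illustration under stated assumptions rather than Lowe's code: gradient samples arrive with their Gaussian weight already computed, each sample votes into a single bin, and `GradientSample` and `dominantOrientations` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct GradientSample {
    float magnitude;   // gradient magnitude at the pixel
    float angleDeg;    // gradient orientation in [0, 360)
    float weight;      // Gaussian weight from the distance to the keypoint
};

// Returns the refined orientations (in degrees) of all histogram peaks that
// reach at least 80% of the highest peak.
std::vector<float> dominantOrientations(const std::vector<GradientSample>& samples) {
    const int kOrientationBins = 36;
    const float kBinWidth = 360.f / kOrientationBins;
    std::vector<float> hist(kOrientationBins, 0.f);
    for (const GradientSample& s : samples) {
        assert(s.magnitude >= 0.f && s.weight >= 0.f);
        int bin = static_cast<int>(std::floor(s.angleDeg / kBinWidth)) % kOrientationBins;
        if (bin < 0) bin += kOrientationBins;
        hist[bin] += s.magnitude * s.weight;
    }
    float maxVal = 0.f;
    for (float h : hist) if (h > maxVal) maxVal = h;

    std::vector<float> result;
    if (maxVal <= 0.f) return result;
    for (int b = 0; b < kOrientationBins; ++b) {
        const float left = hist[(b + kOrientationBins - 1) % kOrientationBins];
        const float center = hist[b];
        const float right = hist[(b + 1) % kOrientationBins];
        const bool isPeak = center > left && center > right;
        if (!isPeak || center < 0.8f * maxVal) continue;
        // Vertex of the parabola through the three bins, as an offset in bins.
        const float offset = 0.5f * (left - right) / (left - 2.f * center + right);
        float angle = (b + 0.5f + offset) * kBinWidth;   // bin centers sit at b + 0.5
        if (angle < 0.f) angle += 360.f;
        if (angle >= 360.f) angle -= 360.f;
        result.push_back(angle);
    }
    return result;
}
```

When more than one orientation is returned, a keypoint can be duplicated, one copy per orientation, so that each copy gets its own descriptor.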

Page 27: Pc Seminar Jordi

SIFT


Page 28: Pc Seminar Jordi

SIFT

[Figure panels: local patch around descriptor, from Gaussian pyramid; gradient magnitude; gradient orientation]

Page 29: Pc Seminar Jordi

SIFT


Page 30: Pc Seminar Jordi

SIFT


Page 31: Pc Seminar Jordi

SIFT

Even after compensating for translation, rotation, and scale changes, the local appearance of image patches will usually still vary from image to image.

How can we make the descriptor that we match more invariant to such changes, while still preserving discriminability between different (non-corresponding) patches?

Page 32: Pc Seminar Jordi

SIFT

SIFT features are formed by computing the gradient at each pixel in a 16x16 window around the detected keypoint, using the appropriate level of the Gaussian pyramid at which the keypoint was detected.

The gradient magnitudes are downweighted by a Gaussian fall-off function in order to reduce the influence of gradients far from the center, as these are more affected by small misregistrations.

Page 33: Pc Seminar Jordi

SIFT

In each 4x4 quadrant, a gradient orientation histogram is formed by (conceptually) adding the weighted gradient value to one of 8 orientation histogram bins.

Page 34: Pc Seminar Jordi

SIFT

The resulting 128 non-negative values form a raw version of the SIFT descriptor vector.

To reduce the effects of contrast/gain (additive variations are already removed by the gradient), the 128-D vector is normalized to unit length.
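The 128 values come from 4x4 cells times 8 orientation bins. A minimal sketch of the accumulation and the unit-length normalization follows. It assumes the Gaussian fall-off has already been applied to the magnitudes, that orientations are already expressed relative to the dominant orientation, and that each pixel votes into a single bin (no interpolation between bins or cells); all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int kWindow = 16;     // 16x16 pixel window around the keypoint
constexpr int kCells = 4;       // 4x4 grid of cells, each covering 4x4 pixels
constexpr int kDescBins = 8;    // orientation bins per cell
using SiftDescriptor = std::array<float, kCells * kCells * kDescBins>;  // 128 values

// weightedMag: 256 gradient magnitudes (row-major), Gaussian fall-off applied.
// angleDeg:    256 gradient orientations in [0, 360).
SiftDescriptor buildDescriptor(const std::vector<float>& weightedMag,
                               const std::vector<float>& angleDeg) {
    assert(weightedMag.size() == static_cast<std::size_t>(kWindow * kWindow));
    assert(angleDeg.size() == weightedMag.size());
    SiftDescriptor desc{};  // zero-initialized
    for (int y = 0; y < kWindow; ++y)
        for (int x = 0; x < kWindow; ++x) {
            const float angle = angleDeg[y * kWindow + x];
            assert(angle >= 0.f && angle < 360.f);
            const int cell = (y / 4) * kCells + (x / 4);
            const int bin = static_cast<int>(angle / 45.f) % kDescBins;
            desc[cell * kDescBins + bin] += weightedMag[y * kWindow + x];
        }
    // Normalize to unit length to reduce the effect of contrast/gain changes.
    float norm = 0.f;
    for (float v : desc) norm += v * v;
    norm = std::sqrt(norm);
    if (norm > 0.f)
        for (float& v : desc) v /= norm;
    return desc;
}
```

Because only gradients enter the descriptor, adding a constant to every pixel leaves it unchanged, and the final division removes a multiplicative change in contrast.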

Page 35: Pc Seminar Jordi

SIFT

Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images.

Page 36: Pc Seminar Jordi

SIFT

SIFT uses a nearest neighbor classifier with a distance ratio matching criterion. We can define this nearest neighbor distance ratio as

$$\mathrm{NNDR} = \frac{d_1}{d_2} = \frac{\lVert D_A - D_B \rVert}{\lVert D_A - D_C \rVert}$$

where $d_1$ and $d_2$ are the nearest and second nearest neighbor distances, and $D_A$, $D_B$, $D_C$ are the target descriptor along with its closest two neighbors.
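The ratio criterion can be sketched with a plain linear search over the candidate descriptors. The function names and the `maxRatio` parameter are hypothetical; a threshold such as 0.8, the value suggested in Lowe's paper, is one reasonable choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Euclidean distance between two descriptors of equal length.
float euclidean(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
}

// Returns the index of the nearest candidate to `target`, or -1 when the
// match is ambiguous, i.e. when d1 / d2 is not below maxRatio.
int matchByRatio(const Descriptor& target, const std::vector<Descriptor>& candidates,
                 float maxRatio) {
    if (candidates.size() < 2) return -1;   // the ratio needs two neighbors
    float d1 = std::numeric_limits<float>::max(), d2 = d1;
    int best = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const float d = euclidean(target, candidates[i]);
        if (d < d1) { d2 = d1; d1 = d; best = static_cast<int>(i); }
        else if (d < d2) { d2 = d; }
    }
    return (d1 < maxRatio * d2) ? best : -1;
}
```

A match is kept only when the best candidate is clearly closer than the runner-up, which discards descriptors that look alike in many places.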

Page 37: Pc Seminar Jordi

SIFT


Page 38: Pc Seminar Jordi

SIFT

Linear method:

The simplest way to find all corresponding feature points is to compare all features against all other features in each pair of potentially matching images.

Unfortunately, this is quadratic in the number of extracted features, which makes it impractical for some applications.

Page 39: Pc Seminar Jordi

SIFT

Nearest-neighbor matching is the major computational bottleneck:

• Linear search performs $d n^2$ operations for n feature points and d dimensions.

• No exact NN methods are faster than linear search for d > 10.

• Approximate methods can be much faster, but at the cost of missing some correct matches. Failure rate gets worse for large datasets.

Page 40: Pc Seminar Jordi

SIFT

A better approach is to devise an indexing structure such as a multi-dimensional search tree or a hash table to rapidly search for features near a given feature.

For extremely large databases (millions of images or more), even more efficient structures based on ideas from document retrieval (e.g., vocabulary trees) can be used.

Page 41: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

The first step is to establish a set of putative correspondences.

Page 42: Pc Seminar Jordi

SIFT

How can we discard erroneous correspondences?


Page 43: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

Once we have some hypothetical (putative) matches, we can use geometric alignment to verify which matches are inliers and which ones are outliers.

Page 44: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

• Extract features

• Compute putative matches

Page 45: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

• Loop:
  – Hypothesize transformation T (using a small group of putative matches that are related by T)

Page 46: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

• Loop:
  – Hypothesize transformation T (small group of putative matches that are related by T)
  – Verify transformation (search for other matches consistent with T)

Page 47: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

Page 48: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

2D transformation models:

• Similarity (translation, scale, rotation)

• Affine

• Projective (homography)

Page 49: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation (given the point correspondences):

$$(x_i, y_i) \rightarrow (x_i', y_i')$$

Slide credit: S. Lazebnik

Page 50: Pc Seminar Jordi

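SIFT

Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation (given the point correspondences):

$$\begin{bmatrix} x_i' \\ y_i' \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \end{bmatrix}$$

$$\begin{bmatrix} & & \cdots & & & \\ x_i & y_i & 0 & 0 & 1 & 0 \\ 0 & 0 & x_i & y_i & 0 & 1 \\ & & \cdots & & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} \cdots \\ x_i' \\ y_i' \\ \cdots \end{bmatrix}$$

Slide credit: S. Lazebnik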

Page 51: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation (given the point correspondences):

• Linear system with six unknowns

• Each match gives us two linearly independent equations: need at least three to solve for the transformation parameters

• Can solve Ax=b using pseudo-inverse:

$$x = (A^T A)^{-1} A^T b$$

Slide credit: S. Lazebnik
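The least-squares solution can be sketched without a linear algebra library by accumulating the normal equations $(A^T A) x = A^T b$ and solving the resulting 6x6 system by elimination. The unknowns are ordered as $(m_1, m_2, m_3, m_4, t_1, t_2)$, so that $x' = m_1 x + m_2 y + t_1$ and $y' = m_3 x + m_4 y + t_2$. The names `Match` and `fitAffine` are hypothetical, and a production version would more likely call a library solver.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Match { double x, y, xp, yp; };   // (x, y) maps to (xp, yp)

// Least-squares affine fit. Returns {m1, m2, m3, m4, t1, t2}.
std::array<double, 6> fitAffine(const std::vector<Match>& matches) {
    assert(matches.size() >= 3);           // three matches give six equations
    double N[6][7] = {};                   // augmented normal equations [A^T A | A^T b]
    for (const Match& m : matches) {
        // The two rows of A contributed by this match, and their right-hand sides.
        const double rows[2][6] = {{m.x, m.y, 0.0, 0.0, 1.0, 0.0},
                                   {0.0, 0.0, m.x, m.y, 0.0, 1.0}};
        const double rhs[2] = {m.xp, m.yp};
        for (int r = 0; r < 2; ++r)
            for (int i = 0; i < 6; ++i) {
                for (int j = 0; j < 6; ++j) N[i][j] += rows[r][i] * rows[r][j];
                N[i][6] += rows[r][i] * rhs[r];
            }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 6; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 6; ++r)
            if (std::fabs(N[r][col]) > std::fabs(N[pivot][col])) pivot = r;
        assert(std::fabs(N[pivot][col]) > 1e-12);   // fails when the points are collinear
        for (int j = 0; j < 7; ++j) std::swap(N[col][j], N[pivot][j]);
        for (int r = 0; r < 6; ++r) {
            if (r == col) continue;
            const double f = N[r][col] / N[col][col];
            for (int j = col; j < 7; ++j) N[r][j] -= f * N[col][j];
        }
    }
    std::array<double, 6> x{};
    for (int i = 0; i < 6; ++i) x[i] = N[i][6] / N[i][i];
    return x;
}
```

With exactly three non-collinear matches the fit is exact; with more, the residual is minimized in the least-squares sense, which is why outliers must be removed first.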


Page 53: Pc Seminar Jordi

SIFT

Stage 2: Verify if they belong to a consistent configuration.

The process of selecting a small set of seed matches and then verifying a larger set is often called random sampling or RANSAC.

Page 54: Pc Seminar Jordi

RANSAC

RANSAC was originally formulated in Martin A. Fischler and Robert C. Bolles (June 1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography". Comm. of the ACM 24: 381–395.

Page 55: Pc Seminar Jordi

RANSAC

“We approached the fitting problem in the opposite way from most previous techniques. Instead of averaging all the measurements and then trying to throw out bad ones, we used the smallest number of measurements to compute a model’s unknown parameters and then evaluated the instantiated model by counting the number of consistent samples”

From “RANSAC: An Historical Perspective”, Bob Bolles & Marty Fischler, 2006.

Page 56: Pc Seminar Jordi

RANSAC

It’s easy to understand and it’s effective

• It helps solve a common problem (i.e., filter out gross errors introduced by automatic techniques)

• The number of trials to “guarantee” a high level of success (e.g., 99.99% probability) is surprisingly small

• The dramatic increase in computation speed made it possible to do a large number of trials (100s or 1000s)

• The algorithm can stop as soon as a good match is computed (unlike Hough techniques that typically compute a large number of examples and then identify matches)

From “RANSAC: An Historical Perspective”, Bob Bolles & Marty Fischler, 2006.
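The claim about the number of trials can be made concrete with the standard estimate, which is not on the slide: if a fraction w of the data are inliers and each sample uses s points, a sample is outlier-free with probability $w^s$, so N trials all fail with probability $(1 - w^s)^N$. Requiring success with probability p gives $N \ge \log(1 - p) / \log(1 - w^s)$. The function name below is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Number of random samples needed so that, with probability `success`, at
// least one sample of `sampleSize` points contains only inliers, when a
// fraction `inlierRatio` of the data are inliers.
int ransacTrials(double success, double inlierRatio, int sampleSize) {
    assert(success > 0.0 && success < 1.0);
    assert(inlierRatio > 0.0 && inlierRatio < 1.0 && sampleSize >= 1);
    const double allInliers = std::pow(inlierRatio, sampleSize);
    return static_cast<int>(std::ceil(std::log(1.0 - success) / std::log(1.0 - allInliers)));
}
```

Under this estimate, with half of the data being outliers and a 99.99% target, a two-point line model needs 33 trials and a three-match affine model needs 69.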

Page 57: Pc Seminar Jordi

RANSAC

The basic idea is to repeat M times the following process:

1. A model is fitted to the hypothetical inliers (a small random sample of the data), i.e. all free parameters of the model are reconstructed from this sample.

2. All other data are then tested against the fitted model and, if a point fits the estimated model well, it is also considered a hypothetical inlier.

3. The estimated model is reasonably good if sufficiently many points have been classified as hypothetical inliers.

4. The model is reestimated from all hypothetical inliers, because it has only been estimated from the initial set of hypothetical inliers.

5. Finally, the model is evaluated by estimating the error of the inliers relative to the model.

This procedure is repeated a fixed number of times, each time producing either a model which is rejected because too few points are classified as inliers or a refined model together with a corresponding error measure. In the latter case, we keep the refined model if its error is lower than the last saved model.

From “RANSAC: An Historical Perspective”, Bob Bolles & Marty Fischler, 2006.
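For a line model, the loop above can be sketched as follows. This is a simplified illustration: it keeps the candidate with the most inliers and omits the re-estimation from all inliers (step 4) and the error measure (step 5). The type and function names, the fixed trial count and the seed parameter are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Point2 { double x, y; };
struct Line2 { double a, b, c; };   // a*x + b*y + c = 0, with a*a + b*b = 1

// Line through two distinct points, normalized so |a*x + b*y + c| is a distance.
Line2 lineThrough(const Point2& p, const Point2& q) {
    const double a = q.y - p.y, b = p.x - q.x;
    const double n = std::sqrt(a * a + b * b);
    assert(n > 0.0);
    return Line2{a / n, b / n, -(a * p.x + b * p.y) / n};
}

double distanceTo(const Line2& l, const Point2& p) {
    return std::fabs(l.a * p.x + l.b * p.y + l.c);
}

// Repeats `trials` times: sample two points, fit a line, count the points
// within `threshold` of it. Keeps the line with the most inliers.
Line2 ransacLine(const std::vector<Point2>& pts, int trials, double threshold,
                 unsigned seed, std::size_t* bestCount) {
    assert(pts.size() >= 2 && bestCount != nullptr);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, pts.size() - 1);
    Line2 best{0.0, 1.0, 0.0};
    *bestCount = 0;
    for (int t = 0; t < trials; ++t) {
        const Point2& p = pts[pick(rng)];
        const Point2& q = pts[pick(rng)];
        if (p.x == q.x && p.y == q.y) continue;   // need two distinct points
        const Line2 candidate = lineThrough(p, q);
        std::size_t count = 0;
        for (const Point2& r : pts)
            if (distanceTo(candidate, r) <= threshold) ++count;
        if (count > *bestCount) { *bestCount = count; best = candidate; }
    }
    return best;
}
```

A fuller version would refit the line to all inliers of the winning candidate, for example by least squares, before reporting it.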

Page 58: Pc Seminar Jordi

RANSAC


Page 59: Pc Seminar Jordi

RANSAC

Line fitting example:

Task: Estimate best line

Page 60: Pc Seminar Jordi

RANSAC

Line fitting example:

Sample two points

Page 61: Pc Seminar Jordi

RANSAC

Line fitting example:

Fit line

Page 62: Pc Seminar Jordi

RANSAC

Line fitting example:

Total number of points within a threshold of line.

Page 63: Pc Seminar Jordi

RANSAC

Line fitting example:

Repeat until we get a good result

Page 64: Pc Seminar Jordi

RANSAC

Line fitting example:

Repeat until we get a good result

Page 65: Pc Seminar Jordi

RANSAC


Page 66: Pc Seminar Jordi

RANSAC example: translation

Putative matches

Slide credit: A. Efros

Page 67: Pc Seminar Jordi

RANSAC example: translation

Select one match, count inliers

Slide credit: A. Efros

Page 68: Pc Seminar Jordi

RANSAC example: translation

Find “average” translation vector

Slide credit: A. Efros
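The translation example can be sketched as a hypothesize-and-verify loop. Because a single match is enough to hypothesize a translation, this sketch simply tries every match in turn instead of sampling at random, which is affordable for a few hundred putative matches. The names and the pixel threshold are assumptions made for the illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct PutativeMatch { double x, y, xp, yp; };   // (x, y) in image 1, (xp, yp) in image 2
struct Translation { double tx, ty; std::size_t inliers; };

Translation estimateTranslation(const std::vector<PutativeMatch>& matches, double threshold) {
    assert(!matches.empty() && threshold >= 0.0);
    std::vector<std::size_t> bestSet;
    for (const PutativeMatch& seed : matches) {
        // Hypothesis from one match.
        const double tx = seed.xp - seed.x, ty = seed.yp - seed.y;
        // Verification: which matches agree with this translation?
        std::vector<std::size_t> agree;
        for (std::size_t i = 0; i < matches.size(); ++i) {
            const double ex = matches[i].xp - matches[i].x - tx;
            const double ey = matches[i].yp - matches[i].y - ty;
            if (std::sqrt(ex * ex + ey * ey) <= threshold) agree.push_back(i);
        }
        if (agree.size() > bestSet.size()) bestSet = agree;
    }
    // "Average" translation over the largest consistent set (never empty,
    // since every seed agrees with itself).
    Translation result{0.0, 0.0, bestSet.size()};
    for (std::size_t i : bestSet) {
        result.tx += matches[i].xp - matches[i].x;
        result.ty += matches[i].yp - matches[i].y;
    }
    result.tx /= static_cast<double>(bestSet.size());
    result.ty /= static_cast<double>(bestSet.size());
    return result;
}
```

Averaging over the inliers plays the role of the re-estimation step: the final vector is less sensitive to noise than the single seed match that produced the hypothesis.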

Page 69: Pc Seminar Jordi

RANSAC

Interest points (500/image)

Putative correspondences (268)

Outliers (117)

Inliers (151)

Final inliers (262)

Page 70: Pc Seminar Jordi

SIFT Applications

Page 71: Pc Seminar Jordi

SIFT Applications

Page 72: Pc Seminar Jordi

SIFT Applications

HDRSoft

Page 73: Pc Seminar Jordi

SIFT Applications

Page 74: Pc Seminar Jordi

Matching and Classification

SIFT allows reliable real-time recognition but at a computational cost that severely limits the number of points that can be handled.

A standard implementation requires 1 ms per feature point, which limits the number of feature points to 50 per frame if one requires frame-rate performance.

Page 75: Pc Seminar Jordi

Matching and Classification

An alternative is to rely on statistical learning techniques to model the set of possible appearances of a patch.

The major challenge is to use simple models to allow for real-time, efficient recognition.

Page 76: Pc Seminar Jordi

Matching and Classification

Can we match keypoints using simpler features without intensive preprocessing?

[Figure: a keypoint patch to be assigned to one of a set of classes { … }]

We will assume that we have the possibility to train a classifier for each keypoint class.

Page 77: Pc Seminar Jordi

Matching and Classification

Simple binary features

The test compares the intensities of two pixels, $I(m_{i,1})$ and $I(m_{i,2})$, around the keypoint:

$$f_i = \begin{cases} 1 & \text{if } I(m_{i,1}) < I(m_{i,2}) \\ 0 & \text{otherwise} \end{cases}$$

Page 78: Pc Seminar Jordi

Matching and Classification

Without intensive preprocessing

We can synthetically generate the set of the keypoint’s possible appearances under various perspective, lighting, noise, etc.

Page 79: Pc Seminar Jordi

Matching and Classification

FERN Formulation

We model the class conditional probabilities of a large number of binary features, which are estimated in a training phase.

At run time, these probabilities are used to select the best match for a given image patch.

Page 80: Pc Seminar Jordi

Matching and Classification

FERN Formulation

$f_i$: Binary feature.

$N_f$: Total number of features in the model.

$C_k$: Class representing all views of an image patch around a keypoint.

Given $f_1, \ldots, f_{N_f}$, select the class k such that (assuming a uniform prior over the classes)

$$\hat{k} = \arg\max_k P(C_k \mid f_1, f_2, \ldots, f_{N_f}) = \arg\max_k P(f_1, f_2, \ldots, f_{N_f} \mid C_k)$$

Mustafa Ozuysal, Michael Calonder, Vincent Lepetit, Pascal Fua, "Fast Keypoint Recognition Using Random Ferns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 99, 2009

Page 81: Pc Seminar Jordi

Matching and Classification

FERN Formulation

However, it is not practical to model the joint distribution of all features. We group features into small sets (ferns) and assume independence between these sets (Semi-Naïve Bayesian Classifier):

$F_j$: A fern is defined to be the set of S binary features $\{f_r, \ldots, f_{r+S-1}\}$.

M is the number of ferns, $N_f = S \times M$.

Page 82: Pc Seminar Jordi

Matching and Classification

FERN Formulation

$$P(f_1, f_2, \ldots, f_{N_f} \mid C_k): \quad 2^{N_f} \text{ parameters!}$$

$$P(f_1, f_2, \ldots, f_{N_f} \mid C_k) = \prod_{i=1}^{N_f} P(f_i \mid C_k): \quad N_f \text{ parameters, but too simple.}$$

$$P(f_1, f_2, \ldots, f_{N_f} \mid C_k) = \prod_{j=1}^{M} P(F_j \mid C_k): \quad M \times 2^S \text{ parameters.}$$

Page 83: Pc Seminar Jordi

Matching and Classification

FERN Implementation

We generate a random set of binary features. A binary feature outputs a binary number.

[Figure: possible outputs of the binary features, e.g. 8 possibilities]

A fern with S nodes outputs a number between 0 and $2^S - 1$.

Page 84: Pc Seminar Jordi

Matching and Classification

FERN Implementation

When we have multiple patches of the same class, we can model the output of a fern with a multinomial distribution.

Probability for each possibility.

Page 85: Pc Seminar Jordi

Matching and Classification

Slide credit: V. Lepetit

Page 86: Pc Seminar Jordi

Matching and Classification

[Figure: binary test outputs 0, 1, 1 for a training patch, and the resulting fern output, 6]

Slide credit: V. Lepetit

Page 87: Pc Seminar Jordi

Matching and Classification

[Figure: binary test outputs for two training patches, (1, 0, 0) and (0, 1, 1), and the resulting fern outputs, 1 and 6]

Slide credit: V. Lepetit

Page 88: Pc Seminar Jordi

Matching and Classification

[Figure: binary test outputs for three training patches, (1, 0, 1), (1, 0, 0) and (0, 1, 1), and the resulting fern outputs, 1, 6 and 5]

Slide credit: V. Lepetit

Page 89: Pc Seminar Jordi

Matching and Classification

Slide credit: V. Lepetit

Page 90: Pc Seminar Jordi

Matching and Classification

Normalize: $P(f_1, f_2, \ldots, f_n \mid C = c_i)$

[Figure: normalized histogram over the fern outputs 000, 001, …, 111]

Slide credit: V. Lepetit

Page 91: Pc Seminar Jordi

Matching and Classification: FERN Implementation

At the end of the training we have distributions over possible fern outputs for each class.
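The training result described above can be sketched for a single fern as a table of counts (one row per fern output, one column per class) that is normalized per class. This is only a minimal sketch, not the seminar's implementation: the names `fern_output` and `fern_train`, the sizes `S`, `NUM_OUTPUTS` and `NUM_CLASSES`, and the add-one smoothing are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { S = 3, NUM_OUTPUTS = 1 << S, NUM_CLASSES = 2 };

/* Pack S binary test results (0 or 1) into one fern output in [0, 2^S - 1]. */
static int fern_output(const int bits[S])
{
    int index = 0;
    for (int j = 0; j < S; j++) {
        assert(bits[j] == 0 || bits[j] == 1);
        index = (index << 1) | bits[j];
    }
    return index;
}

/* Count how often each output occurs for each class, then normalize every
   class column so that it sums to one. Counts start at 1 (add-one smoothing,
   an assumption) so that no output ends up with probability zero. */
static void fern_train(const int (*bits)[S], const int *labels, size_t n,
                       double prob[NUM_OUTPUTS][NUM_CLASSES])
{
    double total[NUM_CLASSES];
    for (int c = 0; c < NUM_CLASSES; c++) {
        total[c] = NUM_OUTPUTS;
        for (int o = 0; o < NUM_OUTPUTS; o++) prob[o][c] = 1.0;
    }
    for (size_t i = 0; i < n; i++) {
        assert(labels[i] >= 0 && labels[i] < NUM_CLASSES);
        prob[fern_output(bits[i])][labels[i]] += 1.0;
        total[labels[i]] += 1.0;
    }
    for (int o = 0; o < NUM_OUTPUTS; o++)
        for (int c = 0; c < NUM_CLASSES; c++) prob[o][c] /= total[c];
}
```

A full classifier would keep one such table per fern.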

Page 92: Pc Seminar Jordi

Matching and Classification: FERN Implementation

To recognize a new patch, the fern outputs select rows of distributions, one for each fern, and these are then combined assuming independence between distributions.
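Under the independence assumption stated above, the combination step can be written as a product over ferns. The symbols $F_k$ (output of fern $k$) and $M$ (number of ferns) are notation introduced here for illustration; $c_i$ is the class, as in the normalization step.

```latex
\hat{c} \;=\; \arg\max_{i} \prod_{k=1}^{M} P(F_k \mid C = c_i)
        \;=\; \arg\max_{i} \sum_{k=1}^{M} \log P(F_k \mid C = c_i)
```

The second form replaces the product by a sum of logarithms, which avoids numerical underflow when many small probabilities are multiplied.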

Page 93: Pc Seminar Jordi

Matching and Classification

Page 94: Pc Seminar Jordi

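Matching and Classification: FERN Implementation

…in 10 lines of code….

    /* Names follow the slide; the function wrapper and the types are assumptions.
       H: classes, M: ferns, S: tests per fern, K: patch pixels, D: pairs of
       pixel offsets, PF: per-fern tables of class scores (summed, so presumably
       log-probabilities), shift1/shift2: strides of PF, P: output class scores. */
    void fern_classify(const unsigned char *K, const int *D, const float *PF,
                       int H, int M, int S, int shift1, int shift2, float *P)
    {
        for (int i = 0; i < H; i++) P[i] = 0.f;
        for (int k = 0; k < M; k++) {
            int index = 0;
            const int *d = D + k * 2 * S;
            for (int j = 0; j < S; j++) {
                index <<= 1;
                if (*(K + d[0]) < *(K + d[1]))   /* binary test on two pixels */
                    index++;
                d += 2;
            }
            const float *p = PF + k * shift2 + index * shift1;
            for (int i = 0; i < H; i++) P[i] += p[i];
        }
    }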

Page 95: Pc Seminar Jordi

Matching and Classification

Page 96: Pc Seminar Jordi

Matching and Classification

Page 97: Pc Seminar Jordi

Matching and Classification

Page 98: Pc Seminar Jordi

Matching and Classification

Page 99: Pc Seminar Jordi

Matching and Classification

The FERN technique speeds up keypoint matching, but the training is slow and performed offline.

Hence, it is not suited for applications that require real-time online learning or incremental addition of arbitrary numbers of keypoints (e.g. SLAM).

Page 100: Pc Seminar Jordi

Matching and Classification

This limitation can be removed if we train a FERN classifier to recognize a number of keypoints extracted from a reference database, and all other keypoints are characterized in terms of their response to these classification ferns (signature).

Page 101: Pc Seminar Jordi

Matching and Classification

M. Calonder, V. Lepetit, and P. Fua, Keypoint Signatures for Fast Learning and Recognition. In Proceedings of European Conference on Computer Vision, 2008.

Page 102: Pc Seminar Jordi

Matching and Classification

It can be empirically shown that these signatures are stable under changes in viewing conditions.

Signatures are sparse in nature if we apply a threshold function.

Signatures do not need a training phase and scale well with the number of classes (nearest neighbor).

Page 103: Pc Seminar Jordi

Matching and Classification

However, matching signatures still involves many more elementary operations than absolutely necessary.

Moreover, evaluating the signatures requires storing many distributions of the same size as themselves and, therefore, large amounts of memory.

Page 104: Pc Seminar Jordi

Matching and Classification

The full response vector r(p) for all J Ferns is taken to be:

[Equation: r(p) in terms of the per-fern vectors storing the probability that p is one of the N reference points]

where Z is a normalizer such that the elements of r(p) sum to one.

In practice, when p truly corresponds to one of the reference keypoints, r(p) contains one element that is close to one while all others are close to zero.

Otherwise, it contains a few relatively large values that correspond to reference keypoints that are similar in appearance, and small values elsewhere.

Page 105: Pc Seminar Jordi

Matching and Classification

We can compute a sparse signature by applying a point-wise threshold function with a threshold value θ.

It is an N-dimensional vector with only a few non-zero elements that is mostly invariant to different imaging conditions and therefore presents a useful descriptor for matching purposes.
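The point-wise thresholding described above could look roughly like the following. It is a sketch under assumptions: the name `sparsify` is hypothetical, and it assumes that responses at or above θ keep their value while the rest are set to zero (the slide does not say whether surviving values are kept or binarized).

```c
#include <assert.h>
#include <stddef.h>

/* Zero every response below theta, in place, and return the number of
   elements that survive (those greater than or equal to theta). */
static size_t sparsify(float *r, size_t n, float theta)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (r[i] < theta) r[i] = 0.0f;
        else kept++;
    }
    assert(kept <= n);
    return kept;
}
```

Because only the surviving entries matter for matching, a sparse signature can then be stored as a short list of (index, value) pairs instead of a full N-dimensional vector.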

Page 106: Pc Seminar Jordi

Matching and Classification

[Figure: the patch p is passed through J Ferns; the selected leaves hold vectors storing the probability that p is one of the N reference points.]

Typical parameters: J=50; d=10; N=500

Page 107: Pc Seminar Jordi

Matching and Classification

Typical parameters: J=50; d=10; N=500

We need, for each of the $2^d$ leaves in each of the J Ferns, an N-dimensional vector of floats.

The total memory requirement is $M = b\,J\,2^d\,N$ bytes, where b is the number of bytes to store a float (8). In practice, 100MB!
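As a worked check of the formula above with the typical parameters (J = 50, d = 10, N = 500): the value b = 4 in the last line is an assumption (single-precision floats), not a figure from the slide.

```latex
\begin{align*}
M &= b \cdot J \cdot 2^{d} \cdot N \\
b = 8:\quad M &= 8 \cdot 50 \cdot 1024 \cdot 500 = 204\,800\,000 \text{ bytes} \approx 195 \text{ MiB} \\
b = 4:\quad M &= 4 \cdot 50 \cdot 1024 \cdot 500 = 102\,400\,000 \text{ bytes} \approx 98 \text{ MiB}
\end{align*}
```

The single-precision case is the one that lands near the 100MB quoted above.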

Page 108: Pc Seminar Jordi

Matching and Classification

Compressive Sensing literature:

• High-dimensional sparse vectors can be reconstructed from their linear projections into much lower-dimensional spaces.

• The Johnson–Lindenstrauss lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved.

Page 109: Pc Seminar Jordi

Matching and Classification

Many kinds of matrices can be used for this purpose.

Random Ortho-Projection (ROP) matrices are a good choice and can be easily constructed by applying a Gram-Schmidt orthonormalization process to a random matrix.

Page 110: Pc Seminar Jordi

Matching and Classification

In mathematics, the Gram–Schmidt process is a method for orthonormalizing a set of vectors in an inner product space, most commonly the Euclidean space $R^n$.

The Gram–Schmidt process takes a finite, linearly independent set $S = \{v_1, \ldots, v_k\}$ for k ≤ n and generates an orthogonal set $S' = \{u_1, \ldots, u_k\}$ that spans the same k-dimensional subspace of $R^n$ as S.
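To make the construction of a random ortho-projection matrix concrete, here is a minimal sketch that fills a matrix with pseudo-random values and orthonormalizes its rows with (modified) Gram-Schmidt. The function names, the uniform distribution in [-1, 1], the use of `rand()`, and the small Newton-iteration square root (used only to keep the sketch free of the math library) are all assumptions made for illustration, not the method of the cited work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Square root by Newton iteration, valid for x >= 0. */
static double newton_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) r = 0.5 * (r + x / r);
    return r;
}

/* Orthonormalize the m rows of the row-major m x n matrix a (m <= n) in
   place with modified Gram-Schmidt. Returns 0 on success, -1 if a row is
   numerically linearly dependent on the previous ones. */
static int gram_schmidt_rows(double *a, int m, int n)
{
    assert(m <= n);
    for (int i = 0; i < m; i++) {
        double *u = a + (size_t)i * n;
        for (int p = 0; p < i; p++) {            /* remove earlier directions */
            const double *q = a + (size_t)p * n;
            double dot = 0.0;
            for (int k = 0; k < n; k++) dot += u[k] * q[k];
            for (int k = 0; k < n; k++) u[k] -= dot * q[k];
        }
        double norm2 = 0.0;
        for (int k = 0; k < n; k++) norm2 += u[k] * u[k];
        double norm = newton_sqrt(norm2);
        if (norm < 1e-12) return -1;
        for (int k = 0; k < n; k++) u[k] /= norm; /* scale to unit length */
    }
    return 0;
}

/* Fill an m x n matrix with uniform pseudo-random values in [-1, 1] and
   orthonormalize its rows. */
static int make_rop(double *a, int m, int n, unsigned seed)
{
    srand(seed);
    for (int i = 0; i < m * n; i++)
        a[i] = 2.0 * rand() / (double)RAND_MAX - 1.0;
    return gram_schmidt_rows(a, m, n);
}
```

Multiplying an n-dimensional signature by the resulting m x n matrix gives its m-dimensional projection.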

Page 111: Pc Seminar Jordi

Matching and Classification

M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, and P. Mihelich, Compact Signatures for High‐speed Interest Point Description and Matching. In Proceedings of International Conference on Computer Vision, 2009.

Page 112: Pc Seminar Jordi

Matching and Classification

M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, and P. Mihelich, Compact Signatures for High‐speed Interest Point Description and Matching. In Proceedings of International Conference on Computer Vision, 2009.

Page 113: Pc Seminar Jordi

Matching and Classification

M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, and P. Mihelich, Compact Signatures for High‐speed Interest Point Description and Matching. In Proceedings of International Conference on Computer Vision, 2009.

Page 114: Pc Seminar Jordi

Matching and Classification

This approach reduces the memory requirement when storing the models: for N=512, M=176, the requirements change from 93.75MB to 175B!

The CPU time is 6.3ms for an exhaustive NN matching of 256 points (256x256).

Page 115: Pc Seminar Jordi

Internet-scale image databases

Page 116: Pc Seminar Jordi

Min HASH

How can we find similar images in very large datasets?

Can we get clusters from these images?

Page 117: Pc Seminar Jordi

Min HASH

Let's suppose that we choose a LARGE bag-of-words representation of our images and that we use a binary histogram.

Page 118: Pc Seminar Jordi

Min HASH

Given two different images, we can compute their histogram intersection:

Page 119: Pc Seminar Jordi

Min HASH

…and their histogram union:

Page 120: Pc Seminar Jordi

Min HASH

Then we can define a set similarity measure in the following way:

$\mathrm{Sim}(A_1, A_2) = \dfrac{|A_1 \cap A_2|}{|A_1 \cup A_2|}$

That is, the number of keypoints (visual words) that both images have in common divided by the total number of keypoints that are present in at least one of the two images.
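For binary histograms, the intersection-over-union measure just described reduces to counting bits. The sketch below is illustrative only: the name `set_similarity`, the one-byte-per-word representation, and the convention of returning 0 for two empty histograms are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Set similarity of two binary histograms of length w:
   (number of words present in both) / (number of words present in either).
   Two empty histograms are given similarity 0 (an assumption; the ratio is
   undefined in that case). */
static double set_similarity(const unsigned char *a1, const unsigned char *a2,
                             size_t w)
{
    size_t inter = 0, uni = 0;
    for (size_t i = 0; i < w; i++) {
        assert(a1[i] <= 1 && a2[i] <= 1);
        inter += (size_t)(a1[i] & a2[i]);
        uni += (size_t)(a1[i] | a2[i]);
    }
    return uni ? (double)inter / (double)uni : 0.0;
}
```

With very large vocabularies a sparse representation (a sorted list of the word indices that are present) would be more practical than a dense array.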

Page 121: Pc Seminar Jordi

Min HASH

Page 122: Pc Seminar Jordi

Min HASH

We can perform clustering or matching of an unordered set of images with this measure, but this can be used only with a limited amount of data!

The method requires

$\sum_{i=1}^{w} d_i^2$

similarity evaluations, where w is the size of the vocabulary and $d_i$ is the number of regions assigned to the i-th visual word. The vocabulary size commonly used is w = 1,000,000.

Page 123: Pc Seminar Jordi

Min HASH

We can perform clustering or matching of an unordered set of images with this measure, but this can be used only with a limited amount of data!

Observation: histograms for an image are highly sparse!

Page 124: Pc Seminar Jordi

Min HASH

The key idea of min-hash is to map (“hash”) each row/histogram to a small amount of data Sig(A) (the signature) such that:

• Sig(A) is small enough.

• Rows A1 and A2 are highly similar if Sig(A1) is highly similar to Sig(A2).

Page 125: Pc Seminar Jordi

Min HASH

Useful convention: we will refer to columns as being of four types:

A1:    1 0 1 0
A2:    1 1 0 0
Type:  a b c d

We will also use “a” as the number of columns of type a (and likewise for b, c and d).

Notes:

• Sim(A1, A2) = a/(a+b+c)

• Most columns are of type d.

Page 126: Pc Seminar Jordi

Min HASH

• Imagine the columns permuted randomly in order.

• Hash each row A to h(A), the number of the first column in which row A has a 1.

[Figure: a random permutation π turns the rows 1 0 0 1 0 and 1 0 0 0 0 into 0 1 0 0 1 and 0 1 0 0 0, so h(A1)=2 and h(A2)=2.]

The probability that h(A1) = h(A2) is a/(a+b+c) = Sim(A1, A2) (the hashes agree if the first column with a 1 is of type a and disagree if it is of type b or c).

Page 127: Pc Seminar Jordi

Min HASH

If we repeat the experiment with a new permutation of columns a large number of times, say 512, we get a signature consisting of 512 column numbers for each row.

The “similarity” of these lists (fraction of positions in which they agree) will be very close to the similarity of the rows (similar signatures mean similar rows!).

Page 128: Pc Seminar Jordi

Min HASH

In fact, it is not necessary to permute the columns: we can hash each original column with 512 different hash functions and keep, for each row, the lowest hash value of a column in which that row has a 1, independently for each of the 512 hash functions. Then we look for the coincidences.

row:  1 0 0 1 0

h1:   5 1 3 2 4      h1(row) = 2
h2:   1 2 5 3 4      h2(row) = 1
h3:   3 4 1 5 2      h3(row) = 3
h4:   2 5 4 1 3      h4(row) = 1

signature: (2, 1, 3, 1)

Page 129: Pc Seminar Jordi

Min HASH

Row 1:  1 0 1 1 0
Row 2:  0 1 0 0 1
Row 3:  1 1 0 1 0

h1:     1 2 3 4 5      h1(row) = 1, 2, 1
h2:     5 4 3 2 1      h2(row) = 2, 1, 2
h3:     3 4 5 1 2      h3(row) = 1, 2, 1

Similarities:

        Row-Row   Sig-Sig
1-2:    0/5       0/3
1-3:    2/4       3/3
2-3:    1/4       0/3
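The worked example above can be reproduced with a few lines of code. The sketch below stores each hash function as an explicit table of column values, as on the slide; the function names and the fixed sizes are assumptions for illustration, and a real system would use actual hash functions over a vocabulary of about a million words rather than lookup tables.

```c
#include <assert.h>

enum { NUM_COLS = 5, NUM_HASHES = 3 };

/* Min-hash of one binary row under one hash function: the smallest hash
   value among the columns where the row has a 1. Returns 0 for an empty
   row (hash values are assumed to be positive). */
static int min_hash(const int row[NUM_COLS], const int hash[NUM_COLS])
{
    int best = 0;
    for (int c = 0; c < NUM_COLS; c++) {
        assert(hash[c] > 0);
        if (row[c] && (best == 0 || hash[c] < best)) best = hash[c];
    }
    return best;
}

/* Signature of a row: one min-hash per hash function. */
static void signature(const int row[NUM_COLS],
                      const int hashes[NUM_HASHES][NUM_COLS],
                      int sig[NUM_HASHES])
{
    for (int h = 0; h < NUM_HASHES; h++) sig[h] = min_hash(row, hashes[h]);
}

/* Fraction of signature positions in which two rows agree. */
static double signature_similarity(const int s1[NUM_HASHES],
                                   const int s2[NUM_HASHES])
{
    int agree = 0;
    for (int h = 0; h < NUM_HASHES; h++) agree += (s1[h] == s2[h]);
    return (double)agree / NUM_HASHES;
}
```

Fed with the three rows and the tables h1, h2, h3 from the example, `signature` yields (1, 2, 1), (2, 1, 2) and (1, 2, 1), matching the values listed per hash function. With only three hash functions the estimate is coarse (3/3 against a true similarity of 2/4 for rows 1 and 3), which is why several hundred are used in practice.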

Page 130: Pc Seminar Jordi

Min Hash

For efficient retrieval, the min hashes are grouped into n-tuples. In this example, we can form the following 2-tuples:

h1(row) = 1, 2, 1
h2(row) = 2, 1, 2
h3(row) = 1, 2, 1
h4(row) = 3, 2, 3

The retrieval procedure then estimates the full similarity for only those image pairs that have at least h identical tuples out of k tuples.
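The candidate test on tuples might be sketched as follows. The grouping of consecutive min-hashes into tuples, here (h1, h2) and (h3, h4), is an assumption, as are the names `identical_tuples` and `is_candidate`.

```c
#include <assert.h>

/* Count how many n-tuples of consecutive min-hashes are identical in two
   signatures of length len (len must be a multiple of n). */
static int identical_tuples(const int *s1, const int *s2, int len, int n)
{
    assert(n > 0 && len % n == 0);
    int count = 0;
    for (int t = 0; t < len; t += n) {
        int same = 1;
        for (int j = 0; j < n; j++)
            if (s1[t + j] != s2[t + j]) { same = 0; break; }
        count += same;
    }
    return count;
}

/* A pair becomes a candidate for the full similarity estimate when at
   least h of its tuples are identical. */
static int is_candidate(const int *s1, const int *s2, int len, int n, int h)
{
    return identical_tuples(s1, s2, len, n) >= h;
}
```

With the values above, the signatures of rows 1 and 3 are both (1, 2, 1, 3), so both of their 2-tuples coincide, while rows 1 and 2 share none. In a large database the tuples would typically serve as keys of a hash table so that colliding pairs are found without comparing every pair of images.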

Page 131: Pc Seminar Jordi

Min Hash

From 100k images...

Page 132: Pc Seminar Jordi

Min Hash

From 100k images...

Page 133: Pc Seminar Jordi

Min Hash

From 100k images...

Representatives of the largest clusters

Page 134: Pc Seminar Jordi

Min Hash

Automatic localization of different buildings