Wednesday, October 21, 2009

VLFeat - now also with Python interfaces

VLFeat is an open source library that implements some popular computer vision algorithms including SIFT, MSER, k-means, hierarchical k-means and more. It is written in C (by Andrea Vedaldi and Brian Fulkerson) and comes with interfaces in Matlab. The license is GNU GPL v2.

If you are using local descriptors like SIFT or MSER or if you need fast clustering code you should definitely try out VLFeat. The easiest is to download the version with both source and binaries from here. There is a Git repository with the most current development code.

The good news is that there is now a fairly complete Python wrapper, open sourced and developed by my Polar Rose colleague Mikael Rousson. The wrapper is currently in a Git fork, get it here. Try it out! (You need Boost:Python installed).

Thursday, October 8, 2009

Is your NumPy using the right ATLAS?

Here's a tip I got from my colleague Martin. Making sure that your NumPy installation uses the right back-end can drastically improve performance. ATLAS is the software library used by NumPy (and Matlab, Octave etc.) to do linear algebra.

Try running the following 1000*1000 matrix multiplication test:

>>> from numpy import *
>>> import time
>>> A = random.random((1000,1000))
>>> B = random.random((1000,1000))
>>> t = time.time(); dot(A,B); print time.time()-t

Disappointed about the time this takes? Take a minute to check that you are running the right ATLAS for your computer. On my Mac this test was fast (0.2-0.3s) but on my desktop running Ubuntu very slow (5-6s). In Ubuntu, open the Package Manager and search for "atlas". If you find something like "libatlas3gf-base" you are using a generic version that works on most hardware. Unless your computer is antique, you can probably do better. Try selecting "libatlas3gf-sse2" (or even libatlas3gf-sse1 if this doesn't work) instead and run the test again. (on other linux platforms the process should be similar)

Tests I did on a couple of machines show speedup of a factor 4-10. Well worth the minute it takes to check and fix. Thanks Martin!

Wednesday, September 16, 2009

Exporting from NumPy to Weka

Weka is a very useful tool for experimenting with different classifiers, pre-processing of data and much more. Weka is an open source (GPL) collection of machine learning algorithms for data mining written in Java. Not only algorithms, Weka also has a GUI for exploring data, running classification experiments and visualization without having to write any code. Some screen shots:


Loading and exploring some data (in this case the iris sample data set).


Running cross-correlation with different classifiers.

Visualization.

If you are used to working in NumPy, the following function will write NumPy arrays to Weka .arff data files that can be loaded directly from the GUI.

def write_to_weka(filename,relationname,attributenames,attributes,comment=''):
""" writes NumPy arrays with data to WEKA format .arff files

input: relationname (string with a description), attributenames (list
of the names of different attributes), attributes (array of attributes,
one row for each attribute, WEKA treats last row as classlabels by
default), comment (short description of the content)."""

nbrattributes = len(attributenames)
if attributes.shape[1] != nbrattributes:
raise Exception('Number of attribute names is not equal to length of attributes')

f = open(filename, 'w')
f.write('% '+comment+'\n')
f.write('\n')
f.write('@RELATION '+relationname+'\n')

for a in attributenames:
f.write('@ATTRIBUTE '+str(a)+' NUMERIC\n') #assume values are numeric

f.write('\n')
f.write('@DATA\n') #write the data, one attribute vector per line
for i in range(attributes.shape[0]):
for j in range(nbrattributes):
f.write(str(attributes[i,j]))
if j < nbrattributes-1:
f.write(', ')
f.write('\n')
f.close()


Weka is one of those tools I always forget I have, I hope this post will help me remember and hopefully introduce Weka to new people.

Tip: If you run out of memory, start Weka with "java -Xmx512m weka.jar weka.gui.GUIChooser" and you will have 512MB... or whatever you choose.

Monday, August 31, 2009

"Cover Flow" with PyGame and OpenGL

The other day I was curious at PyOpenGL and PyGame and decided to try to implement a crude "cover flow"-like image browser in Python. PyGame is a set of Python modules designed for writing games. PyOpenGL is a free Python binding to OpenGL. The NEHE OpenGL demos (python version available at here) and the examples found at this page contains lots of useful examples to learn from.

The result is here. Very crude and unpolished but actually something that works. The demo loads all .jpg images in the current folder and shows them with "periodic boundary conditions". (there are some limitations: it doesn't work with too big images but you could just add a check and resize, images are assumed to have 2:3 aspect ratio, all images are loaded in memory + many more I'm sure)

Anyway, it was a fun exercise.

Thursday, August 20, 2009

Report from CIVR 2009

I'm slowly getting into posting after a stint of traveling and vacation. As a warmup, I'll give a short report (way overdue!) on CIVR 2009 where I gave a talk early July. The conference was held in Thira on the island of Santorini, Greece.

The conference twitter hashtag gives a nice record of what people were tweeting about at the time. The conference program is here.

One of the highlights was Luis von Ahn's keynote "Human Computation". The keynote was more or less the same as his talk at Library of Congress mixed with some from his Google tech talk at the end. Both presentations are well worth seeing if you are not familiar with Luis' work.

Some interesting papers:
* Jasper Uijlings, Arnold Smeulders and Remko Scha, Real-time Bag of Words, Approximately (Speedups for bag of word pipeline, fast computation of histograms using matrix multiplication. Use RBF kernel, random forest etc, giver higher accuracy and much faster.) Awarded best paper of CIVR 2009.
* Rainer Lienhart, Stefan Romberg and Eva Hörster, Multilayer pLSA for Multimodal Image Retrieval (Combine visual words and tags a la Flickr)
* Ville Viitaniemi and Jorma Laaksonen. Spatial Extensions to Bag of Visual Words ("soft" assignment)
* Matthijs Douze, Hervé Jégou, Harsimrat Singh, Laurent Amsaleg and Cordelia Schmid. Evaluation of GIST descriptors for web-scale image search (use of GIST as more compact representation to scale to 100M images)
In general there is a clear trend towards using Flickr tags combined with SIFT descriptors.

My presentation at the "practitioner day" was about Polar Rose and lessons learned when deploying large scale computer vision for the web. Slides are here.

A nice conference, smaller than the ICCV/ECCV/CVPRs I try to go to but focused and full of interesting presentation. The next conference will be in Xi'an, China, July 5-7, 2010.


Picture of the speakers at the practitioner day outside the conference building with the volcano in the back.

Wednesday, July 15, 2009

Robust homography estimation using RANSAC

I'm just back from lovely Santorini and the ACM CIVR conference. State-of-the-art image retrieval seems to use a bag of feature representation (a topic itself worthy of a few future posts) with re-ranking of the top results based on geometric consistency of local image features. This consistency check is done using homography estimation. As I mentioned in my previous post about RANSAC, this is a useful algorithm for among other things homography estimation. Here's a simple way of using the Python RANSAC implementation to do exactly this.

We can use ransac.py for any model. All that is needed is a class with fit() and get_error() methods.

To fit a homography using RANSAC we first need the following model class (you can add it to the file homography.py from another post which already contains some useful things).

class ransac_model:
""" class for testing homography fit with ransac.py from
http://www.scipy.org/Cookbook/RANSAC"""

def __init__(self,debug=False):
self.debug = debug

def fit(self, data):
""" fit homography to four selected correspondences"""

#transpose to fit H_from_points()
data = data.T

#from points
fp = data[:3,:4]
#target points
tp = data[3:,:4]

#fit homography, H
H = H_from_points(fp,tp)

return H

def get_error( self, data, H):
""" apply homography to all correspondences,
return error for each transformed point"""

data = data.T

#from points
fp = data[:3]
#target points
tp = data[3:]

#transform fp
fp_transformed = dot(H,fp)

#normalize hom. coordinates
for i in range(3):
fp_transformed[i] /= fp_transformed[2]

err_per_point = sqrt( sum((tp-fp_transformed)**2,axis=0) )

return err_per_point


def H_from_points(fp,tp):
""" find homography H, such that fp is mapped to tp
using the linear DLT method. Points are conditioned
automatically."""

if fp.shape != tp.shape:
raise RuntimeError, 'number of points do not match'


#condition points (important for numerical reasons)
#--from points--
m = mean(fp[:2], axis=1)
maxstd = max(std(fp[:2], axis=1))
C1 = diag([1/maxstd, 1/maxstd, 1])
C1[0][2] = -m[0]/maxstd
C1[1][2] = -m[1]/maxstd
fp = dot(C1,fp)

#--to points--
m = mean(tp[:2], axis=1)
#C2 = C1.copy() #must use same scaling for both point sets
maxstd = max(std(tp[:2], axis=1))
C2 = diag([1/maxstd, 1/maxstd, 1])
C2[0][2] = -m[0]/maxstd
C2[1][2] = -m[1]/maxstd
tp = dot(C2,tp)


#create matrix for linear method, 2 rows for each correspondence pair
nbr_correspondences = fp.shape[1]
A = zeros((2*nbr_correspondences,9))
for i in range(nbr_correspondences):
A[2*i] = [-fp[0][i],-fp[1][i],-1,0,0,0,tp[0][i]*fp[0][i],tp[0][i]*fp[1][i],tp[0][i]]
A[2*i+1] = [0,0,0,-fp[0][i],-fp[1][i],-1,tp[1][i]*fp[0][i],tp[1][i]*fp[1][i],tp[1][i]]

U,S,V = linalg.svd(A)

H = V[8].reshape((3,3))

#decondition
H = dot(linalg.inv(C2),dot(H,C1))

#normalize and return
return H / H[2][2]

As you can see, this class contains a fit() method which just takes the four correspondences selected by ransac.py (they are the first four in data) and fits a homography. Remember, four points are the minimal number to compute a homography. The method get_error() applies the homography and returns the distance for each correspondence pair so that RANSAC can chose which points to keep as inliers and outliers. This is done with a threshold on this distance.

We also need the function H_from_points() to estimate the homography in each RANSAC iteration. Here we use the DLT algorithm.

For ease of use, add the following function to homography.py.

def H_from_ransac(fp,tp,model,maxiter=1000,match_theshold=10):
""" robust estimation of homography H from point
correspondences using RANSAC (ransac.py from
http://www.scipy.org/Cookbook/RANSAC).

input: fp,tp (3*n arrays) points in hom. coordinates"""

import ransac

#use ransac class
model = ransac_model()
#group corresponding points
data = vstack((fp,tp))

H = ransac.ransac(data.T,model,4,maxiter,match_theshold,10)

return H

The function also lets you supply the threshold and the minimum number of points desired. The most important parameter is the maximum number of iterations, exiting too early might give a worse solution, too many iterations will take more time.

Try this on some sample data to see that it works. Using a set with 20 inliers and 10 outliers you can get a figure like this:



Pretty neat!

Monday, July 6, 2009

CVPR 2009 Word Cloud Summary

I seem to have missed an exciting CVPR this year. For those of you who also missed out, I have created a word cloud (using Wordle) from the full program page of CVPR 2009. Here it is:


Wordle: CVPR 2009 Program


I removed "University", "Time" and "Location" for obvious reasons. Give Wordle a try, it is a nice tool.