Tuesday, February 21, 2012

Clustering Symmetric Patterns

The goal is to find iconic patterns in aerial imagery by clustering.
To simplify the problem, I start by clustering patterns from facade parts in aerial imagery.

Directly clustering densely sampled patches from aerial imagery yields edge and corner patterns rather than iconic patches.

The result raises a basic question: what is an iconic pattern? If edges and corners count as iconic patterns, then clustering densely sampled patches already yields a good result. However, what we want are patterns like windows and rooftops, not just corners and edges.

Therefore, I decided to find symmetric patterns, which usually characterize semantic tags such as windows and rooftops. The symmetric patches then go through k-means clustering to produce iconic patches.

Here is the aerial image I use:
Working in RGB or Lab space for symmetry detection gives a poor signal.
We convert the raw image to a SIFT image, as in the SIFT Flow paper:
Then we run a simple symmetry detection: the SSD between each patch and its flipped version:
Black strips, which indicate highly symmetric regions, mark the vertical axes of buildings.
Thresholding the SSD map gives the symmetric patches.
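For concreteness, here is a minimal Python sketch of this SSD symmetry test. The SIFT-image computation itself is omitted, and the patch size and threshold are assumptions, not the actual settings used here:

import numpy as np

def symmetry_map(feat_img, patch=16):
    # SSD between each patch and its horizontally flipped version;
    # low values = high symmetry.
    # feat_img: H x W x D per-pixel descriptor image (e.g. the SIFT image).
    H, W = feat_img.shape[:2]
    ssd = np.full((H, W), np.inf)
    for y in range(H - patch):
        for x in range(W - patch):
            p = feat_img[y:y + patch, x:x + patch]
            ssd[y, x] = np.sum((p - p[:, ::-1]) ** 2)
    return ssd

# Thresholding keeps the symmetric patch locations (thresh is assumed):
# sym_locs = np.argwhere(symmetry_map(sift_img) < thresh)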

We use the highly symmetric patches for k-means clustering, with k = 10.
The distance metric is Euclidean distance in a low-dimensional embedding obtained by random projection of the RGB patches.
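A minimal sketch of this clustering step; the embedding dimension (32) is an assumption, since the post does not state it:

import numpy as np
from sklearn.cluster import KMeans

def cluster_patches(patches, dim=32, k=10, seed=0):
    # k-means on a random projection of flattened RGB patches.
    # patches: N x h x w x 3 array of RGB patches.
    rng = np.random.RandomState(seed)
    X = patches.reshape(len(patches), -1).astype(np.float64)
    R = rng.randn(X.shape[1], dim) / np.sqrt(dim)  # random projection matrix
    Z = X @ R                                      # low-dimensional embedding
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Z)
    return km.labels_, km.cluster_centers_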

Here are the 10 cluster centers:
The average patch in each cluster:
The patch closest to each cluster center:
The cluster distribution on the original image:
Pixels of different colors mark the center locations of patches from different clusters. The result still looks unsatisfactory; the ideal clustering, for me, would assign the same cluster label to all the windows on the same facade.



Sunday, January 29, 2012

Dense Patch Matching With Part Model

The query and test images are roughly aligned in scale.
The part model is a line.
Scaling and rotation are given strong penalties.
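To make the setup concrete, here is a minimal sketch of a chain-structured ("line") part model solved by dynamic programming. The quadratic pairwise term stands in for the strong scaling/rotation penalty; the 1-D simplification and all names are my assumptions, not the exact method used here:

import numpy as np

def chain_match(unary, lam=1.0):
    # Dynamic programming over a chain (line) part model.
    # unary[i]: matching cost of part i at each of L candidate positions.
    # Adjacent parts pay lam * (x_i - x_{i-1})**2 for deforming the line.
    xs = np.arange(len(unary[0]))
    cost = np.asarray(unary[0], dtype=float)
    back = []
    for u in unary[1:]:
        pair = lam * (xs[None, :] - xs[:, None]) ** 2  # prev-to-cur penalty
        total = cost[:, None] + pair                   # total[prev, cur]
        back.append(np.argmin(total, axis=0))
        cost = np.asarray(u, dtype=float) + np.min(total, axis=0)
    path = [int(np.argmin(cost))]                      # best last-part position
    for b in reversed(back):
        path.append(int(b[path[-1]]))                  # follow backpointers
    return path[::-1]                                  # positions, first to last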

Here is the Result!
The red dots mark each part found.
The parts land roughly in the right places.


I tried weighting the matching cost more heavily to test robustness. The result drifts away.



Below are the patches I used for template matching, the cross-correlation map, and the top 20 local maxima.
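For reference, a minimal sketch of that matching step (plain cross-correlation plus local-maximum picking); this is a generic stand-in, not the exact code behind the figures:

import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import correlate2d

def top_k_peaks(image, template, k=20):
    # Cross-correlate a mean-subtracted template with the image, then keep
    # the k strongest local maxima of the correlation map.
    cc = correlate2d(image - image.mean(), template - template.mean(),
                     mode='same')
    local_max = (cc == maximum_filter(cc, size=template.shape))
    ys, xs = np.nonzero(local_max)
    order = np.argsort(cc[ys, xs])[::-1][:k]
    return [(ys[i], xs[i]) for i in order]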


Thursday, January 26, 2012

Aerial View to Ground Plane View Image Matching

We used the code provided by Kyoung Mu here.
Two examples have been run.
One is a small building in a residential area (size: 163 × 113).
The other is a tall building downtown (size: 445 × 300).

The intuition is that the image of the small building is too blurred to yield good feature descriptions,
so the higher-resolution building should be easier to match.

We test the small-building case both with and without normalizing the two images to the same size.
Both cases fail. We then test the downtown case.
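For context, a generic SIFT-plus-ratio-test baseline (explicitly not Kyoung Mu's code) that mirrors the with/without size-normalization comparison; the OpenCV usage and the ratio 0.8 are my assumptions:

import cv2

def match_pair(img1, img2, normalize=True, ratio=0.8):
    # Generic SIFT matching with Lowe's ratio test.
    # normalize=True resizes the second image to the first image's size.
    if normalize:
        img2 = cv2.resize(img2, (img1.shape[1], img1.shape[0]))
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    return [m for m, n in matches if m.distance < ratio * n.distance]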




These two cases succeed in finding correspondences. Although the repetitive patterns cause some confusion, the matches still land on the same building.


Friday, January 6, 2012

VFYW Benchmark Spec


Spec:
Test case:
The VFYW Dataset is now in JSON format.
VFYW Dataset {
 "count": # of query images,
 "ans_precise": # of query images with a precise answer from players,
 "ans_city": # of query images answered correctly up to the city level by players,
 "ans_fail": # of query images without a correct answer from players,
 "query_image": [
  {
   "id": # of the VFYW contest,
   "city": ground-truth city,
   "lat": ground-truth latitude,
   "lng": ground-truth longitude,
   "ans": (1: precise; 2: city; 3: fail),
   "guess_count": # of negative guesses from players,
   "guess": [
    {
     "city": guess city,
     "lat": guess latitude,
     "lng": guess longitude
    },
    { ... }
   ]
  },
  { ... }
 ]
}
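Once the placeholder fields above are filled with actual values (making the file valid JSON), reading the dataset is straightforward. A minimal Python sketch; the filename vfyw.json is an assumption:

import json

# "vfyw.json" is an assumed filename for the dataset described above.
with open('vfyw.json') as f:
    data = json.load(f)

print(data['count'], 'query images;', data['ans_precise'], 'answered precisely')
for q in data['query_image']:
    print('contest', q['id'], '->', q['city'], (q['lat'], q['lng']),
          '| guesses:', q['guess_count'])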

Training data:
The training data describing a city will be collected from Panoramio and/or Flickr and/or Google satellite imagery, covering a 25 km × 25 km area centered at the (lat, lng) stored in the VFYW dataset.



Baseline algorithm:
A kNN approach, similar to the one described in IM2GPS, will produce the baseline classification result.
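A minimal sketch of such a baseline, assuming precomputed scene descriptors (e.g. GIST) for the query and training images; the descriptor choice and k are assumptions, not part of the spec:

import numpy as np

def knn_geolocate(query_feat, train_feats, train_latlng, k=1):
    # Nearest-neighbor geolocation in feature space, in the spirit of IM2GPS.
    # train_feats: N x D descriptors; train_latlng: N x 2 (lat, lng) array.
    dists = np.linalg.norm(train_feats - query_feat[None, :], axis=1)
    nearest = np.argsort(dists)[:k]
    return train_latlng[nearest]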


Argument:
IM2GPS reports that the accuracy of its 1-NN approach within 25 km is about 15%. What if we take some candidate cities from human guesses and then treat geolocalization as a classification problem? If the accuracy improves, we can argue that, with help from humans, the geolocalization problem becomes much easier.


What's the difference between the VFYW benchmark and traditional place recognition / scene understanding?
Place recognition treats each training image independently: if a training image is not a near-duplicate of the query image, it is useless. In our case, we need to combine information from all training images to produce a description of a region that can be matched against the query image.
Scene understanding classifies similar scenes, such as urban, indoor, or natural, with training data categorized in advance. A city can contain all of the scenes mentioned above, so scene classification can describe only part of a city.