Tuesday, February 21, 2012

Cluster Symmetric Patterns

The goal is to find iconic patterns in aerial imagery by clustering.
To simplify the problem, I start by clustering patterns from facade parts of aerial imagery.

Directly clustering densely sampled patches from aerial imagery yields edge and corner patterns rather than iconic patches.

The result raises a basic question: what is an iconic pattern? If edges and corners count as iconic patterns, then clustering densely sampled patches already yields a good result. However, what we want are patterns like windows and rooftops, not just corners and edges.

Therefore, I decide to find symmetric patterns, which are usually characteristic of semantic tags such as windows and rooftops. The symmetric patches then go through k-means clustering to produce iconic patches.

Here is the aerial image I use:
Working in RGB or Lab space for symmetry detection gives a poor signal.
We convert the raw image to a SIFT image as in the SIFT Flow paper:
Then we run a simple symmetry detection: the SSD between each patch and its horizontally flipped version:
Black strips, which mark highly symmetric parts, trace the vertical axes of buildings.
Thresholding the SSD map gives the symmetric patches.
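A minimal sketch of this symmetry step, assuming the SIFT image is given as an (H, W, C) per-pixel descriptor array; the patch half-width and the percentile threshold are placeholder values of mine, not numbers from the experiment:

import numpy as np

def vertical_symmetry_ssd(feat, half=8):
    # feat: (H, W, C) per-pixel descriptor image (e.g. dense SIFT).
    # Low SSD between a patch and its horizontal mirror means strong
    # symmetry about a vertical axis through the patch center.
    H, W, _ = feat.shape
    ssd = np.full((H, W), np.inf)
    for y in range(half, H - half):
        for x in range(half, W - half):
            p = feat[y - half:y + half + 1, x - half:x + half + 1]
            ssd[y, x] = np.sum((p - p[:, ::-1]) ** 2)
    return ssd

# Threshold the SSD map to keep only highly symmetric patch centers:
# ssd = vertical_symmetry_ssd(sift_image)
# centers = np.argwhere(ssd < np.percentile(ssd[np.isfinite(ssd)], 5))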

The highly symmetric patches are then fed to k-means clustering with k = 10.
The distance metric is Euclidean distance in a low-dimensional embedding obtained by random projection of the RGB patch values.
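A sketch of this clustering step with scikit-learn; the embedding dimension (32) is an assumption of mine, not a value from the experiment:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection

def cluster_patches(patches, k=10, dim=32, seed=0):
    # patches: (N, h, w, 3) RGB patches. Flatten each patch, randomly
    # project to a low-dimensional embedding, then run k-means with
    # Euclidean distance in that embedding.
    X = patches.reshape(len(patches), -1).astype(np.float64)
    Z = GaussianRandomProjection(n_components=dim, random_state=seed).fit_transform(X)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Z)
    return km.labels_, km.cluster_centers_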

Here are 10 cluster centers:
The average patch in each cluster:
The patch closest to each cluster center:
The cluster distribution on the original image:
Pixels of different colors mark the center locations of patches from different clusters. The result is still not satisfactory; ideally, the windows on the same facade would be assigned to the same cluster.



Sunday, January 29, 2012

Dense Patch Matching With Part Model

The query and test images are roughly aligned in scale.
The part model is a line of parts.
Scaling and rotation are given strong penalties.
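A hedged sketch of the kind of energy this setup suggests: an appearance term per part plus deformation terms, with heavy weights on scale and rotation. The penalty forms and all weights below are placeholders of mine, not the values used here.

import numpy as np

def part_chain_energy(app_costs, positions, scale, theta,
                      w_spring=1.0, w_scale=10.0, w_rot=10.0):
    # app_costs: per-part appearance matching costs at the chosen
    # positions; positions: (P, 2) part centers linked in a chain
    # (the line-shaped part model). Deviations of the links from a
    # straight, evenly spaced line are penalized, and global scale /
    # rotation carry large weights ("strong penalty").
    e = float(np.sum(app_costs))
    links = np.diff(positions, axis=0)
    e += w_spring * np.var(links, axis=0).sum()
    e += w_scale * np.log(scale) ** 2
    e += w_rot * float(theta) ** 2
    return e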

Here is the Result!
The red dots mark each part it finds.
Each part lands roughly at the right place.


To test robustness, I weight the matching cost more heavily. The result drifts away.



Below are the patches I used for template matching, along with the cross-correlation map and the top 20 local maxima.
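For reference, a sketch of how such a map and its top 20 local maxima can be computed with OpenCV; the template-matching mode is my choice, since the post does not specify it:

import numpy as np
import cv2

def top_correlation_peaks(image, templ, k=20):
    # image, templ: grayscale float32 arrays. Compute the normalized
    # cross-correlation map, then keep the k strongest local maxima
    # (points equal to the max of their 3x3 neighborhood).
    ncc = cv2.matchTemplate(image, templ, cv2.TM_CCORR_NORMED)
    is_peak = ncc == cv2.dilate(ncc, np.ones((3, 3), np.uint8))
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(ncc[ys, xs])[::-1][:k]
    return ncc, list(zip(xs[order], ys[order]))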









Thursday, January 26, 2012

Aerial View to Ground Plane View Image Matching

We use the code provided by Kyoung Mu, available here.
Two examples have been run.
One is a small building in a residential area (size: 163 by 113).
The other is a tall building downtown (size: 445 by 300).

The intuition is that the image of the small building is too blurred to yield good feature descriptions.
So a higher-resolution building could be easier to match.

We test matching with and without normalizing the two images to the same size.
Both cases fail. We then test the downtown case.




These two cases succeed in finding correspondences. Although the repetitive patterns cause some confusion, the matches still land on the same building.


Friday, January 6, 2012

VFYW Benchmark Spec


Spec:
Test case:
The VFYW Dataset is now in JSON format.
VFYW Dataset {
  "count": # of query images,
  "ans_precise": # of query images answered precisely by players,
  "ans_city": # of query images answered up to city level by players,
  "ans_fail": # of query images with no correct answer from players,
  "query_image": [
    {
      "id": # of the VFYW contest,
      "city": ground-truth city,
      "lat": ground-truth latitude,
      "lng": ground-truth longitude,
      "ans": (1: precise; 2: city; 3: fail),
      "guess_count": # of negative guesses from players,
      "guess": [
        {
          "city": guessed city,
          "lat": guessed latitude,
          "lng": guessed longitude
        },
        { ... }
      ]
    }
  ]
}
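For illustration, a few lines that read a file under this schema and sanity-check the counts; the filename is hypothetical:

import json

with open("vfyw_dataset.json") as f:  # hypothetical filename
    data = json.load(f)

# "ans" encodes 1: precise, 2: city, 3: fail for each query.
labels = [q["ans"] for q in data["query_image"]]
assert data["ans_precise"] == labels.count(1)
assert data["ans_city"] == labels.count(2)
assert data["ans_fail"] == labels.count(3)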

Training data:
The training data describing a city will be collected from Panoramio and/or Flickr and/or Google satellite imagery, covering a 25 km by 25 km area centered at the (lat, lng) saved in the VFYW Dataset.



Baseline algorithm:
A kNN approach similar to the one described in IM2GPS is used to produce a baseline classification result.
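A minimal sketch of that baseline, assuming precomputed scene descriptors (e.g. GIST, as in IM2GPS) for the geotagged training set:

import numpy as np

def knn_geolocate(query_feat, train_feats, train_latlng, k=1):
    # Predict the query's location as the (lat, lng) tags of its k
    # nearest training images in descriptor space (1-NN for k=1).
    d = np.linalg.norm(train_feats - query_feat, axis=1)
    return train_latlng[np.argsort(d)[:k]]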


Argument:
IM2GPS reported that the accuracy of its 1-NN approach within 25 km is about 15%. What if we take candidate cities from human guesses and then treat the geolocalization problem as a classification problem? If the accuracy improves substantially, we can argue that, with help from humans, the geolocalization problem becomes much easier.


What's the difference between the VFYW benchmark and traditional place recognition/scene understanding?
Place recognition treats each training image independently: if a training image is not a near-duplicate of the query image, it is useless. In our case, we need to aggregate information from all training images into a description of a region to match against the query image.
Scene understanding classifies training data into similar scene types, such as urban, indoor, and natural, in advance. A city can contain all of these scene types, so scene classification can only describe part of a city.

Wednesday, November 16, 2011

Game with A Purpose

Two games are designed to collect semantic tags.

The first game is analogous to the Peek-A-Boom game:

Inverse-problem game
Boom:
Given an image with a city name (probably from Panoramio),
point out some predefined attributes (tree, building, etc.) for Peek.
Peek:
Guess the city's location using as few of the given attributes as possible.
Effect:
Collects the locations of the most representative objects of a place.



The second game is analogous to the Herd It game:

Given one query image (with or without a ground-truth location) and multiple players:
Game:
Ask which of k other images best matches the query image.
Effect:
Collects images relationally similar to the query image and, at the same time, outputs the query image's possible location.



Social Mobilization + GWAP - VFYW breaker:

Crowdsource the answer to the VFYW contest by playing a game (such as the second game).
Collect useful tags at the same time.
The highest-scoring player earns the right to own the answer to the VFYW contest.




Thursday, October 27, 2011

Geometric Feature Pruning

Geometric feature pruning uses the semantic tags on maps to form features based on the geometric relationships among tags, reducing the search space of the image localization problem.

Semantic tags:
The Google Maps API allows us to extract semantic tags such as:
road
man-made building, etc.
(see the Google Styled Map spec)

Geometric Feature:
In the Madrid example, the angle between roads is used.

Experimental Setup:
The Madrid example is used to verify the idea.

The full search space is a rectangle around the ground-truth location, 425 m in width and height, which is about 0.03 of the city's area (m^2/m^2).

Semantic map:
(a) road
(b) building and space

We use 9 maps, each the same size as the example image, to cover the search space.

Geometric feature:
We develop an intersection descriptor that can be extracted automatically from a styled Google Map.
Two features in the intersection descriptor can be used to prune the search space (a sketch follows this list):
1. the number of corners at the intersection;
2. the angles of the corners.
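A hedged sketch of the pruning these two features allow; the descriptor layout and the angle tolerance here are assumptions of mine:

def prune_intersections(query, database, tol_deg=5.0):
    # query: (n_corners, [corner angles in degrees]) extracted from
    # the rectified query image; database: {location: descriptor}
    # extracted from the styled map. Keep locations whose corner
    # count matches and whose angles agree within tol_deg.
    n_q, angles_q = query
    kept = []
    for loc, (n, angles) in database.items():
        if n == n_q and any(abs(a - b) <= tol_deg
                            for a in angles for b in angles_q):
            kept.append(loc)
    return kept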

Experiment:
The query intersection is computed from rectified images.

Here "Number of corners: 2" and "Angle: 73.88°" are used to prune the search space.

The ground-truth location is at the center of the image.

The matching score is visualized by the radius of a blue circle.
The radius r is computed as
r = R * exp(-d(ang0, ang1) / sigma)
where R and sigma are constants and d(·, ·) is the L2 distance between the query angle and the angle in the database.
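The same formula in code; the values of R and sigma below are placeholder constants of mine:

import numpy as np

def match_radius(ang_query, ang_db, R=20.0, sigma=10.0):
    # r = R * exp(-d(ang0, ang1) / sigma), with d the L2 distance
    # between the query angle(s) and the database angle(s).
    d = np.linalg.norm(np.atleast_1d(ang_query) - np.atleast_1d(ang_db))
    return R * np.exp(-d / sigma)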

Another Example: Paris
Query: "Number of Corner: 4" and "Angle: 57.68" are used to prune the search space.
Result:
The ground-truth location is at the center of the image.






Tuesday, October 18, 2011

Bibliography

Image-based localization:
From Structure-from-Motion Point Clouds to Fast Location Recognition CVPR'09
Location Recognition using Prioritized Feature Matching ECCV'10
Fast Image-Based Localization using Direct 2D-to-3D Matching ICCV'11

3D reconstruction:
Piecewise Planar and Non-Planar Stereo for Urban Scene Reconstruction
- multi-view stereo with a piecewise-planar constraint
- does not address roofs

Fusion of feature- and area-based information for urban buildings modeling from aerial imagery
- has all the data available (range data)

Manhattan-World Stereo
- piecewise-planar assumption everywhere

Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics
- claims only qualitative reconstruction is feasible ("impossible for metric reconstruction from a single image")
- aims at a fully automatic system

Closing the Loop in Scene Interpretation
-