Sunday, January 29, 2012

Dense Patch Matching With Part Model

The query and test images are roughly aligned in scale.
The part model is a line.
Scaling and rotation are strongly penalized.

Here is the result!
The red dots mark each part the model finds.
The parts land roughly in the right places.


I tried weighting the matching cost more heavily to test robustness. The result drifts away.
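The trade-off above can be sketched as a combined cost per part: an appearance (matching) term plus strong geometric penalties on scale and rotation. The functional forms and weights below are assumptions for illustration, not the post's actual implementation.

```python
import numpy as np

# Hypothetical sketch of a per-part placement cost: appearance term plus
# strong penalties on scale and rotation (weights are made-up values).
def part_cost(matching_cost, scale, rotation,
              w_match=1.0, w_scale=10.0, w_rot=10.0):
    """Total cost for placing one part.

    matching_cost : appearance (template-matching) cost for the part
    scale         : relative scale of the candidate placement (1.0 = none)
    rotation      : rotation in radians (0.0 = none)
    """
    return (w_match * matching_cost
            + w_scale * np.log(scale) ** 2
            + w_rot * rotation ** 2)
```

Raising `w_match` emphasizes appearance over the geometric penalties, which is one way the parts can drift away from their expected placements.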



Below are the patches I used for template matching, the cross-correlation map, and the top 20 local maxima.
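The two steps above can be sketched in plain NumPy: normalized cross-correlation of a patch over the image, then the top-K local maxima of the correlation map. This is a re-implementation sketch, not the code used for the figures.

```python
import numpy as np

# Sketch: normalized cross-correlation (NCC) of a template over an image.
def ncc_map(image, template):
    th, tw = template.shape
    t = template - template.mean()
    tn = np.linalg.norm(t)
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = image[y:y+th, x:x+tw]
            wz = w - w.mean()
            denom = np.linalg.norm(wz) * tn
            out[y, x] = (wz * t).sum() / denom if denom > 0 else 0.0
    return out

# Sketch: keep the k highest-scoring 8-neighbour local maxima.
def top_local_maxima(corr, k=20):
    padded = np.pad(corr, 1, constant_values=-np.inf)
    neigh = np.stack([padded[1+dy:1+dy+corr.shape[0], 1+dx:1+dx+corr.shape[1]]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dy, dx) != (0, 0)])
    mask = corr >= neigh.max(axis=0)
    ys, xs = np.nonzero(mask)
    order = np.argsort(corr[ys, xs])[::-1][:k]
    return list(zip(ys[order], xs[order]))
```

A patch cut directly out of the image scores an NCC of 1.0 at its own location, so it should appear as the top local maximum.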









Thursday, January 26, 2012

Aerial View to Ground Plane View Image Matching

Here we used the code provided by Kyoung Mu.
Two examples have been run.
One is a small building in a residential area (size: 163 by 113).
The other is a tall building downtown (size: 445 by 300).

Our intuition is that the image of the small building is too blurred to yield good feature descriptions, so the higher-resolution building should be easier to match.

We tested the residential case both with and without normalizing the two images to the same size.
Both cases fail. We then tested the downtown case.




These two cases succeed in finding correspondences. Although the repetitive patterns cause some confusion, the matches still fall on the same building.
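One standard guard against the confusion that repetitive patterns cause is Lowe's ratio test: keep a descriptor match only if its best distance is clearly smaller than its second-best. The sketch below is a generic nearest-neighbour matcher with that test, not the matching code used here.

```python
import numpy as np

# Hedged sketch (not the code used in the experiments): nearest-neighbour
# descriptor matching with Lowe's ratio test. A match (i, j) is kept only
# if the best distance beats the second-best by the given ratio, which
# rejects ambiguous matches on repetitive structures.
def match_descriptors(desc_a, desc_b, ratio=0.8):
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches
```

When two candidate targets are near-identical (as with repeated windows on a facade), both distances are similar and the match is discarded.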


Friday, January 6, 2012

VFYW Benchmark Spec


Spec:
Test case:
The VFYW Dataset is now in JSON format.

VFYW Dataset {
  "count": # of query images,
  "ans_precise": # of query images with a precise answer from players,
  "ans_city": # of query images answered correctly up to the city level by players,
  "ans_fail": # of query images without a correct answer from players,
  "query_image": [
    {
      "id": # of the VFYW contest,
      "city": ground-truth city,
      "lat": ground-truth latitude,
      "lng": ground-truth longitude,
      "ans": (1: precise; 2: city; 3: fail),
      "guess_count": # of incorrect guesses from players,
      "guess": [
        {
          "city": guessed city,
          "lat": guessed latitude,
          "lng": guessed longitude
        },
        { ….. }
      ]
    }
  ]
}
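A minimal example instance of the schema above, round-tripped through `json` to check that the structure is well-formed. All values are made up for illustration.

```python
import json

# Illustrative instance of the VFYW Dataset schema (all values invented).
example = {
    "count": 1,
    "ans_precise": 1,
    "ans_city": 0,
    "ans_fail": 0,
    "query_image": [
        {
            "id": 42,
            "city": "Boston",
            "lat": 42.36,
            "lng": -71.06,
            "ans": 1,            # 1: precise; 2: city; 3: fail
            "guess_count": 1,
            "guess": [
                {"city": "Cambridge", "lat": 42.37, "lng": -71.11}
            ]
        }
    ]
}

# Serialize and parse back to confirm it is valid JSON.
loaded = json.loads(json.dumps(example))
```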

Training data:
The training data describing a city will be collected from Panoramio and/or Flickr and/or Google satellite imagery over a 25 km by 25 km area centered at the (lat, lng) saved in the VFYW Dataset.
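The collection window can be sketched as a bounding box in degrees around the stored (lat, lng). The conversion uses the common approximation of about 111.32 km per degree of latitude; that constant and the function name are assumptions, not part of the spec.

```python
import math

# Sketch: 25 km x 25 km window around a centre point, converting
# half-widths in kilometres to degrees. The 111.32 km/degree constant
# is the usual spherical-Earth approximation (an assumption here).
def bounding_box(lat, lng, side_km=25.0):
    half = side_km / 2.0
    dlat = half / 111.32
    dlng = half / (111.32 * math.cos(math.radians(lat)))
    return (lat - dlat, lng - dlng, lat + dlat, lng + dlng)
```

Note that the longitude half-width grows with latitude, since meridians converge toward the poles.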



Baseline algorithm:
A kNN approach similar to the one described in IM2GPS, to produce a baseline classification result.
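In the spirit of IM2GPS, the baseline can be sketched as 1-NN in feature space: predict the query's location as that of its nearest training image. Feature extraction (gist, color histograms, etc. in IM2GPS) is out of scope here; features are just vectors, and the function name is illustrative.

```python
import numpy as np

# Hedged sketch of a 1-NN geolocation baseline: copy the (lat, lng) of
# the nearest training image in feature space. Not the actual IM2GPS code.
def knn_geolocate(query_feat, train_feats, train_latlng, k=1):
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    # For k > 1 one could vote or average; with k = 1 just copy the label.
    return train_latlng[nearest[0]]
```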


Argument:
The IM2GPS results report that the accuracy of the 1-NN approach within 25 km is about 15%. What if we take some candidate cities from human guesses and then treat the geolocalization problem as a classification problem? If the accuracy improves, we can argue that with help from humans the geolocalization problem becomes much easier.
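The human-in-the-loop variant can be sketched as restricting the nearest-neighbour search to training images from the player-suggested candidate cities. All names below are illustrative, not a committed design.

```python
import numpy as np

# Sketch: classify a query among only the candidate cities suggested by
# players, by 1-NN over the training images from those cities.
def classify_among_candidates(query_feat, train_feats, train_city, candidates):
    keep = [i for i, c in enumerate(train_city) if c in candidates]
    dists = np.linalg.norm(train_feats[keep] - query_feat, axis=1)
    return train_city[keep[int(np.argmin(dists))]]
```

Shrinking the label space from "everywhere" to a handful of candidate cities is exactly what should make the classification formulation easier than open-world retrieval.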


What's the difference between the VFYW benchmark and traditional place recognition/scene understanding?
Place recognition treats each training image independently: if a training image is not a near-duplicate of the query image, it is useless. In our case, we need to assemble information from all training images into a description of a region that can be matched against the query image.
Scene understanding classifies similar scenes, such as urban, indoor, or natural, in the training data in advance. A city can contain all of the scenes mentioned above, so scene classification can only describe part of a city.