Friday, January 6, 2012

VFYW Benchmark Spec


Spec:
Test case:
The VFYW Dataset is now in JSON format.
VFYW Dataset {
  "count": # of query images,
  "ans_precise": # of query images with a precise answer from players,
  "ans_city": # of query images answered correctly up to the city level by players,
  "ans_fail": # of query images without a correct answer from players,
  "query_image": [
    {
      "id": VFYW contest number,
      "city": ground-truth city,
      "lat": ground-truth latitude,
      "lng": ground-truth longitude,
      "ans": (1: precise; 2: city; 3: fail),
      "guess_count": # of negative guesses from players,
      "guess": [
        {
          "city": guessed city,
          "lat": guessed latitude,
          "lng": guessed longitude
        },
        { ... }
      ]
    },
    { ... }
  ]
}
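For concreteness, here is a minimal sketch of how a consumer might load and walk this JSON in Python. The file name vfyw_dataset.json is an assumption; the field names follow the schema above.

```python
# Minimal sketch, assuming the dataset is a single JSON file named
# "vfyw_dataset.json" (the name is an assumption) with the fields above.
import json

with open("vfyw_dataset.json") as f:
    dataset = json.load(f)

print("total query images:", dataset["count"])
print("precise answers:", dataset["ans_precise"])
print("city-level answers:", dataset["ans_city"])
print("no correct answer:", dataset["ans_fail"])

for query in dataset["query_image"]:
    # Each entry carries the ground truth plus the list of player guesses.
    print(query["id"], query["city"], query["lat"], query["lng"],
          "guesses:", query["guess_count"])
```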

Training data:
The training data describing a city will be collected from Panoramio and/or Flickr and/or Google satellite imagery over a 25 km by 25 km area centered at the (lat, lng) stored in the VFYW Dataset.
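One detail worth making explicit is how a 25 km by 25 km area translates into a latitude/longitude query window for those services. Below is a rough sketch using a simple equirectangular approximation; the helper name and the 12.5 km half-width are illustrative assumptions, not part of the spec.

```python
# Rough sketch: bounding box of a 25 km x 25 km area centered at (lat, lng).
# The equirectangular approximation below is an assumption; it is adequate
# away from the poles.
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def bounding_box(lat, lng, half_side_km=12.5):
    dlat = half_side_km / KM_PER_DEG_LAT
    # One degree of longitude shrinks with the cosine of the latitude.
    dlng = half_side_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lng - dlng, lat + dlat, lng + dlng)

# Example: box around central Paris.
print(bounding_box(48.8566, 2.3522))
```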



Baseline algorithm:
A kNN approach similar to the one described in IM2GPS is used to produce a baseline classification result.
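As an illustration only, the 1-NN retrieval at the heart of such a baseline can be sketched as follows; the feature representation (a global descriptor such as GIST or a color histogram) and the toy data are assumptions, not a faithful reimplementation of IM2GPS.

```python
# Sketch of a 1-NN geolocation baseline: every training image is a feature
# vector tagged with the (lat, lng) it was collected for, and the query
# inherits the location of its nearest neighbor in feature space.
import numpy as np

def nearest_neighbor_geolocate(query_feat, train_feats, train_latlng):
    """query_feat: (d,), train_feats: (n, d), train_latlng: (n, 2)."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    return train_latlng[np.argmin(dists)]

# Toy example with random descriptors standing in for real image features.
rng = np.random.default_rng(0)
train_feats = rng.random((100, 512))
train_latlng = rng.uniform([-90.0, -180.0], [90.0, 180.0], size=(100, 2))
query_feat = rng.random(512)
print(nearest_neighbor_geolocate(query_feat, train_feats, train_latlng))
```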


Argument:
IM2GPS reports that the accuracy of its 1-NN approach within 25 km is about 15%. What if we take a few candidate cities from the human guesses and then treat geolocalization as a classification problem over those candidates? If the accuracy improves substantially, we can argue that with help from humans the geolocalization problem becomes much easier.
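As a sketch of that human-in-the-loop variant, the search can be restricted to training images from the cities the players actually guessed, so the problem collapses to picking one of a few candidates. The data layout and helper below are hypothetical illustrations, not part of the spec.

```python
# Sketch: classify the query among candidate cities taken from player guesses,
# rather than searching the whole training set.
import numpy as np

def classify_among_candidates(query_feat, train_feats, train_cities, candidates):
    """Return the candidate city whose nearest training image is closest."""
    mask = np.isin(train_cities, list(candidates))
    dists = np.linalg.norm(train_feats[mask] - query_feat, axis=1)
    return train_cities[mask][np.argmin(dists)]

# Toy usage: candidate cities extracted from the "guess" entries of one query.
rng = np.random.default_rng(1)
train_feats = rng.random((60, 512))
train_cities = np.array(["Paris", "London", "Tokyo"] * 20)
print(classify_among_candidates(rng.random(512), train_feats, train_cities,
                                {"Paris", "Tokyo"}))
```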


What's the difference between the VFYW benchmark and traditional place recognition/scene understanding?
Place recognition treats each training image independently: if a training image is not a near-duplicate of the query image, it is useless. In our case, we need to aggregate information from all training images to produce a description of a region that can be matched against the query image.
Scene understanding classifies its training data into scene categories, such as urban, indoor, or natural, in advance. A single city can contain all of these scenes, so scene classification can only describe part of a city.
