The 4K Global Geolocation Benchmark is a curated dataset of 4,000+ street-view images collected from diverse locations around the world, designed to evaluate the performance of AI models in predicting geographic coordinates from visual input alone.
Key Features:
Global Coverage: Images sampled across 6 continents (excluding Antarctica)
Street-Level Perspective: Ideal for visual geolocation tasks using VLMs like CLIP, BLIP-2, LLaVA, and GeoCLIP
Embedded Coordinates: Latitude and longitude are encoded in the filenames for easy parsing
Benchmark-Ready: Widely used to evaluate models like GPT-4o, Claude 4, and other multimodal geolocation systems
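Since the coordinates live in the filenames, ground-truth labels can be recovered without a separate annotation file. The sketch below is only an illustration: the exact filename pattern is not documented here, so it assumes a hypothetical layout of `<lat>_<lon>.<ext>` (or `<lat>,<lon>.<ext>`), and `parse_coords` is a made-up helper name. Adjust the regular expression to match the actual files.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# ASSUMPTION: the filename stem is exactly "<lat>_<lon>" or "<lat>,<lon>",
# e.g. "48.8584_2.2945.jpg". The real naming scheme may differ.
_COORD_RE = re.compile(r"(-?\d{1,3}(?:\.\d+)?)[_,](-?\d{1,3}(?:\.\d+)?)")


def parse_coords(filename: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) parsed from a filename, or None.

    Expects a file extension; the stem (name without extension) must
    consist of two signed decimal numbers separated by "_" or ",".
    """
    stem = Path(filename).stem
    match = _COORD_RE.fullmatch(stem)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values outside the valid latitude/longitude ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon
```

Returning `None` for anything that does not match keeps malformed or unrelated files from silently entering an evaluation run; a caller can log and skip them.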
This dataset has been used in various projects and academic benchmarks to test zero-shot, few-shot, and prompt-based geolocation reasoning. It's ideal for:
Vision-language geolocation research
Haversine error evaluation and distance scoring
GeoGuessr-style model training and inference
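For Haversine error evaluation, a minimal sketch of the great-circle distance between a predicted and a true coordinate is shown below. The function names are illustrative, and the mean Earth radius of 6371.0088 km is a common convention rather than anything this dataset prescribes. The `distance_score` helper is likewise an assumption: it uses an exponential decay of the kind often quoted for GeoGuessr-style scoring, with a 5000-point maximum and a 1492.7 km decay scale taken from community approximations, not from this benchmark.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed convention)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_score(error_km: float, max_score: float = 5000.0,
                   scale_km: float = 1492.7) -> float:
    """Hypothetical GeoGuessr-style score: exponential decay with distance."""
    return max_score * math.exp(-error_km / scale_km)


if __name__ == "__main__":
    # Example: a prediction in London for an image taken in Paris.
    err = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
    print(f"error: {err:.1f} km, score: {distance_score(err):.0f}")
```

Averaging `haversine_km` over all images gives a mean error in kilometres; reporting the median alongside it is often useful because a few antipodal misses can dominate the mean.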
Use alongside language models or embeddings to predict location from scene content such as architecture, vegetation, road signs, and climate.