
Modern China Geospatial Database - Main Dataset

Verified Source
EU Open Research Repository

@zenodo.oai_zenodo_org_16758528


Zenodo

Dataset Description

MCGD_Data_V2.2 contains all the data that we have collected on locations in modern China, plus a number of locations outside of China that we encounter frequently in historical sources on China. All further updates will appear under the name "MCGD_Data" with a time stamp (e.g., MCGD_Data2023-06-21). This dataset, along with all the other datasets that ENP-China makes available, can also be accessed on GitLab: https://gitlab.com/enpchina/IndexesEnp

Altogether there are 464,970 entries. The data include seven variables:
- Name: Place names and their variants in Chinese, pinyin, and any recorded transliteration
- Prov_Zh: Chinese province names in Chinese characters (新疆, 江蘇, 河北, etc.)
- Prov_Py: Chinese province names in pinyin
- LAT: Latitude coordinates
- LONG: Longitude coordinates
- LocID: Location identifiers
- NameID: Location name identifiers

The Name IDs all start with "H" followed by seven digits. This is the internal ID system of MCGD. Location IDs that start with "D" are data points extracted from China Historical GIS (Harvard University); those that start with "E" are locations extracted from the data points in Geonames or data points we have added from various map sources.

One of the main features of the MCGD Main Dataset is the systematic collection and compilation of place names from non-Chinese language historical sources. In these sources, locations were designated in transliteration systems that are hardly comprehensible today, which makes it very difficult to find the actual locations they correspond to. This dataset allows for the conversion from these obsolete transliterations to the current names and geocoordinates.

From June 2021 onward, we have adopted a different file naming system to keep track of versions. From MCGD_Data_V1 we have moved to MCGD_Data_V2. In June 2022, we introduced time stamps, which result in the following naming convention: MCGD_Data_YYYY.MM.DD.

UPDATES

MCGD_Data2025_08_06 introduces a significant update with the addition of the ‘Code’ column.
This column categorizes place names as follows:
- A: Canonical Chinese name
- C: Alternative Chinese name
- P: Romanized name in pinyin
- W: Romanized name in another transliteration system

When the codes P or W are doubled (PP, WW), this indicates that the place name does not match any existing Chinese name in the dataset. Because of their high volume, these unmatched names will be reviewed and linked progressively rather than through a systematic batch process. The coding system is designed to facilitate name-matching operations between MCGD and place names extracted from historical sources using programming tools. It also enables filtering for more precise and efficient matching. The dataset contains a total of 472,749 entries.

MCGD_Data2025_02_28 includes a major change: all the locations listed under Beijing, Shanghai, Tianjin, and Chongqing (北京, 上海, 天津, 重慶) have been duplicated and also listed under the name of the provinces to which they originally belonged, before the creation of the four special municipalities after 1949. This is meant to facilitate the matching of data from historical sources. Each location has a unique NameID. Altogether there are 472,818 entries.

MCGD_Data2025_02_27 includes an update on locations extracted from Minguo zhengfu ge yuanhui keyuan yishang zhiyuanlu 國民政府各院部會科員以上職員錄 (Directory of staff members and above in the ministries and committees of the National Government) (Nanjing: Guomin zhengfu wenguanchu yinzhuju 國民政府文官處印鑄局, 1944). We also made corrections in the Prov_Py and Prov_Zh columns, as there were some misalignments between the pinyin name and the name in Chinese characters. The file now includes 465,128 entries.

MCGD_Data2024_03_23 includes an update on locations in Taiwan from the Asia Directories. Altogether there are 465,603 entries (of which 187 are place names without geocoordinates, labelled in the Lat Long columns as "Unknown").

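As an illustration of how the ‘Code’ column introduced in MCGD_Data2025_08_06 might support name matching, the following is a minimal Python sketch. It assumes that all name variants of one place share the same LocID, which the description does not state explicitly. The sample rows, the identifiers, the place name "Exampleville", and the helper functions are all invented for illustration and are not taken from the MCGD file.

```python
# Hypothetical sample rows: values are invented for illustration only.
# Assumption: name variants of the same place share one LocID.
SAMPLE_ROWS = [
    {"Name": "北京", "LocID": "D0000001", "NameID": "H0000001", "Code": "A"},
    {"Name": "北平", "LocID": "D0000001", "NameID": "H0000002", "Code": "C"},
    {"Name": "Beijing", "LocID": "D0000001", "NameID": "H0000003", "Code": "P"},
    {"Name": "Peking", "LocID": "D0000001", "NameID": "H0000004", "Code": "W"},
    {"Name": "Exampleville", "LocID": "E0000002", "NameID": "H0000005", "Code": "WW"},
]


def build_index(rows, codes):
    """Map case-folded names whose Code is in `codes` to the set of their LocIDs."""
    index = {}
    for row in rows:
        if row["Code"] in codes:
            index.setdefault(row["Name"].casefold(), set()).add(row["LocID"])
    return index


def canonical_names(rows):
    """Map each LocID to its canonical Chinese name (Code 'A')."""
    return {row["LocID"]: row["Name"] for row in rows if row["Code"] == "A"}


def match(name, index, canon):
    """Return sorted (LocID, canonical name) pairs for a name from a source."""
    loc_ids = index.get(name.casefold(), set())
    return sorted((loc_id, canon.get(loc_id)) for loc_id in loc_ids)


# Index only romanized names that are already linked to a Chinese name (P, W);
# the doubled codes (PP, WW) are deliberately left out.
romanized_index = build_index(SAMPLE_ROWS, {"P", "W"})
canon = canonical_names(SAMPLE_ROWS)

print(match("Peking", romanized_index, canon))
print(match("Exampleville", romanized_index, canon))
```

With these sample rows, "Peking" resolves to the canonical name attached to its LocID, while "Exampleville" returns an empty list because WW entries are excluded from the index. Adding "PP" and "WW" to the set passed to `build_index` would include the unmatched names as well, at the cost of results that carry no canonical Chinese name.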
MCGD_Data2023.12.22 contains all the data that we have collected on locations in China, regardless of period. Altogether there are 465,603 entries (of which 187 are place names without geocoordinates, labelled in the Lat Long columns as "Unknown"). The dataset also includes locations outside of China so that they can be matched to the place names extracted from historical sources. For example, one may need to locate individuals born outside of China. Rather than maintaining two separate files, we decided to incorporate all the place names found in historical sources in the gazetteer. Such place names can easily be removed by selecting all the entries where the 'Province' data is missing.
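The filtering described above can be sketched in Python as follows. This is a minimal example under stated assumptions: the file is taken to be a CSV with the column names listed in the description, missing province values are assumed to appear as empty strings or "NA" (the description does not say how they are encoded), and the inline sample rows and helper names are invented for illustration.

```python
import csv
import io

# Hypothetical sample: rows and identifiers are invented for illustration.
SAMPLE_CSV = """Name,Prov_Zh,Prov_Py,LAT,LONG,LocID,NameID
Placename A,江蘇,Jiangsu,32.0,118.8,D0000001,H0000001
Placename B,,,48.9,2.3,E0000002,H0000002
Placename C,河北,Hebei,Unknown,Unknown,E0000003,H0000003
"""

# Assumption: how a missing province is encoded in the real file is not documented.
MISSING = {"", "NA"}


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def has_province(row):
    """True when the row carries a province, i.e. the place is listed within China."""
    return row["Prov_Py"].strip() not in MISSING


def parse_coords(row):
    """Return (lat, lon) as floats, or None when a value is 'Unknown' or not numeric."""
    try:
        return float(row["LAT"]), float(row["LONG"])
    except ValueError:
        return None


rows = load_rows(SAMPLE_CSV)

# Keep only entries with province data (drops locations outside of China).
china_rows = [row for row in rows if has_province(row)]

# Of those, keep only entries with usable geocoordinates.
geocoded = [
    (row["Name"], parse_coords(row))
    for row in china_rows
    if parse_coords(row) is not None
]

print([row["Name"] for row in china_rows])
print(geocoded)
```

Parsing the coordinates with `float` inside a `try` block treats the "Unknown" label, and any other non-numeric value, as a missing geocoordinate rather than raising an error during later distance or mapping steps.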
Publisher name: Zenodo
Last updated: 2026-02-20T14:44:06Z

