Predict the accident risk score for each unique postcode
Dataset Description
According to IBEF, “Domestic automobiles, production increased at 2.36% CAGR between FY16-20 with 26.36 million vehicles being manufactured in the country in FY20. Overall, domestic automobiles sales increased at 1.29% CAGR between FY16-FY20 with 21.55 million vehicles being sold in FY20”. The rise in the number of vehicles on the road brings its own challenges, and roads become more vulnerable to accidents. Higher accident rates in turn lead to more insurance claims and larger payouts for insurance companies.
To plan pre-emptively for these losses, insurance firms use accident data to understand risk across geographical units such as postal codes or districts.
In this challenge, we provide a dataset for predicting the “Accident_Risk_Index” for each postcode.
Accident_Risk_Index (mean casualties at a postcode) = sum(Number_of_Casualties) / count(Accident_ID)
Pro-tip: Participants need to do some feature engineering: first roll up the train data to the postcode level, create a column named “accident_risk_index”, and then optimize the model at the postcode level.
A few hypotheses to help you think:
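The roll-up described above can be sketched with pandas as follows. This is a minimal illustration, assuming the train data is loaded into a DataFrame with the column names listed for train.csv; the function name roll_up_to_postcode and the three-row sample are made up for the example.

```python
import pandas as pd


def roll_up_to_postcode(train: pd.DataFrame) -> pd.DataFrame:
    """Aggregate accident-level rows to one row per postcode."""
    grouped = train.groupby("postcode").agg(
        total_casualties=("Number_of_Casualties", "sum"),
        accident_count=("Accident_ID", "count"),
    )
    # Accident_Risk_Index = sum(Number_of_Casualties) / count(Accident_ID)
    grouped["accident_risk_index"] = (
        grouped["total_casualties"] / grouped["accident_count"]
    )
    return grouped.reset_index()


# Hypothetical sample rows, not taken from the real dataset.
sample = pd.DataFrame(
    {
        "Accident_ID": [1, 2, 3],
        "postcode": ["AB1", "AB1", "CD2"],
        "Number_of_Casualties": [1, 3, 2],
    }
)
postcode_level = roll_up_to_postcode(sample)
print(postcode_level)
```

With the real data, the same function would be applied to the DataFrame read from train.csv, and the resulting accident_risk_index column would serve as the modelling target.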
"More accidents happen in the later part of the day as those are office hours causing congestion"
"Postal codes with more single carriage roads have more accidents"
(From the hypotheses above, features such as office_hours_flag and the number of single-carriageway roads can be derived.)
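As a rough sketch, the two hypothesis-driven features could be built like this. Several things here are assumptions rather than facts about the dataset: that Time is stored as an "HH:MM" string, that "office hours" means 16:00 to 19:59, and that single carriageways appear in Road_Type as the label "Single carriageway". All three are exposed as parameters so they can be adjusted after inspecting the real values.

```python
import pandas as pd


def add_hypothesis_flags(
    df: pd.DataFrame,
    office_start: int = 16,  # assumed start hour of the busy period
    office_end: int = 19,    # assumed end hour (inclusive)
    single_carriageway_value: str = "Single carriageway",  # assumed label
) -> pd.DataFrame:
    """Add per-accident flags for the two hypotheses."""
    out = df.copy()
    # Unparseable times become NaT, which yields a flag of 0.
    hour = pd.to_datetime(out["Time"], format="%H:%M", errors="coerce").dt.hour
    out["office_hours_flag"] = hour.between(office_start, office_end).astype(int)
    out["single_carriageway_flag"] = (
        out["Road_Type"] == single_carriageway_value
    ).astype(int)
    return out


def roll_up_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise the flags to one row per postcode."""
    return (
        df.groupby("postcode")
        .agg(
            office_hours_share=("office_hours_flag", "mean"),
            single_carriageway_count=("single_carriageway_flag", "sum"),
        )
        .reset_index()
    )


# Hypothetical sample rows, not taken from the real dataset.
sample = pd.DataFrame(
    {
        "postcode": ["AB1", "AB1", "CD2"],
        "Time": ["08:30", "17:15", "18:00"],
        "Road_Type": ["Single carriageway", "Dual carriageway", "Single carriageway"],
    }
)
flag_features = roll_up_flags(add_hypothesis_flags(sample))
print(flag_features)
```

If Road_Type turns out to be numerically coded, only the single_carriageway_value argument needs to change.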
Additionally, we provide road network data (information on the nearest road to each postcode and its characteristics) and population data (population information at the area level). These files are meant for feature augmentation; using them is optional.
The provided dataset contains the following files:
Train: 4,84,042 rows x 27 columns
Test: 1,15,958 rows x 27 columns
train.csv & test.csv:
'Accident_ID',
'Police_Force',
'Number_of_Vehicles',
'Number_of_Casualties',
'Date',
'Day_of_Week',
'Time',
'Local_Authority_(District)',
'Local_Authority_(Highway)',
'1st_Road_Class',
'1st_Road_Number',
'Road_Type',
'Speed_limit',
'2nd_Road_Class',
'2nd_Road_Number',
'Pedestrian_Crossing-Human_Control',
'Pedestrian_Crossing-Physical_Facilities',
'Light_Conditions',
'Weather_Conditions',
'Road_Surface_Conditions',
'Special_Conditions_at_Site',
'Carriageway_Hazards',
'Urban_or_Rural_Area',
'Did_Police_Officer_Attend_Scene_of_Accident',
'state',
'postcode',
'country'
Population: 8,035 rows x 10 columns
population.csv:
'postcode',
'Rural Urban',
'Variable: All usual residents; measures: Value',
'Variable: Males; measures: Value',
'Variable: Females; measures: Value',
'Variable: Lives in a household; measures: Value',
'Variable: Lives in a communal establishment; measures: Value',
'Variable: Schoolchild or full-time student aged 4 and over at their non term-time address; measures: Value',
'Variable: Area (Hectares); measures: Value',
'Variable: Density (number of persons per hectare); measures: Value'
Road Network: 91,566 rows x 8 columns
roads_network.csv:
'WKT',
'roadClassi',
'roadFuncti',
'formOfWay',
'length',
'primaryRou',
'distance to the nearest point on rd',
'postcode'
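One possible way to use the optional population and road network files is to join them onto a postcode-level table through the shared 'postcode' column. The sketch below assumes that the road network file can hold several rows per postcode (so it is aggregated first) and that postcodes are formatted identically across files; neither is confirmed by the description. The function name, the aggregated feature names, and the sample values are hypothetical.

```python
import pandas as pd


def augment_postcode_table(
    postcode_level: pd.DataFrame,
    population: pd.DataFrame,
    roads: pd.DataFrame,
) -> pd.DataFrame:
    """Left-join population and aggregated road features onto postcodes."""
    # Collapse the road network to one row per postcode before merging.
    road_features = (
        roads.groupby("postcode")
        .agg(
            road_count=("length", "size"),
            mean_road_length=("length", "mean"),
            min_distance_to_road=("distance to the nearest point on rd", "min"),
        )
        .reset_index()
    )
    merged = postcode_level.merge(population, on="postcode", how="left")
    # Left joins keep postcodes that have no match; their features are NaN.
    return merged.merge(road_features, on="postcode", how="left")


# Hypothetical sample rows, not taken from the real dataset.
postcodes = pd.DataFrame(
    {"postcode": ["AB1", "CD2"], "accident_risk_index": [2.0, 1.5]}
)
population_sample = pd.DataFrame(
    {"postcode": ["AB1"], "Variable: All usual residents; measures: Value": [1200]}
)
roads_sample = pd.DataFrame(
    {
        "postcode": ["AB1", "AB1"],
        "length": [100.0, 300.0],
        "distance to the nearest point on rd": [12.0, 5.0],
    }
)
augmented = augment_postcode_table(postcodes, population_sample, roads_sample)
print(augmented)
```

Left joins are used so that no postcode is dropped when one of the auxiliary files lacks it; the resulting missing values then need to be imputed or handled by the model.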
Acknowledgements
This dataset is licensed under the Open Government Licence, which is used by all data on data.gov.uk.
The data was downloaded from MachineHack.