Baselight

Machine Hack Mar22

Predict accident risk score for unique postcode

@kaggle.krishnadaskv_machine_hack_mar22_predict_accident_risk_score

About this Dataset

According to IBEF, “Domestic automobile production increased at a 2.36% CAGR between FY16-20, with 26.36 million vehicles being manufactured in the country in FY20. Overall, domestic automobile sales increased at a 1.29% CAGR between FY16-FY20, with 21.55 million vehicles being sold in FY20.” The growing number of vehicles on the road brings multiple challenges and makes roads more vulnerable to accidents. Higher accident rates, in turn, lead to more insurance claims and rising payouts for insurance companies.

To plan for these losses pre-emptively, insurance firms use accident data to understand risk across geographical units such as postcodes or districts.

In this challenge, you are provided with a dataset to predict the “Accident_Risk_Index” for each postcode, defined as the mean number of casualties per accident at that postcode:
Accident_Risk_Index = sum(Number_of_Casualties) / count(Accident_ID)
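
A minimal sketch of the target definition above, assuming each train.csv row is one accident (so count(Accident_ID) is the number of rows per postcode) and that rows have been read as dicts keyed by the train.csv column names. The postcodes "PC-A" and "PC-B" are hypothetical placeholders.

```python
from collections import defaultdict


def accident_risk_index(rows):
    """Return {postcode: sum(Number_of_Casualties) / count(Accident_ID)}.

    Assumes one row per accident; only the 'postcode' and
    'Number_of_Casualties' keys are read.
    """
    casualties = defaultdict(int)
    accidents = defaultdict(int)
    for row in rows:
        pc = row["postcode"]
        casualties[pc] += int(row["Number_of_Casualties"])
        accidents[pc] += 1  # each row counts as one Accident_ID
    return {pc: casualties[pc] / accidents[pc] for pc in accidents}


# Hypothetical sample rows
sample = [
    {"Accident_ID": 1, "postcode": "PC-A", "Number_of_Casualties": 1},
    {"Accident_ID": 2, "postcode": "PC-A", "Number_of_Casualties": 3},
    {"Accident_ID": 3, "postcode": "PC-B", "Number_of_Casualties": 1},
]
print(accident_risk_index(sample))  # {'PC-A': 2.0, 'PC-B': 1.0}
```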

Pro-tip: Participants need to perform feature engineering by first rolling up the train data to the postcode level, creating an “accident_risk_index” column, and then optimizing the model at the postcode level.
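
The roll-up step could look roughly like the sketch below: accident-level rows are grouped by postcode and turned into one feature row each, with the target attached only for train data. The chosen features (accident count, mean Number_of_Vehicles, mean Speed_limit) are illustrative assumptions, not features prescribed by the challenge, and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def roll_up(rows, with_target=True):
    """Aggregate accident-level rows into one feature row per postcode.

    Feature choices here are illustrative; set with_target=False for test
    data, which must not use Number_of_Casualties as a target.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["postcode"]].append(row)

    table = []
    for pc, grp in groups.items():
        rec = {
            "postcode": pc,
            "n_accidents": len(grp),
            "mean_vehicles": fmean(int(r["Number_of_Vehicles"]) for r in grp),
            "mean_speed_limit": fmean(int(r["Speed_limit"]) for r in grp),
        }
        if with_target:
            # Target: sum(Number_of_Casualties) / count(Accident_ID)
            rec["accident_risk_index"] = (
                sum(int(r["Number_of_Casualties"]) for r in grp) / len(grp)
            )
        table.append(rec)
    return table


# Hypothetical sample rows
rows = [
    {"postcode": "PC-A", "Number_of_Vehicles": 2, "Speed_limit": 30, "Number_of_Casualties": 1},
    {"postcode": "PC-A", "Number_of_Vehicles": 1, "Speed_limit": 40, "Number_of_Casualties": 3},
    {"postcode": "PC-B", "Number_of_Vehicles": 3, "Speed_limit": 60, "Number_of_Casualties": 2},
]
table = roll_up(rows)
print(table[0])
```

Applying the same function to test rows with `with_target=False` keeps the train and test feature columns aligned.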

A few hypotheses to help you get started:
"More accidents happen later in the day, when office hours cause congestion."

"Postcodes with more single-carriageway roads have more accidents."

(From the hypotheses above, features such as office_hours_flag and #single_carriage_roads can be derived.)
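
One possible office_hours_flag, sketched under two assumptions: the Time column holds "HH:MM" strings, and "office hours" means a window of hours whose boundaries (the hypothetical defaults start_hour=9, end_hour=18) you would tune against the data.

```python
def office_hours_flag(time_str, start_hour=9, end_hour=18):
    """Return 1 if an 'HH:MM' time falls in [start_hour, end_hour), else 0.

    The window defaults are hypothetical; missing or malformed times map to 0.
    """
    try:
        hour = int(str(time_str).split(":")[0])
    except ValueError:
        return 0
    return int(start_hour <= hour < end_hour)


times = ["08:15", "09:00", "17:59", "18:00", None]
print([office_hours_flag(t) for t in times])  # [0, 1, 1, 0, 0]
```

At the postcode level, the mean of this flag gives the share of accidents that occurred during the chosen window.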

Additionally, we provide road network data (the nearest road to each postcode and its characteristics) and population data (population figures at the area level). These files can be used to augment your features but are optional.

The provided dataset contains the following files:

Train: 484,042 rows x 27 columns

Test: 115,958 rows x 27 columns

train.csv & test.csv:

'Accident_ID',
'Police_Force',
'Number_of_Vehicles',
'Number_of_Casualties',
'Date',
'Day_of_Week',
'Time',
'Local_Authority_(District)',
'Local_Authority_(Highway)',
'1st_Road_Class',
'1st_Road_Number',
'Road_Type',
'Speed_limit',
'2nd_Road_Class',
'2nd_Road_Number',
'Pedestrian_Crossing-Human_Control',
'Pedestrian_Crossing-Physical_Facilities',
'Light_Conditions',
'Weather_Conditions',
'Road_Surface_Conditions',
'Special_Conditions_at_Site',
'Carriageway_Hazards',
'Urban_or_Rural_Area',
'Did_Police_Officer_Attend_Scene_of_Accident',
'state',
'postcode',
'country'

Population: 8,035 rows x 10 columns

population.csv:

'postcode',
'Rural Urban',
'Variable: All usual residents; measures: Value',
'Variable: Males; measures: Value',
'Variable: Females; measures: Value',
'Variable: Lives in a household; measures: Value',
'Variable: Lives in a communal establishment; measures: Value',
'Variable: Schoolchild or full-time student aged 4 and over at their non term-time address; measures: Value',
'Variable: Area (Hectares); measures: Value',
'Variable: Density (number of persons per hectare); measures: Value'
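
As an example of feature augmentation, the sketch below left-joins population density onto postcode-level rows. It assumes the population.csv postcode values use the same format as the train/test postcodes, and it reads the density column by the header name listed above; the in-memory CSV and postcodes are hypothetical.

```python
import csv
import io

DENSITY = "Variable: Density (number of persons per hectare); measures: Value"


def load_density(csv_file):
    """Map postcode -> density (persons per hectare) from a population.csv-like file."""
    density = {}
    for row in csv.DictReader(csv_file):
        value = row.get(DENSITY, "")
        density[row["postcode"]] = float(value) if value else None
    return density


def add_density(postcode_rows, density):
    """Left-join density onto postcode-level rows; unmatched postcodes get None."""
    for rec in postcode_rows:
        rec["population_density"] = density.get(rec["postcode"])
    return postcode_rows


# Hypothetical stand-in for population.csv
sample_csv = io.StringIO(
    "postcode,Rural Urban," + DENSITY + "\n"
    "PC-A,Urban,41.5\n"
    "PC-B,Rural,0.8\n"
)
density = load_density(sample_csv)
rows = add_density([{"postcode": "PC-A"}, {"postcode": "PC-Z"}], density)
print(rows)
```

A left join keeps every accident postcode even when no population record matches; how to fill the resulting None values is a modelling choice.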

Road Network: 91,566 rows x 8 columns

roads_network.csv:

'WKT',
'roadClassi',
'roadFuncti',
'formOfWay',
'length',
'primaryRou',
'distance to the nearest point on rd',
'postcode'
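
The #single_carriage_roads hypothesis could be turned into a feature from roads_network.csv along these lines. The label "Single Carriageway" in formOfWay is an assumption (inspect the distinct formOfWay values first). If the file holds only one nearest road per postcode, the count reduces to a 0/1 flag. The sample rows are hypothetical.

```python
from collections import Counter


def single_carriageway_counts(road_rows, target_value="single carriageway"):
    """Count road rows per postcode whose formOfWay equals target_value (case-insensitive).

    target_value is an assumed label, not confirmed by the dataset description.
    """
    counts = Counter()
    for row in road_rows:
        form = (row.get("formOfWay") or "").strip().lower()
        if form == target_value:
            counts[row["postcode"]] += 1
    return counts


# Hypothetical sample rows
roads = [
    {"postcode": "PC-A", "formOfWay": "Single Carriageway"},
    {"postcode": "PC-A", "formOfWay": "Dual Carriageway"},
    {"postcode": "PC-A", "formOfWay": "Single Carriageway"},
    {"postcode": "PC-B", "formOfWay": "Roundabout"},
]
counts = single_carriageway_counts(roads)
print(counts["PC-A"], counts["PC-B"])  # 2 0
```

Because a `Counter` returns 0 for missing keys, postcodes with no matching roads need no special handling.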

Acknowledgements

This dataset is licensed under the Open Government Licence, which is used for all data on data.gov.uk.
Data downloaded from Machine Hack.

Tables

Population

@kaggle.krishnadaskv_machine_hack_mar22_predict_accident_risk_score.population
  • 281.35 KB
  • 8035 rows
  • 10 columns

CREATE TABLE population (
  "postcode" VARCHAR,
  "rural_urban" VARCHAR,
  "variable_all_usual_residents_measures_value" BIGINT,
  "variable_males_measures_value" BIGINT,
  "variable_females_measures_value" BIGINT,
  "variable_lives_in_a_household_measures_value" BIGINT,
  "variable_lives_in_a_communal_establishment_measures_value" BIGINT,
  "variable_schoolchild_or_full_time_student_aged_4_and_o_e068bb60" BIGINT,
  "variable_area_hectares_measures_value" DOUBLE,
  "variable_density_number_of_persons_per_hectare_measures_value" DOUBLE
);

Roads Network

@kaggle.krishnadaskv_machine_hack_mar22_predict_accident_risk_score.roads_network
  • 2.58 MB
  • 91566 rows
  • 8 columns

CREATE TABLE roads_network (
  "wkt" VARCHAR,
  "roadclassi" VARCHAR,
  "roadfuncti" VARCHAR,
  "formofway" VARCHAR,
  "length" DOUBLE,
  "primaryrou" DOUBLE,
  "distance_to_the_nearest_point_on_rd" DOUBLE,
  "postcode" VARCHAR
);

Sample Submission

@kaggle.krishnadaskv_machine_hack_mar22_predict_accident_risk_score.sample_submission
  • 355.84 KB
  • 49772 rows
  • 2 columns

CREATE TABLE sample_submission (
  "postcode" VARCHAR,
  "accident_risk_index" BIGINT
);
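
A sketch for producing a file with the sample_submission columns. It assumes you write one row per postcode in the sample file's order and fall back to a constant (for example, the overall train mean) for postcodes the model has no prediction for. The sample file types accident_risk_index as BIGINT, but since the index is a mean, this sketch writes floats; confirm the accepted format against the competition rules. Names and values here are hypothetical.

```python
import csv
import io


def write_submission(postcodes, predictions, fallback, out):
    """Write postcode,accident_risk_index rows in the given postcode order.

    predictions: dict of postcode -> predicted index; missing postcodes get fallback.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["postcode", "accident_risk_index"])
    for pc in postcodes:
        writer.writerow([pc, predictions.get(pc, fallback)])


buf = io.StringIO()
write_submission(["PC-A", "PC-B"], {"PC-A": 1.25}, fallback=1.5, out=buf)
print(buf.getvalue())
```

For a real file, open it with `open("submission.csv", "w", newline="")` (hypothetical file name) and pass the handle as `out`.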

Test

@kaggle.krishnadaskv_machine_hack_mar22_predict_accident_risk_score.test
  • 2.61 MB
  • 121259 rows
  • 27 columns

CREATE TABLE test (
  "accident_id" BIGINT,
  "police_force" BIGINT,
  "number_of_vehicles" BIGINT,
  "number_of_casualties" BIGINT,
  "date" VARCHAR,
  "day_of_week" BIGINT,
  "time" VARCHAR,
  "local_authority_district" BIGINT,
  "local_authority_highway" VARCHAR,
  "n_1st_road_class" BIGINT,
  "n_1st_road_number" BIGINT,
  "road_type" VARCHAR,
  "speed_limit" BIGINT,
  "n_2nd_road_class" BIGINT,
  "n_2nd_road_number" BIGINT,
  "pedestrian_crossing_human_control" VARCHAR,
  "pedestrian_crossing_physical_facilities" VARCHAR,
  "light_conditions" VARCHAR,
  "weather_conditions" VARCHAR,
  "road_surface_conditions" VARCHAR,
  "special_conditions_at_site" VARCHAR,
  "carriageway_hazards" VARCHAR,
  "urban_or_rural_area" BIGINT,
  "did_police_officer_attend_scene_of_accident" VARCHAR,
  "state" VARCHAR,
  "postcode" VARCHAR,
  "country" VARCHAR
);

Train

@kaggle.krishnadaskv_machine_hack_mar22_predict_accident_risk_score.train
  • 10.88 MB
  • 478741 rows
  • 27 columns

CREATE TABLE train (
  "accident_id" BIGINT,
  "police_force" BIGINT,
  "number_of_vehicles" BIGINT,
  "number_of_casualties" BIGINT,
  "date" VARCHAR,
  "day_of_week" BIGINT,
  "time" VARCHAR,
  "local_authority_district" BIGINT,
  "local_authority_highway" VARCHAR,
  "n_1st_road_class" BIGINT,
  "n_1st_road_number" BIGINT,
  "road_type" VARCHAR,
  "speed_limit" BIGINT,
  "n_2nd_road_class" BIGINT,
  "n_2nd_road_number" BIGINT,
  "pedestrian_crossing_human_control" VARCHAR,
  "pedestrian_crossing_physical_facilities" VARCHAR,
  "light_conditions" VARCHAR,
  "weather_conditions" VARCHAR,
  "road_surface_conditions" VARCHAR,
  "special_conditions_at_site" VARCHAR,
  "carriageway_hazards" VARCHAR,
  "urban_or_rural_area" BIGINT,
  "did_police_officer_attend_scene_of_accident" VARCHAR,
  "state" VARCHAR,
  "postcode" VARCHAR,
  "country" VARCHAR
);
