Dataset Description:
The dataset contains flight pricing information for the year 2019, covering five major Indian metropolitan cities. It is intended for developing and refining linear regression models and is kept deliberately simple. It contains the following original fields:
Airline: The name of the airline operating the flight.
Date_of_Journey: The date on which the flight is scheduled to depart.
Source: The city from which the flight originates.
Dep_Time: The departure time of the flight.
Destination: The city to which the flight is destined.
Price: The price of the flight (the value to be predicted).
To make the data easier to use in models, it has been preprocessed to add the following fields:
Airline_encoded: Numeric encoding for the airline, improving model compatibility.
Source_encoded: Numeric encoding for the source city.
Destination_encoded: Numeric encoding for the destination city.
Date: Day of the month extracted from 'Date_of_Journey'.
Month: Month extracted from 'Date_of_Journey'.
Year: Year extracted from 'Date_of_Journey'.
Hour: Hour extracted from 'Dep_Time'.
Minutes: Minutes extracted from 'Dep_Time'.
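The derived date and time fields can be reproduced from the raw columns along the following lines. This is a minimal sketch, not the preprocessing actually used: the description does not state the raw string formats, so the day/month/year date format and the 24-hour time format below are assumptions, the function name split_journey_fields is hypothetical, and the sample values are illustrative rather than taken from the data.

```python
from datetime import datetime

# Assumed raw formats (not stated in the dataset description); adjust as needed.
DATE_FORMAT = "%d/%m/%Y"  # assumption: day/month/year, e.g. "24/03/2019"
TIME_FORMAT = "%H:%M"     # assumption: 24-hour clock, e.g. "22:20"


def split_journey_fields(date_of_journey: str, dep_time: str) -> dict:
    """Derive Date, Month, Year, Hour and Minutes from the two raw string fields."""
    journey = datetime.strptime(date_of_journey, DATE_FORMAT)
    departure = datetime.strptime(dep_time, TIME_FORMAT)
    return {
        "Date": journey.day,        # day of the month
        "Month": journey.month,
        "Year": journey.year,
        "Hour": departure.hour,
        "Minutes": departure.minute,
    }


# Illustrative input values, not rows from the dataset.
print(split_journey_fields("24/03/2019", "22:20"))
```

If the raw columns use a different layout, only the two format constants should need to change.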
The numeric encodings give the categorical variables a standardized integer representation that modeling libraries can consume directly. The dataset has been cleaned and is intended for learning purposes, in particular for developing and refining linear regression models that predict flight prices from the fields above.
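One point to keep in mind when feeding the encoded fields to a linear regression: the codes are plain integers, so a model that uses them directly treats them as ordered quantities, even though the categories have no natural order. A common alternative is to expand each code into indicator (one-hot) columns. The sketch below shows that expansion using only the category counts from this description (12 airlines, 5 cities); the function names one_hot and expand_row are hypothetical, and this is an optional step, not part of the dataset's own preprocessing.

```python
from typing import List

N_AIRLINES = 12  # number of airline codes listed in this description
N_CITIES = 5     # number of city codes listed in this description


def one_hot(code: int, n_categories: int) -> List[int]:
    """Return a list of 0/1 indicators with a single 1 at position `code`."""
    if not 0 <= code < n_categories:
        raise ValueError(f"code {code} is outside 0..{n_categories - 1}")
    return [1 if i == code else 0 for i in range(n_categories)]


def expand_row(airline_code: int, source_code: int, destination_code: int) -> List[int]:
    """Concatenate indicator columns for airline, source and destination."""
    return (
        one_hot(airline_code, N_AIRLINES)
        + one_hot(source_code, N_CITIES)
        + one_hot(destination_code, N_CITIES)
    )


# Airline code 3, source code 2, destination code 4 -> 22 indicator columns.
features = expand_row(3, 2, 4)
print(len(features), sum(features))
```

When an intercept is included in the regression, one indicator per group is usually dropped to avoid perfectly collinear columns.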
Encoded Values:
Cities (used by Source_encoded and Destination_encoded):
'Bangalore': 0,
'Chennai': 1,
'Delhi': 2,
'Kolkata': 3,
'Mumbai': 4
Airlines (used by Airline_encoded):
'Air Asia': 0,
'Air India': 1,
'GoAir': 2,
'IndiGo': 3,
'Jet Airways': 4,
'Jet Airways Business': 5,
'Multiple carriers': 6,
'Multiple carriers Premium economy': 7,
'SpiceJet': 8,
'Trujet': 9,
'Vistara': 10,
'Vistara Premium economy': 11
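The encoded values can be kept as lookup tables so that codes are translated back to names when inspecting rows or model output. The sketch below copies the values listed above into two dictionaries and builds the reverse mappings; the names CITY_CODES, AIRLINE_CODES and encode_flight are hypothetical, and it assumes the same city mapping applies to both Source_encoded and Destination_encoded.

```python
# Values copied from the "Encoded Values" list in this description.
CITY_CODES = {
    "Bangalore": 0,
    "Chennai": 1,
    "Delhi": 2,
    "Kolkata": 3,
    "Mumbai": 4,
}

AIRLINE_CODES = {
    "Air Asia": 0,
    "Air India": 1,
    "GoAir": 2,
    "IndiGo": 3,
    "Jet Airways": 4,
    "Jet Airways Business": 5,
    "Multiple carriers": 6,
    "Multiple carriers Premium economy": 7,
    "SpiceJet": 8,
    "Trujet": 9,
    "Vistara": 10,
    "Vistara Premium economy": 11,
}

# Reverse lookups: integer code -> name.
CITY_NAMES = {code: name for name, code in CITY_CODES.items()}
AIRLINE_NAMES = {code: name for name, code in AIRLINE_CODES.items()}


def encode_flight(airline: str, source: str, destination: str) -> dict:
    """Map names to the encoded fields; raises KeyError for an unlisted name."""
    return {
        "Airline_encoded": AIRLINE_CODES[airline],
        "Source_encoded": CITY_CODES[source],          # assumption: shared city mapping
        "Destination_encoded": CITY_CODES[destination],
    }


print(encode_flight("IndiGo", "Delhi", "Mumbai"))
print(AIRLINE_NAMES[10], CITY_NAMES[3])
```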