Baselight

2018 Airplane Flights

Predicting prices of airline flights!

@kaggle.zernach_2018_airplane_flights

Loading...
Loading...

About this Dataset

2018 Airplane Flights

As I write this, I'm a Data Science student. To add to my portfolio, I wanted to build a web app to predict the prices of airline flight prices: the user would be able to select an origin and a destination. I found a database from The Bureau of Transportation Statistics. View the ReadMe file in my Github repo for this project. Check out all the info on my portfolio's webpage for this project.

Dataset Description & Story

I followed the following steps to get 2018 data. You can follow the same steps to get data from 2019, 2020, 2021, etc.

Visit > https://www.transtats.bts.gov/
Click > Aviation
Click > Airline Origin and Destination Survey (DB1B)
Click > DB1BMarket

I downloaded their data from Q1, Q2, Q3, and Q4 from 2018 — a total of 27M+ rows and 42 columns.

For my pricing prediction purposes, I eliminated unnecessary columns, renamed some columns, and refined it for consistency — bringing it to a new total of 9M+ rows and 13 columns. My cleaning process is shown in the .ipynb file in my Github repo for this project.

Have fun and share your kernels, please!

Column Descriptions

1. Unnamed: drop this column (it's a duplicate index column)

2-3. ItinID & MktID: vaguely demonstrates the order in which tickets were ordered (lower ID #'s being ordered first)

4. MktCoupons: the number of coupons in the market for that flight

5. Quarter: 1, 2, 3, or 4, all of which are in 2018

6. Origin: the city out of which the flight begins

7. OriginWac: USA State/Territory World Area Code

8. Dest: the city out of which the flight begins

9. DestWac: USA State/Territory World Area Code

10. Miles: the number of miles traveled

11. ContiguousUSA: binary column -- (2) meaning flight is in the contiguous (48) USA states, and (1) meaning it is not (ie: Hawaii, Alaska, off-shore territories)

12. NumTicketsOrdered: number of tickets that were purchased by the user

13. Airline Company: the two-letter airline company code that the user used from start to finish (key codes below)

14. PricePerTicket: target prediction column

Airline Company Codes (in order of frequency for this dataset)

WN -- Southwest Airlines Co.

DL -- Delta Air Lines Inc.

AA -- American Airlines Inc.

UA -- United Air Lines Inc.

B6 -- JetBlue Airways

AS -- Alaska Airlines Inc.

NK -- Spirit Air Lines

G4 -- Allegiant Air

F9 -- Frontier Airlines Inc.

HA -- Hawaiian Airlines Inc.

SY -- Sun Country Airlines d/b/a MN Airlines

VX -- Virgin America

USA State/Territory World Area Codes

1 Alaska

2 Hawaii

3 Puerto Rico

4 U.S. Virgin Islands

5 U.S. Pacific Trust Territories and Possessions

11 Connecticut

12 Maine

13 Massachusetts

14 New Hampshire

15 Rhode Island

16 Vermont

21 New Jersey

22 New York

23 Pennsylvania

31 Delaware

32 District of Columbia

33 Florida

34 Georgia

35 Maryland

36 North Carolina

37 South Carolina

38 Virginia

39 West Virginia

41 Illinois

42 Indiana

43 Michigan

44 Ohio

45 Wisconsin

51 Alabama

52 Kentucky

53 Mississippi

54 Tennessee

61 Iowa

62 Kansas

63 Minnesota

64 Missouri

65 Nebraska

66 North Dakota

67 South Dakota

71 Arkansas

72 Louisiana

73 Oklahoma

74 Texas

81 Arizona

82 Colorado

83 Idaho

84 Montana

85 Nevada

86 New Mexico

87 Utah

88 Wyoming

91 California

92 Oregon

93 Washington

Tables

Cleaned 2018 Flights

@kaggle.zernach_2018_airplane_flights.cleaned_2018_flights
  • 148.3 MB
  • 9534417 rows
  • 14 columns
Loading...

CREATE TABLE cleaned_2018_flights (
  "unnamed_0" BIGINT,
  "itinid" BIGINT,
  "mktid" BIGINT,
  "mktcoupons" BIGINT,
  "quarter" BIGINT,
  "origin" VARCHAR,
  "originwac" BIGINT,
  "dest" VARCHAR,
  "destwac" BIGINT,
  "miles" DOUBLE,
  "contiguoususa" BIGINT,
  "numticketsordered" DOUBLE,
  "airlinecompany" VARCHAR,
  "priceperticket" DOUBLE
);

Share link

Anyone who has the link will be able to view this.