What is the data?
Daily data for the S&P 500 index, sourced from Yahoo Finance: opening and closing prices, highest and lowest prices, and volume.
How was the dataset compiled?
Code: Github
The data was imported directly from Yahoo Finance using the yfinance library (GitHub). Some processing of the data was done afterwards.
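A minimal sketch of how such an import might look. The ticker symbol "^GSPC" (Yahoo Finance's symbol for the S&P 500 index) and the function name download_sp500_daily are assumptions for illustration; the original code linked above may differ.

```python
import pandas as pd


def download_sp500_daily(symbol: str = "^GSPC") -> pd.DataFrame:
    """Fetch the full daily history for `symbol` from Yahoo Finance.

    Assumption: "^GSPC" is the ticker used for the S&P 500 index.
    """
    # Imported here so the module can be loaded without yfinance installed.
    import yfinance as yf

    history = yf.Ticker(symbol).history(period="max", interval="1d")
    # Keep only the columns described in this dataset.
    return history[["Open", "High", "Low", "Close", "Volume"]]
```

Calling download_sp500_daily() requires network access and returns a DataFrame indexed by trading date.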
Quality of data
Nearly all opening prices between 1962-01-01 and 1982-04-10 were missing. For these days, the opening price was assumed to equal the closing price of the previous trading day.
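The fill rule described above can be sketched as follows. This assumes the data sits in a pandas DataFrame with "Open" and "Close" columns sorted by date, and that a missing opening price shows up as either NaN or 0; the function name fill_missing_open is hypothetical.

```python
import pandas as pd


def fill_missing_open(df: pd.DataFrame) -> pd.DataFrame:
    """Replace missing opening prices with the previous day's close."""
    out = df.copy()
    # Closing price of the previous trading day (NaN for the first row).
    prev_close = out["Close"].shift(1)
    # Assumption: missing opens are encoded as NaN or as 0.
    missing = out["Open"].isna() | (out["Open"] == 0)
    out.loc[missing, "Open"] = prev_close[missing]
    return out
```

The first row has no previous trading day, so a missing opening price there stays unfilled.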
Volume figures until 1949-12-13 are not available.
Some years, mostly the earlier ones, have fewer trading days recorded than expected:
| Year with less than expected trading days | Number of Trading Days Recorded |
| 1927 | 1 |
| 1928 | 195 |
| 1929 | 199 |
| 1930 | 155 |
| 1931 | 183 |
| 1932 | 169 |
| 1933 | 136 |
| 1934 | 91 |
| 1935 | 83 |
| 1936 | 107 |
| 1937 | 83 |
| 1938 | 57 |
| 1939 | 27 |
| 1940 | 8 |
| 1941 | 6 |
| 1942 | 16 |
| 1943 | 7 |
| 1944 | 6 |
| 1945 | 42 |
| 1946 | 48 |
| 1947 | 18 |
| 1948 | 16 |
| 1949 | 1 |
| 1968 | 226 |
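A check like the one summarized in the table above could be sketched as below. It assumes a DataFrame with a DatetimeIndex of trading dates; the threshold of 240 days and the function name sparse_years are assumptions for illustration, not values taken from the original processing.

```python
import pandas as pd


def sparse_years(df: pd.DataFrame, min_days: int = 240) -> pd.Series:
    """Return the recorded trading days per year for years below `min_days`."""
    # Count how many rows (trading days) fall in each calendar year.
    days_per_year = df.groupby(df.index.year).size()
    # Keep only the years with fewer rows than the assumed threshold.
    return days_per_year[days_per_year < min_days]
```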
Added columns for:
1. Percentage gain/loss: the percentage change in closing price between two consecutive trading days
2. Price variation percentage: (High - Low) / Close
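The two derived columns could be computed roughly as follows. The column names "Gain/Loss %" and "Variation %" are hypothetical, and multiplying the price variation by 100 is an assumption based on its description as a percentage.

```python
import pandas as pd


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add the percentage gain/loss and price variation columns."""
    out = df.copy()
    # Percentage change in Close relative to the previous trading day.
    out["Gain/Loss %"] = out["Close"].pct_change() * 100
    # Daily range relative to the close, expressed as a percentage (assumed).
    out["Variation %"] = (out["High"] - out["Low"]) / out["Close"] * 100
    return out
```

The first row has no previous trading day, so its gain/loss value is NaN.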