Baselight

Borsa Italiana Listino 2023-07-11, Upd 2024-02-23

Information from the borsaitaliana.it website to support a publication

@kaggle.robertolofaro_borsa_italiana_listino_as_of_20221119

About this Dataset

This dataset contains information in preparation for a forthcoming publication:

  • extracted from public open data accessible via web (see "Production Notes" section at the end for details)
  • overall aim: comparing company data pre- and post-COVID, i.e. the evolution from 2019 to 2022 (balance sheets due July 2023)

General description: see the LinkedIn post

Rationale of dataset and the associated project: Reading pre- and post-COVID corporate narratives, the Italian case: a dataset in fieri

See the associated notebook (more charts will be added as further information is integrated)

Structure of the file: listino_catalog_kaggle.csv

The first file in this dataset is the list of stocks and warrants presented on the Borsa Italiana website as of 2023-07-11, with the following structure:

| no. | column name | datatype | description |
|---|---|---|---|
| 1 | # | numeric | position index |
| 2 | stock | text | name of the company, as per the Borsa Italiana website |
| 3 | link | URL | link to the stock's page |
| 4 | market | text | subsection of the "listino", as per the Borsa Italiana website |
| 5 | ISIN | text | stock identification code: a 2-letter country code followed by 10 alphanumeric characters, the last of which is a numeric check digit |
| 6 | profile | URL | link to the profile page for the stock (if filled in by the company) |
| 7 | detailspresent | char | Y = a page with details was linked; N = details page not present |
| 8 | withinstudy | string | only for ISINs starting with IT that have a value in the "profile" column: blank if retained within the study, "MissingReports" if financial reports are partial or not available, "NotCoveringPeriod" if some of the 2019-2021 financial reports are missing |
| 9 | covidstudy | string | among the companies retained in column 8, those further selected, based on the data available, for a study comparing pre- and post-COVID financial and operational information; values: Y = within the study, N = excluded due to data, outofscope = not within the scope |
| 10 | industry | string | industry as listed on BorsaItaliana.it; na = not available |
| 11 | subindustry | string | subindustry within the industry, as listed on BorsaItaliana.it; na = not available |
| 12 | 2019accounts | string | language of the 2019 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed |
| 13 | 2021accounts | string | language of the 2021 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed |
| 14 | UsedforENG | string | Y if used for the text-based part of the study, i.e. companies that have EN in both "2019accounts" and "2021accounts" |
| 15 | YahooFinanceURL | URL | link to the YahooFinance page presenting financials, found using the ISIN as the main point of reference; "na" where none was available |
| 16 | checkvs2021yahoo | string | included = data reconciliation successful and company included in the sample; bankassfin = company excluded but to be included in a future study on bank/assurance/finance; excluded = company excluded for other reasons |
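Column 5 holds ISINs, and column 8 only considers those starting with IT. As a quick sanity check on scraped values, the sketch below validates the ISIN layout and its check digit (letters expanded to numbers, A=10 to Z=35, then the Luhn algorithm, as used for ISINs under ISO 6166). The function names are hypothetical; the dataset itself does not ship such a check.

```python
import re

# ISO 6166 layout: 2-letter country code, 9 alphanumeric characters, 1 check digit.
_ISIN_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def isin_is_valid(isin: str) -> bool:
    """Check the ISIN layout and its Luhn check digit."""
    isin = isin.strip().upper()
    if not _ISIN_RE.fullmatch(isin):
        return False
    # Expand each character: digits stay as-is, letters become A=10 ... Z=35.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Luhn: walking from the rightmost digit, double every second digit.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def is_italian_isin(isin: str) -> bool:
    """True for valid ISINs issued under the IT country code (see column 8)."""
    return isin_is_valid(isin) and isin.strip().upper().startswith("IT")
```

For example, `isin_is_valid("US0378331005")` returns `True`, while changing its last digit to 6 makes it return `False`.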

Note:

  • this table is kept as a CSV source, which was built on 2023-07-12 using the information extracted on 2023-07-11 from the Borsa Italiana website (specifically, the 30 pages of the "listino A-Z")
  • only the latest version of this dataset is visible
  • it was updated on 2023-08-04, adding column 8 ("withinstudy") after retrieving the financial reports for all the companies on Borsa Milano that fulfill the condition described in the table above for column 8
  • it was updated on 2023-09-03, adding column 9 ("covidstudy") after identifying which companies are part of the study (i.e. besides the other conditions, annual reports for 2019 and 2021 are available)
  • it was updated on 2023-11-02, adding columns 10 ("industry") and 11 ("subindustry") as listed on the BorsaItaliana.it website (the lists by industry and subindustry posted online as of 2023-11-02 covered only part of the "azioni" (shares) listed on the website; those lists and the whole taxonomy can be provided on request)
  • it was updated on 2023-12-22, adding columns 12 ("2019accounts"), 13 ("2021accounts") and 14 ("UsedforENG"), as the next publication step is to share information comparing 2019 and 2021 based on the financial reports collected; if a company's fiscal year did not coincide with the calendar year, "2019" and "2021" may refer to "2018/2019" and "2021/2022", to avoid including part of 2020, a year in which accounts were partially unpublished
  • it was updated on 2024-02-02, adding column 15 ("YahooFinanceURL") by searching YahooFinance by ISIN and then selecting the Financials page; where this did not work or was ambiguous, the search was done by context and by checking the data to identify the page relevant to the dataset; where even this search gave ambiguous results or no results, the value was marked "na"
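Columns 8, 9 and 14 narrow the list step by step. A minimal sketch of reproducing that selection with the standard library's csv module, assuming the CSV header row uses the column names in the table above and that columns 12 and 13 hold a bare language code such as "EN" or "IT"; the helper names are hypothetical, and pandas (listed among the project's libraries) would work equally well.

```python
import csv
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def cell(row: Row, key: str) -> str:
    """Return a stripped cell value, treating missing cells as empty."""
    return (row.get(key) or "").strip()


def load_rows(path: str) -> List[Row]:
    # ASSUMPTION: UTF-8 file whose header row uses the column names listed above.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def has_english_accounts(row: Row) -> bool:
    """Rule behind column 14: EN listed for both 2019 and 2021 accounts."""
    # ASSUMPTION: columns 12 and 13 hold a bare language code such as "EN" or "IT".
    return (
        cell(row, "2019accounts").upper() == "EN"
        and cell(row, "2021accounts").upper() == "EN"
    )


def study_sample(rows: Iterable[Row], english_only: bool = False) -> List[Row]:
    """Keep rows in the COVID study; optionally keep only English-language ones."""
    selected = []
    for row in rows:
        if cell(row, "covidstudy") != "Y":
            continue  # N or outofscope
        if english_only and not has_english_accounts(row):
            continue
        selected.append(row)
    return selected
```

Usage: `sample = study_sample(load_rows("listino_catalog_kaggle.csv"), english_only=True)` should yield the companies flagged "Y" in "UsedforENG", if the assumptions above hold.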

This dataset will be complemented with other information by the end of 2024

Whenever new data items are added:

  • new columns will be added to this dataset and used in the sample notebook
  • new files will be documented here, and a table of contents will be added

Update History

UPDATE 2023-07-23

Added the following columns:

| no. | column name | datatype | description |
|---|---|---|---|
| 5 | ISIN | text | stock identification code: a 2-letter country code followed by 10 alphanumeric characters, the last of which is a numeric check digit |
| 6 | profile | URL | link to the profile page for the stock (if filled in by the company) |
| 7 | detailspresent | char | Y = a page with details was linked; N = details page not present |

UPDATE 2023-08-04

Added the following column:

| no. | column name | datatype | description |
|---|---|---|---|
| 8 | withinstudy | string | only for ISINs starting with IT that have a value in the "profile" column: blank if retained within the study, "MissingReports" if financial reports are partial or not available, "NotCoveringPeriod" if some of the 2019-2021 financial reports are missing |

  • it was updated on 2023-08-04, adding column 8 ("withinstudy") after retrieving the financial reports for all the companies on Borsa Milano that fulfill the condition described in the table above for column 8

UPDATE 2023-09-03

Added the following column:

| no. | column name | datatype | description |
|---|---|---|---|
| 9 | covidstudy | string | among the companies retained in column 8, those further selected, based on the data available, for a study comparing pre- and post-COVID financial and operational information; values: Y = within the study, N = excluded due to data, outofscope = not within the scope |

  • it was updated on 2023-09-03, adding column 9 ("covidstudy") after identifying which companies are part of the study (i.e. besides the other conditions, annual reports for 2019 and 2021 are available)

UPDATE 2023-11-02

Added the following columns:

| no. | column name | datatype | description |
|---|---|---|---|
| 10 | industry | string | industry as listed on BorsaItaliana.it; na = not available |
| 11 | subindustry | string | subindustry within the industry, as listed on BorsaItaliana.it; na = not available |

  • it was updated on 2023-11-02, adding columns 10 ("industry") and 11 ("subindustry") as listed on the BorsaItaliana.it website (the lists by industry and subindustry posted online as of 2023-11-02 covered only part of the "azioni" (shares) listed on the website; those lists and the whole taxonomy can be provided on request)

UPDATE 2023-12-22

Added the following columns:

| no. | column name | datatype | description |
|---|---|---|---|
| 12 | 2019accounts | string | language of the 2019 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed |
| 13 | 2021accounts | string | language of the 2021 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed |
| 14 | UsedforENG | string | Y if used for the text-based part of the study, i.e. companies that have EN in both "2019accounts" and "2021accounts" |

  • it was updated on 2023-12-22, adding columns 12 ("2019accounts"), 13 ("2021accounts") and 14 ("UsedforENG"), as the next publication step is to share information comparing 2019 and 2021 based on the financial reports collected; if a company's fiscal year did not coincide with the calendar year, "2019" and "2021" may refer to "2018/2019" and "2021/2022", to avoid including part of 2020, a year in which accounts were partially unpublished
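The fiscal-year rule in the note above can be read as: for the pre-COVID year take the fiscal year ending in 2019, for the post-COVID year take the fiscal year starting in 2021, so that neither touches 2020. A minimal sketch of that reading; the fiscal-year end month is an assumed input (the dataset does not record it), and the function names are hypothetical.

```python
from typing import Tuple

YearMonth = Tuple[int, int]  # (year, month)


def fiscal_year_for_study(study_year: int, fy_end_month: int) -> Tuple[str, YearMonth, YearMonth]:
    """Pick the fiscal year that stands in for study_year without touching 2020.

    Returns (label, first month, last month) of the chosen fiscal year.
    """
    if not 1 <= fy_end_month <= 12:
        raise ValueError("fy_end_month must be between 1 and 12")
    if fy_end_month == 12:
        # Fiscal year coincides with the calendar year.
        return str(study_year), (study_year, 1), (study_year, 12)
    # Pre-COVID: the fiscal year ENDING in study_year (e.g. Jul 2018 to Jun 2019).
    # Post-COVID: the fiscal year STARTING in study_year (e.g. Jul 2021 to Jun 2022).
    end_year = study_year if study_year < 2020 else study_year + 1
    first = (end_year - 1, fy_end_month + 1)
    last = (end_year, fy_end_month)
    return f"{end_year - 1}/{end_year}", first, last


def overlaps_2020(first: YearMonth, last: YearMonth) -> bool:
    """True if any month of the period falls in calendar year 2020."""
    return first[0] <= 2020 <= last[0]
```

For a company closing its books in June, `fiscal_year_for_study(2021, 6)` returns `("2021/2022", (2021, 7), (2022, 6))`, and `overlaps_2020((2021, 7), (2022, 6))` is `False`.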

UPDATE 2024-02-06

Added the following column:

| no. | column name | datatype | description |
|---|---|---|---|
| 15 | YahooFinanceURL | URL | link to the YahooFinance page presenting financials, found using the ISIN as the main point of reference; "na" where none was available |

  • it was updated on 2024-02-02, adding column 15 ("YahooFinanceURL") by searching YahooFinance by ISIN and then selecting the Financials page; where this did not work or was ambiguous, the search was done by context and by checking the data to identify the page relevant to the dataset; where even this search gave ambiguous results or no results, the value was marked "na"

UPDATE 2024-02-23

Added the following column:

| no. | column name | datatype | description |
|---|---|---|---|
| 16 | checkvs2021yahoo | string | included = data reconciliation successful and company included in the sample; bankassfin = company excluded but to be included in a future study on bank/assurance/finance; excluded = company excluded for other reasons |

  • following the previous update of 2024-02-06, the 2021 annual reports were reconciled with the information available on YahooFinance, to identify reporting practices and further focus the selection

Production Notes

After manually extracting a sample for each layer (listing, sample pages, sample profiles, sample details) to identify the structure, the extraction proceeded as follows:

  1. read the "listino" (search feature) to automatically extract the list of stocks
  2. used the information retrieved to identify the presence of a profile page and other keys
  3. used the keys found to automatically extract further information from the Borsa Italiana website

All accesses to the website (except single-page sample testing) were made:

  • after standard market hours
  • with a delay of a few seconds between reads
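The two access rules above can be expressed as a small time guard plus a throttled loop. A sketch under stated assumptions: the 09:00 to 17:30 weekday window is an assumed stand-in for "standard market hours" (public holidays ignored), the 3-second default only approximates "a few seconds", and urllib from the standard library replaces requests so the block has no third-party dependency; any callable such as `lambda u: requests.get(u, timeout=30).text` can be passed as `fetch`. All function names are hypothetical.

```python
import time
from datetime import datetime, time as dtime
from typing import Callable, Dict, Iterable
from urllib.request import urlopen

# ASSUMPTION: trading window of 09:00-17:30 local (Italian) time on weekdays;
# public holidays are ignored in this sketch.
MARKET_OPEN = dtime(9, 0)
MARKET_CLOSE = dtime(17, 30)


def outside_market_hours(now: datetime) -> bool:
    """True on weekends, or on weekdays outside the assumed trading window."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (MARKET_OPEN <= now.time() <= MARKET_CLOSE)


def urllib_fetch(url: str, timeout: float = 30.0) -> str:
    """Download one page as text (stdlib stand-in for requests.get)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def polite_fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str] = urllib_fetch,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch URLs one at a time, pausing delay_seconds between consecutive reads."""
    pages: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause before the very first read
        pages[url] = fetch(url)
    return pages
```

Usage: `if outside_market_hours(datetime.now()): pages = polite_fetch_all(urls)`, where `datetime.now()` assumes the machine clock is set to Italian local time. Injecting `sleep` and `fetch` keeps the loop testable without network access.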

Language used: Python within Jupyter Notebook

Libraries (in alphabetical order):

  • glob
  • json
  • os
  • pandas
  • pathlib
  • requests
  • time
