Borsa Italiana Listino 2023-07-11, Upd 2024-02-23
information from the borsaitaliana.it website to support a publication
@kaggle.robertolofaro_borsa_italiana_listino_as_of_20221119
This dataset contains information in preparation for a forthcoming publication:
General description: see LinkedIn post
Rationale of dataset and the associated project: Reading pre- and post-COVID corporate narratives, the Italian case: a dataset in fieri
See the associated notebook (more charts will be added as further information is integrated)
The first file contained in this dataset is the list of stocks and warrants presented on the Borsa Italiana website as of 2023-07-11, with the following structure:
column | name | datatype | description |
---|---|---|---|
1 | # | numeric | position index |
2 | stock | text | name of the company, as per Borsa Italiana website |
3 | link | URL | URL link to the page |
4 | market | text | subsection of the "listino", as per Borsa Italiana website |
5 | ISIN | text | stock identification code: a 2-letter country code followed by 10 alphanumeric characters, the last of which is a check digit |
6 | profile | URL | URL link to the profile page for the stock (if filled by the company) |
7 | detailspresent | char | Y=if a page with details was linked, N=details page not present |
8 | withinstudy | string | only for ISINs starting with IT where there was a value within the profile URL: blank if retained within the study, "MissingReports" if financial reports are partial or not available, "NotCoveringPeriod" if some financial reports 2019-2021 are missing |
9 | covidstudy | string | among the companies retained in column 8, a further restriction, based on the data available, to those usable in a study comparing pre- and post-Covid financial and operational information; values: Y = within the study / N = excluded due to data / outofscope = not within the scope |
10 | industry | string | na = not available; otherwise the industry as listed on BorsaItaliana.it |
11 | subindustry | string | na = not available; otherwise the subindustry, within the industry, as listed on BorsaItaliana.it |
12 | 2019accounts | string | languages of the 2019 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed |
13 | 2021accounts | string | languages of the 2021 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed |
14 | UsedforENG | string | Y if used for the text-based part of the study, i.e. those that have EN in both "2019accounts" and "2021accounts" |
15 | YahooFinanceURL | URL | using the ISIN as main point of reference, the link to the YahooFinance page presenting financials; where none was available, "na" |
16 | checkvs2021yahoo | string | included=data reconciliation successful and company included in sample; bankassfin=company excluded but included in future study on bank/assurance/finance; excluded=company excluded for other reasons |
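As a rough illustration of how the columns above combine, the following sketch selects the rows that are within the Covid study (column 9) and passed the Yahoo Finance reconciliation (column 16). The inline CSV rows, the company names, the placeholder ISINs, and the `select_sample` helper are all hypothetical; the real file name and delimiter are not stated here, so the reader set-up would need adapting.

```python
import csv
import io

# Hypothetical rows reusing a subset of the documented column names.
# Company names and ISINs are placeholders, not real data.
SAMPLE_CSV = """stock,ISIN,covidstudy,UsedforENG,checkvs2021yahoo
Alpha SpA,IT0000000001,Y,Y,included
Beta SpA,IT0000000002,Y,,bankassfin
Gamma SpA,IT0000000003,N,,excluded
Delta NV,NL0000000004,outofscope,,excluded
"""


def select_sample(rows):
    """Keep rows with covidstudy == "Y" and checkvs2021yahoo == "included"."""
    return [
        row for row in rows
        if row["covidstudy"] == "Y" and row["checkvs2021yahoo"] == "included"
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
sample = select_sample(rows)
print([row["stock"] for row in sample])
```

Filtering on both flags, rather than on `checkvs2021yahoo` alone, keeps the selection explicit about which stage excluded a company.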
Note:
This dataset will be complemented with other information by the end of 2024
Whenever new data items are added:
Added the following columns:
column | name | datatype | description |
---|---|---|---|
5 | ISIN | text | stock identification code: a 2-letter country code followed by 10 alphanumeric characters, the last of which is a check digit |
6 | profile | URL | URL link to the profile page for the stock (if filled by the company) |
7 | detailspresent | char | Y=if a page with details was linked, N=details page not present |
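Since the ISIN serves as the main point of reference for later columns, a sanity check on its shape may be useful before joining on it. The sketch below assumes the standard ISIN layout (2-letter country code, 9 alphanumeric characters, 1 check digit validated with the Luhn algorithm after expanding letters to numbers); the function names are hypothetical and are not part of the dataset's notebook.

```python
import re

# Two letters, nine alphanumeric characters, one numeric check digit.
ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def is_valid_isin(isin):
    """Return True if `isin` has the 12-character shape and a correct check digit."""
    if not ISIN_PATTERN.fullmatch(isin):
        return False
    # Expand letters to numbers (A=10 ... Z=35); digits stay unchanged.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Luhn: walking from the right, double every second digit.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def is_italian_isin(isin):
    """True for valid ISINs whose country code is IT (the scope used in column 8)."""
    return is_valid_isin(isin) and isin.startswith("IT")


print(is_valid_isin("US0378331005"))
```

The check is case-sensitive on purpose: lower-case input is rejected rather than silently normalised, so inconsistencies in the source data stay visible.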
Added the following column:
column | name | datatype | description |
---|---|---|---|
8 | withinstudy | string | only for ISINs starting with IT where there was a value within the profile URL: blank if retained within the study, "MissingReports" if financial reports are partial or not available, "NotCoveringPeriod" if some financial reports 2019-2021 are missing |
Added the following column:
column | name | datatype | description |
---|---|---|---|
9 | covidstudy | string | among the companies retained in column 8, a further restriction, based on the data available, to those usable in a study comparing pre- and post-Covid financial and operational information; values: Y = within the study / N = excluded due to data / outofscope = not within the scope |
Added the following columns:
column | name | datatype | description |
---|---|---|---|
10 | industry | string | na = not available; otherwise the industry as listed on BorsaItaliana.it |
11 | subindustry | string | na = not available; otherwise the subindustry, within the industry, as listed on BorsaItaliana.it |
Added the following columns:
column | name | datatype | description |
---|---|---|---|
12 | 2019accounts | string | languages of the 2019 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed |
13 | 2021accounts | string | languages of the 2021 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed |
14 | UsedforENG | string | Y if used for the text-based part of the study, i.e. those that have EN in both "2019accounts" and "2021accounts" |
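The rule behind column 14 can be sketched as a small consistency check. This is only an illustration: it assumes the language fields hold short codes such as "EN" or "IT", possibly separated by punctuation, and both helper names are hypothetical.

```python
import re


def has_english(accounts):
    """True if the language field lists EN as a separate code (assumed format)."""
    tokens = {t.upper() for t in re.split(r"[^A-Za-z]+", accounts) if t}
    return "EN" in tokens


def used_for_eng(accounts_2019, accounts_2021):
    """Column 14 rule: EN must be present in both the 2019 and 2021 accounts."""
    return has_english(accounts_2019) and has_english(accounts_2021)


print(used_for_eng("EN", "IT"))
```

Comparing the result against the stored `UsedforENG` value for each row is one way to spot entries that were updated in one column but not the other.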
Added the following column:
column | name | datatype | description |
---|---|---|---|
15 | YahooFinanceURL | URL | using the ISIN as main point of reference, the link to the YahooFinance page presenting financials; where none was available, "na" |
Added the following column:
column | name | datatype | description |
---|---|---|---|
16 | checkvs2021yahoo | string | included=data reconciliation successful and company included in sample; bankassfin=company excluded but included in future study on bank/assurance/finance; excluded=company excluded for other reasons |
After manually extracting a sample for each layer (listing, sample pages, sample profiles, sample details), to identify structure:
All the accesses to the website (except single-page sample testing) were done:
Language used: Python within Jupyter Notebook
Libraries (in alphabetical order):