Social Media Ban For Minors: A Computational Analysis Of Media Coverage In Europe And Beyond, Dataset
@ecjrc.n_290dca45_6753_489e_af1f_b2c0381ac617
This dataset comprises more than 10,000 news articles referring to social media bans for minors, published between 1 January 2024 and 15 March 2025. It served as the primary input for the publication “Social Media Ban for Minors: A Computational Analysis of Media Coverage in Europe and Beyond.” The data are organised by month.
Due to data sensitivity considerations, we are unable to provide a dataset containing a separate list of unverified online sources. For this reason, all sources, both mainstream and unverified, are included together within one dataset. The identification of unverified sources is based on assessments conducted by the European External Action Service (EEAS), as outlined in the “3rd EEAS Report on Foreign Information Manipulation and Interference Threats”, as well as evaluations made by independent external experts working in the field of disinformation. Some of these sources are reviewed by reputable fact-checking organisations such as mediascan, butac, factcheck, cbsnews, and konspiratori.
For each article, the following fields are provided: Title, Link, Publication Date, Source, Guid, Cluster Title, Cluster Keyphrases, Cluster Summary, Framing Dimensions, and Persuasion Techniques. By tracking coverage over time and applying multilingual clustering combined with large language model (LLM)–based cluster summarisation, analysts identified the key narratives surrounding debates on social media bans. The “cluster” fields are derived from a multilingual clustering pipeline that uses the LaBSE sentence embedding model, PyNNDescent to build approximate nearest-neighbour graphs, and LeidenAlg for community detection. Each cluster represents a story or narrative prominent in a given month. For each cluster, 100 random article excerpts (the first 350 characters of each article) were sampled, and GPT-4, GPT-4-turbo, and GPT-3.5-turbo were used to generate a cluster title, cluster keyphrases, and a cluster summary.
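The excerpt-sampling step described above can be sketched as follows. This is a minimal illustration and not the code used to build the dataset: the function name `sample_excerpts`, its parameters, and the optional seed are assumptions, and only the figures of 100 excerpts and 350 characters come from the description. It also assumes that clusters with fewer than 100 articles contribute all of their articles, which the description does not state.

```python
import random

EXCERPT_CHARS = 350  # excerpt length given in the dataset description
SAMPLE_SIZE = 100    # number of excerpts sampled per cluster, as described


def sample_excerpts(articles, k=SAMPLE_SIZE, n_chars=EXCERPT_CHARS, seed=None):
    """Return up to k random excerpts from the article texts of one cluster.

    `articles` is a list of article texts (hypothetical input; the published
    dataset provides titles and links, not full texts).
    """
    rng = random.Random(seed)
    # Assumption: a cluster smaller than k contributes every article it has.
    chosen = rng.sample(articles, min(k, len(articles)))
    # Keep only the first n_chars characters of each sampled article.
    return [text[:n_chars] for text in chosen]
```

The resulting excerpts would then be passed to the language model as the basis for the cluster title, keyphrases, and summary. Passing a seed makes the sample reproducible.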
The “framing dimensions” and “persuasion techniques” fields contain specific frames and rhetorical strategies identified within each article. Articles may contain multiple instances. These labels were produced using in-house machine-learning classifiers. Framing refers to the perspective from which an issue or a piece of news is presented. We consider 14 frames: (1) Economic, (2) Capacity and resources, (3) Morality, (4) Fairness and equality, (5) Legality, constitutionality and jurisprudence, (6) Policy prescription and evaluation, (7) Crime and punishment, (8) Security and defence, (9) Health and safety, (10) Quality of life, (11) Cultural identity, (12) Public opinion, (13) Political, (14) External regulation and reputation. Persuasion techniques refer to the stylistic devices a text uses to influence the reader. In this report we consider the following subset: (1) Appeal to Authority, (2) Appeal to Fear-Prejudice, (3) Appeal to Hypocrisy, (4) Appeal to Time, (5) Appeal to Values, (6) Causal Oversimplification, (7) Consequential Oversimplification, (8) Conversation Killer, (9) Doubt, (10) Exaggeration-Minimisation, (11) False Dilemma-No Choice, (12) Flag Waving, (13) Guilt by Association, (14) Loaded Language, (15) Name Calling-Labelling, (16) Questioning the Reputation, (17) Repetition, (18) Slogan.
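Because each article can carry several frames and techniques, a common first step is to tally how many articles carry each label. The sketch below is illustrative only. It assumes each record has already been parsed into a dictionary whose label field holds a list of strings; the key name `framing_dimensions` and the function `count_labels` are hypothetical, and the on-disk encoding of these fields is not specified in this description.

```python
from collections import Counter


def count_labels(records, field):
    """Count, for each label, the number of articles that carry it.

    `records` is a list of dicts; `field` names a key holding a list of
    label strings (assumed layout, not confirmed by the dataset).
    """
    counts = Counter()
    for rec in records:
        # set() counts a label once per article, even if it occurs repeatedly.
        counts.update(set(rec.get(field, [])))
    return counts


# Hypothetical records for illustration only, not taken from the dataset.
example = [
    {"framing_dimensions": ["Health and safety", "Political", "Political"]},
    {"framing_dimensions": ["Political"]},
]
frame_counts = count_labels(example, "framing_dimensions")
```

Counting once per article gives document frequencies. Dropping the `set()` call would count every labelled instance instead.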
Publisher name: Joint Research Centre
Publisher URL: https://commission.europa.eu/about/departments-and-executive-agencies/joint-research-centre
Last updated: 2026-02-20T00:06:33Z