Suicidality on Reddit
Characterization of Time-variant and Time-invariant Assessment of Suicidality
By [source]
About this dataset
Do you want to unlock the potential for targeted suicide intervention? This dataset contains posts from 500 anonymized Reddit users, each labeled with a modified version of the Columbia Suicide Severity Rating Scale (C-SSRS), a scale designed to assess suicidal ideation and behavior. Understanding the severity of these issues more precisely can support interventions that provide help in a sustained way. Use this dataset to identify and refer the users at highest risk of self-harm or suicide, and your analysis may help save lives.
How to use the dataset
This dataset provides an in-depth view of the suicidality of 500 anonymous Reddit users. Because each post is rated on the Columbia Suicide Severity Rating Scale (C-SSRS), the severity of each user's risk can be assessed more consistently, which helps identify suitable interventions for at-risk users. The dataset contains three columns: “User”, “Post”, and “Label”.
The User and Post columns contain user data that is intended to remain anonymous and confidential. User holds a string representing an anonymized user ID, and Post holds the text of the corresponding post, which reveals no personal information beyond what the user typed.
The Label column gives the suicide risk level assigned under a modified version of the C-SSRS. Labels range from 0 to 4: 0 means no suicidal ideation or plans were observed in the post, and 4 indicates the highest risk of suicidal behavior. An entry of 'NaN' means no label was assigned, because the post did not show enough relevant signs to fit any of these levels.
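A minimal sketch of loading the file and tallying labels, using only the Python standard library. It assumes the CSV has a header row with the column names User, Post, and Label, that Label holds the integers 0 to 4 described above, and that unlabeled posts contain 'NaN' or an empty cell; if the file stores labels as text category names instead, `parse_label` would need adjusting. The function names `parse_label`, `read_posts`, and `label_counts` are hypothetical, not part of any published tooling for this dataset. The `csv` module is used rather than splitting on commas because post text can contain commas, quotes, and line breaks.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Optional


def parse_label(raw: Optional[str]) -> Optional[int]:
    """Convert a raw Label cell to an int on the 0-4 scale, or None if unlabeled."""
    raw = (raw or "").strip()  # a short row can yield None for a missing cell
    # Assumption: unlabeled posts hold 'NaN' or an empty cell.
    if raw == "" or raw.lower() == "nan":
        return None
    number = float(raw)  # accepts "3" as well as float-typed exports like "3.0"
    if not number.is_integer() or not 0 <= number <= 4:
        raise ValueError(f"not a label on the 0-4 scale: {raw!r}")
    return int(number)


def read_posts(lines: Iterable[str]) -> List[Dict]:
    """Parse CSV lines with a User,Post,Label header into dicts with a parsed Label."""
    rows = []
    for record in csv.DictReader(lines):
        rows.append({
            "User": record["User"],
            "Post": record["Post"],
            "Label": parse_label(record["Label"]),
        })
    return rows


def label_counts(rows: List[Dict]) -> Counter:
    """Count posts per label; the None key counts unlabeled posts."""
    return Counter(row["Label"] for row in rows)
```

To use it on the downloaded file, open the CSV with `newline=""` and an assumed `encoding="utf-8"`, pass the file object to `read_posts`, and print `label_counts(rows)` to see how many posts fall on each level of the scale.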
Research Ideas
- Analyzing temporal trends of depression and suicidality in an online community: By pairing post timestamps with the corresponding C-SSRS labels, researchers could track the frequency and severity of depression and suicidal behavior over time, across the whole dataset or within demographic subgroups. Timestamps and demographic attributes are not among the three columns described above, so they would have to be obtained separately.
- Identifying keywords related to high-risk suicidal behavior: Using natural language processing techniques, researchers could identify which words in these anonymous posts correlate with high-risk C-SSRS labels. This information could then support more targeted interventions for at-risk populations.
- Predicting the likelihood of suicidal behavior among anonymized users: Machine learning models trained on this dataset could identify risk factors associated with suicide attempts and predict risk from observed characteristics such as past post topics. Features such as user age or gender would again require data beyond the three columns provided. Predictive models of this kind could support proactive suicide prevention services in online communities such as Reddit.
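For the keyword idea, one simple starting point is to compare word frequencies between higher- and lower-risk posts. The sketch below ranks words by the log ratio of their smoothed relative frequency in high-risk posts versus all other labeled posts. The cut-off `HIGH_RISK_MIN = 3`, the tokenizer, and the name `keyword_log_ratio` are illustrative assumptions, not part of the dataset; input is a sequence of (post text, label) pairs built from the Post and Label columns, with None for unlabeled posts. This is a descriptive frequency comparison, not a validated risk model.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical cut-off: labels >= 3 count as "high risk". Not defined by the dataset.
HIGH_RISK_MIN = 3

# Lowercase runs of ASCII letters and apostrophes; everything else separates tokens.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def keyword_log_ratio(
    posts: Iterable[Tuple[str, Optional[int]]],
    alpha: float = 1.0,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank words by log(P(word | high risk) / P(word | other labeled posts)).

    Probabilities use add-alpha smoothing (alpha must be > 0) so that words
    seen in only one group still get a finite score. Posts whose label is
    None (unlabeled) are skipped.
    """
    high: Counter = Counter()
    other: Counter = Counter()
    for text, label in posts:
        if label is None:
            continue
        (high if label >= HIGH_RISK_MIN else other).update(tokenize(text))
    vocab = set(high) | set(other)
    if not vocab:
        return []
    high_total = sum(high.values()) + alpha * len(vocab)
    other_total = sum(other.values()) + alpha * len(vocab)
    scores: Dict[str, float] = {
        word: math.log(
            ((high[word] + alpha) / high_total) / ((other[word] + alpha) / other_total)
        )
        for word in vocab
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Positive scores mark words relatively more frequent in high-risk posts, negative scores the reverse. With only 500 users, scores for rare words are noisy, so a larger `alpha` or a minimum-count filter may be worth trying before reading too much into the top of the list.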
Acknowledgements
If you use this dataset in your research, please credit the original authors.
Data Source
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Columns
File: 500_anonymized_Reddit_users_posts_labels - 500_anonymized_Reddit_users_posts_labels.csv
| Column name | Description |
| --- | --- |
| User | Unique identifier for each user. (String) |
| Post | Text of the post. (String) |
| Label | C-SSRS label assigned to the post. (Integer) |