Baselight

Support2

9105 individual critically ill patients across 5 United States medical centers

@kaggle.joebeachcapital_support2


About this Dataset

Support2

This dataset comprises 9105 critically ill patients from 5 United States medical centers, accessioned during 1989-1991 and 1992-1994. Each row is the record of a hospitalized patient who met the inclusion and exclusion criteria for one of nine disease categories: acute respiratory failure, chronic obstructive pulmonary disease, congestive heart failure, liver disease, coma, colon cancer, lung cancer, multiple organ system failure with malignancy, and multiple organ system failure with sepsis. The goal is to estimate these patients' 2- and 6-month survival from physiologic, demographic, and disease severity information. The problem is important because it addresses the growing national concern over patients' loss of control near the end of life: better survival estimates enable earlier decisions and planning, which can reduce the frequency of a mechanical, painful, and prolonged dying process.

For what purpose was the dataset created?

To develop and validate a prognostic model that estimates survival over a 180-day period for seriously ill hospitalized adults (phase I of SUPPORT) and to compare this model's predictions with those of an existing prognostic system and with physicians' independent estimates (SUPPORT phase II).

Who funded the creation of the dataset?

The Robert Wood Johnson Foundation.

What do the instances in this dataset represent?

The instances represent records of critically ill patients admitted to United States hospitals with advanced stages of serious illness.

Are there recommended data splits?

No specific split is recommended. A standard train-test split can be used; when doing model selection, a three-way holdout split (train, validation, test) can be used instead.
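
As a minimal sketch of the three-way holdout split mentioned above, the following shuffles row indices and cuts them into train, validation, and test sets. The function name `three_way_split`, the 70/15/15 proportions, and the fixed seed are assumptions chosen for illustration; the dataset itself prescribes none of them.

```python
import random


def three_way_split(n_rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices and cut them into train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_frac))
    n_val = int(round(n_rows * val_frac))
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 9105 is the row count reported for this dataset.
train_idx, val_idx, test_idx = three_way_split(9105)
print(len(train_idx), len(val_idx), len(test_idx))
```

The validation set is used to choose between candidate models, and the test set is touched only once for the final performance estimate.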

Does the dataset contain data that might be considered sensitive in any way?

Yes. There is information about race, gender, income, and education level.

Was there any data preprocessing performed?

No. Because of the high percentage of missing values, imputation is recommended. According to the HBiostat Repository (https://hbiostat.org/data/repo/supportdesc, Professor Frank Harrell), the following default values have been found to be useful in imputing missing baseline physiologic data:

Baseline variable: normal fill-in value

  • Serum albumin (alb): 3.5
  • PaO2/FiO2 ratio (pafi): 333.3
  • Bilirubin (bili): 1.01
  • Creatinine (crea): 1.01
  • Blood urea nitrogen (bun): 6.51
  • White blood count (wblc): 9 (thousands)
  • Urine output (urine): 2502

There are 159 patients surviving 2 months for whom there were no patient or surrogate interviews. These patients have missing sfdm2.
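
The fill-in values listed above could be applied along the following lines. This is a sketch under assumptions: rows are represented as plain dictionaries, missing values appear as `None` or NaN, and the helper name `impute_baseline` and the example row are hypothetical.

```python
import math

# Fill-in values quoted above from the HBiostat repository.
FILL_IN = {
    "alb": 3.5,
    "pafi": 333.3,
    "bili": 1.01,
    "crea": 1.01,
    "bun": 6.51,
    "wblc": 9.0,
    "urine": 2502.0,
}


def is_missing(value):
    """Treat None and NaN as missing (an assumption about how the data is loaded)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_baseline(row, fill_in=FILL_IN):
    """Return a copy of `row` with missing baseline values replaced by defaults."""
    filled = dict(row)
    for column, default in fill_in.items():
        # Only columns already present in the row are filled; none are added.
        if column in filled and is_missing(filled[column]):
            filled[column] = default
    return filled


# Made-up row for illustration only; it is not taken from the dataset.
example = {"age": 62.8, "alb": None, "crea": 1.2, "bun": float("nan")}
print(impute_baseline(example))
```

Observed values such as `crea` in the example are left untouched, and the input row is not modified.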

Additional Information

Data sources are medical records, personal interviews, and the National Death Index (NDI). For each patient, administrative records, clinical data, and survey data were collected.
The objective of the SUPPORT project was to improve decision-making in order to address the growing national concern over the loss of control that patients have near the end of life and to reduce the frequency of a mechanical, painful, and prolonged process of dying. SUPPORT comprised a two-year prospective observational study (Phase I) followed by a two-year controlled clinical trial (Phase II).
Phase I of SUPPORT collected data from patients accessioned during 1989-1991 to characterize the care, treatment preferences, and patterns of decision-making among critically ill patients. It also served as a preliminary step for devising an intervention strategy for improving critically-ill patients' care and for the construction of statistical models for predicting patient prognosis and functional status.
An intervention was implemented in Phase II of SUPPORT, which accessioned patients during 1992-1994. The Phase II intervention provided physicians with accurate predictive information on future functional ability, survival probability to six months, and patients' preferences for end-of-life care. Additionally, a skilled nurse was provided as part of the intervention to elicit patient preferences, provide prognoses, enhance understanding, enable palliative care, and facilitate advance planning. The intervention was expected to increase communication, resulting in earlier decisions to have orders against resuscitation, decrease time that patients spent in undesirable states (e.g., in the Intensive Care Unit, on a ventilator, and in a coma), increase physician understanding of patients' preferences for care, decrease patient pain, and decrease hospital resource use.
Data collection in both phases of SUPPORT consisted of questionnaires administered to patients, their surrogates, and physicians, plus chart reviews for abstracting clinical, treatment, and decision information.
Phase II also collected information regarding the implementation of the intervention, such as patient-specific logs maintained by nurses assigned to patients as part of the intervention. SUPPORT patients were followed for six months after inclusion in the study. Those who did not die within six months or were lost to follow-up were matched against the National Death Index to identify deaths through 1997. Patients who did not die within one year or were lost to follow-up were matched against the National Death Index to identify deaths through 1997.
The study population comprises all patients in five United States medical centers who met the inclusion and exclusion criteria for nine disease categories: acute respiratory failure, chronic obstructive pulmonary disease, congestive heart failure, liver disease, coma, colon cancer, lung cancer, multiple organ system failure with malignancy, and multiple organ system failure with sepsis. SUPPORT combines patients from 2 studies, each of which lasted 2 years. The first phase concerns 4,301 patients, whereas the second phase concerns 4,804 patients. Patients were accessioned from June 12, 1989 through June 11, 1991 for Phase I and from January 7, 1992 through January 24, 1994 for Phase II.

Tables

Support2

@kaggle.joebeachcapital_support2.support2
  • 611.89 KB
  • 9105 rows
  • 47 columns

CREATE TABLE support2 (
  "age" DOUBLE,
  "death" BIGINT,
  "sex" VARCHAR,
  "hospdead" BIGINT,
  "slos" BIGINT,
  "d_time" BIGINT,
  "dzgroup" VARCHAR,
  "dzclass" VARCHAR,
  "num_co" BIGINT,
  "edu" DOUBLE,
  "income" VARCHAR,
  "scoma" DOUBLE,
  "charges" DOUBLE,
  "totcst" DOUBLE,
  "totmcst" DOUBLE,
  "avtisst" DOUBLE,
  "race" VARCHAR,
  "sps" DOUBLE,
  "aps" DOUBLE,
  "surv2m" DOUBLE,
  "surv6m" DOUBLE,
  "hday" BIGINT,
  "diabetes" BIGINT,
  "dementia" BIGINT,
  "ca" VARCHAR,
  "prg2m" DOUBLE,
  "prg6m" DOUBLE,
  "dnr" VARCHAR,
  "dnrday" DOUBLE,
  "meanbp" DOUBLE,
  "wblc" DOUBLE,
  "hrt" DOUBLE,
  "resp" DOUBLE,
  "temp" DOUBLE,
  "pafi" DOUBLE,
  "alb" DOUBLE,
  "bili" DOUBLE,
  "crea" DOUBLE,
  "sod" DOUBLE,
  "ph" DOUBLE,
  "glucose" DOUBLE,
  "bun" DOUBLE,
  "urine" DOUBLE,
  "adlp" DOUBLE,
  "adls" DOUBLE,
  "sfdm2" VARCHAR,
  "adlsc" DOUBLE
);
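
A table with this schema could be queried for crude 2- and 6-month mortality per disease group, as in the sketch below. Several things here are assumptions rather than facts stated on this page: that `death` is 1 when the patient died during follow-up, that `d_time` is the number of days of follow-up, that 2 months is taken as 60 days, and that an in-memory SQLite database stands in for the real storage. The inserted rows are made up for illustration, and only four of the 47 columns are reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the columns from the schema above; SQLite accepts these type names.
conn.execute(
    'CREATE TABLE support2 ("age" DOUBLE, "death" BIGINT, '
    '"d_time" BIGINT, "dzgroup" VARCHAR)'
)
# Made-up rows for illustration only; they are not taken from the dataset.
conn.executemany(
    "INSERT INTO support2 VALUES (?, ?, ?, ?)",
    [
        (62.8, 0, 2029, "lung cancer"),
        (42.4, 1, 133, "lung cancer"),
        (60.3, 1, 4, "liver disease"),
        (52.7, 1, 47, "liver disease"),
        (93.0, 1, 400, "coma"),
    ],
)

QUERY = """
SELECT dzgroup,
       COUNT(*) AS n,
       AVG(CASE WHEN death = 1 AND d_time <= 60 THEN 1.0 ELSE 0.0 END) AS died_2m,
       AVG(CASE WHEN death = 1 AND d_time <= 180 THEN 1.0 ELSE 0.0 END) AS died_6m
FROM support2
GROUP BY dzgroup
ORDER BY dzgroup
"""

rows = conn.execute(QUERY).fetchall()
for dzgroup, n, died_2m, died_6m in rows:
    print(dzgroup, n, died_2m, died_6m)
```

This crude proportion counts a patient with `death = 0` and fewer than 180 days of follow-up as a survivor, so it ignores censoring. A survival-analysis method would be the more careful choice for real estimates.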
