Duke Breast Cancer MRI (Pre, Post-1 And Segments)

Pre- and post-contrast MRI sequences converted from DICOM to NIFTI

@kaggle.madhava20217_duke_breast_cancer_mri_nifti_pre_and_post_1_only

About this Dataset

This dataset is a DICOM-to-NIFTI conversion of the Breast Cancer MRI dataset released by Duke University.

Updated 23 October 2023

  • Changed the saved format: files now use img.gz instead of .nii.gz
  • Fixed the reorientation from DICOM to NIFTI (volumes are now in 1:1 correspondence with the originally supplied annotation boxes)
  • Segmentation masks now align more closely with the tumours
  • Amended the Pyradiomics extraction

Preprocessing steps involved:

  1. Converted the individual DICOM slices using SimpleITK. The output is saved in an img.gz format that Pyradiomics supports. The numpy array obtained for each sequence is laid out as [slice, height, width]. This conversion also removed the need to alter the bounding boxes.
  2. Selected Pre and Post-Contrast (Post_1 sequence) for each patient.
  3. Segmented the 3D volume automatically by applying Otsu thresholding to the post-1 sequence, then kept only the part of the segmentation that falls inside the supplied lesion bounding box.
  4. Used the post-1 sequence and the lesion segmentation masks to extract features with Pyradiomics. Mask checking was enabled in Pyradiomics.
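
Step 3 can be illustrated with a minimal sketch. It is not the author's code: it assumes a numpy-only Otsu implementation (the original pipeline may have used SimpleITK or another library), a [slice, height, width] array as described in step 1, and zero-based bounding-box indices with exclusive end values. The function names `otsu_threshold` and `segment_in_box` and the synthetic volume are hypothetical.

```python
import numpy as np


def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Return the bin edge that maximises the between-class variance (Otsu)."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2.0

    w0 = np.cumsum(hist)            # voxels in bins 0..k (background class)
    w1 = w0[-1] - w0                # voxels in bins k+1.. (foreground class)
    cum = np.cumsum(hist * centers)
    valid = (w0 > 0) & (w1 > 0)     # both classes must be non-empty

    m0 = cum / np.where(valid, w0, 1.0)
    m1 = (cum[-1] - cum) / np.where(valid, w1, 1.0)
    between = np.where(valid, w0 * w1 * (m0 - m1) ** 2, 0.0)

    k = int(np.argmax(between))
    return float(edges[k + 1])      # upper edge of the last background bin


def segment_in_box(volume: np.ndarray, box: dict) -> np.ndarray:
    """Threshold the whole volume, then keep only voxels inside the box.

    Assumption: box indices are zero-based and the end values are exclusive.
    """
    mask = volume >= otsu_threshold(volume)
    keep = np.zeros_like(mask)
    keep[
        box["start_slice"]:box["end_slice"],
        box["start_row"]:box["end_row"],
        box["start_column"]:box["end_column"],
    ] = True
    return mask & keep


# Synthetic stand-in for a post-1 volume, laid out as [slice, height, width].
volume = np.full((4, 8, 8), 10.0)
volume[1:3, 2:5, 3:6] = 200.0       # bright "lesion" inside the box
volume[0, 0, 0] = 200.0             # bright voxel outside the box

box = {
    "start_slice": 1, "end_slice": 3,
    "start_row": 2, "end_row": 5,
    "start_column": 3, "end_column": 6,
}
mask = segment_in_box(volume, box)
print(int(mask.sum()))              # 18: the stray bright voxel is discarded
```

The threshold is computed over the whole volume and the box is applied afterwards, which mirrors the order described in step 3. Whether the real annotation boxes use inclusive or exclusive end indices should be checked against the data before reusing this.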

Reference for the original dataset:
Saha, A., Harowicz, M.R., Grimm, L.J., Kim, C.E., Ghate, S.V., Walsh, R. and Mazurowski, M.A., 2018. A machine learning approach to radiogenomics of breast cancer: a study of 922 subjects and 529 DCE-MRI features. British Journal of Cancer, 119(4), pp.508-516.
A free version of this paper is available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6134102/.

Tables

Patient Class Labels

@kaggle.madhava20217_duke_breast_cancer_mri_nifti_pre_and_post_1_only.patient_class_labels
  • 10.31 KB
  • 922 rows
  • 5 columns

CREATE TABLE patient_class_labels (
  "patient_id" VARCHAR,
  "er" BIGINT,
  "pr" BIGINT,
  "her2" BIGINT,
  "mol_subtype" BIGINT
);

Pyradiomics Extraction

@kaggle.madhava20217_duke_breast_cancer_mri_nifti_pre_and_post_1_only.pyradiomics_extraction
  • 619.79 KB
  • 922 rows
  • 109 columns

CREATE TABLE pyradiomics_extraction (
  "patient" VARCHAR,
  "sequence" VARCHAR,
  "original_shape_elongation" DOUBLE,
  "original_shape_flatness" DOUBLE,
  "original_shape_leastaxislength" DOUBLE,
  "original_shape_majoraxislength" DOUBLE,
  "original_shape_maximum2ddiametercolumn" DOUBLE,
  "original_shape_maximum2ddiameterrow" DOUBLE,
  "original_shape_maximum2ddiameterslice" DOUBLE,
  "original_shape_maximum3ddiameter" DOUBLE,
  "original_shape_meshvolume" DOUBLE,
  "original_shape_minoraxislength" DOUBLE,
  "original_shape_sphericity" DOUBLE,
  "original_shape_surfacearea" DOUBLE,
  "original_shape_surfacevolumeratio" DOUBLE,
  "original_shape_voxelvolume" DOUBLE,
  "original_firstorder_10percentile" DOUBLE,
  "original_firstorder_90percentile" DOUBLE,
  "original_firstorder_energy" DOUBLE,
  "original_firstorder_entropy" DOUBLE,
  "original_firstorder_interquartilerange" DOUBLE,
  "original_firstorder_kurtosis" DOUBLE,
  "original_firstorder_maximum" DOUBLE,
  "original_firstorder_meanabsolutedeviation" DOUBLE,
  "original_firstorder_mean" DOUBLE,
  "original_firstorder_median" DOUBLE,
  "original_firstorder_minimum" DOUBLE,
  "original_firstorder_range" DOUBLE,
  "original_firstorder_robustmeanabsolutedeviation" DOUBLE,
  "original_firstorder_rootmeansquared" DOUBLE,
  "original_firstorder_skewness" DOUBLE,
  "original_firstorder_totalenergy" DOUBLE,
  "original_firstorder_uniformity" DOUBLE,
  "original_firstorder_variance" DOUBLE,
  "original_glcm_autocorrelation" DOUBLE,
  "original_glcm_clusterprominence" DOUBLE,
  "original_glcm_clustershade" DOUBLE,
  "original_glcm_clustertendency" DOUBLE,
  "original_glcm_contrast" DOUBLE,
  "original_glcm_correlation" DOUBLE,
  "original_glcm_differenceaverage" DOUBLE,
  "original_glcm_differenceentropy" DOUBLE,
  "original_glcm_differencevariance" DOUBLE,
  "original_glcm_id" DOUBLE,
  "original_glcm_idm" DOUBLE,
  "original_glcm_idmn" DOUBLE,
  "original_glcm_idn" DOUBLE,
  "original_glcm_imc1" DOUBLE,
  "original_glcm_imc2" DOUBLE,
  "original_glcm_inversevariance" DOUBLE,
  "original_glcm_jointaverage" DOUBLE,
  "original_glcm_jointenergy" DOUBLE,
  "original_glcm_jointentropy" DOUBLE,
  "original_glcm_mcc" DOUBLE,
  "original_glcm_maximumprobability" DOUBLE,
  "original_glcm_sumaverage" DOUBLE,
  "original_glcm_sumentropy" DOUBLE,
  "original_glcm_sumsquares" DOUBLE,
  "original_gldm_dependenceentropy" DOUBLE,
  "original_gldm_dependencenonuniformity" DOUBLE,
  "original_gldm_dependencenonuniformitynormalized" DOUBLE,
  "original_gldm_dependencevariance" DOUBLE,
  "original_gldm_graylevelnonuniformity" DOUBLE,
  "original_gldm_graylevelvariance" DOUBLE,
  "original_gldm_highgraylevelemphasis" DOUBLE,
  "original_gldm_largedependenceemphasis" DOUBLE,
  "original_gldm_largedependencehighgraylevelemphasis" DOUBLE,
  "original_gldm_largedependencelowgraylevelemphasis" DOUBLE,
  "original_gldm_lowgraylevelemphasis" DOUBLE,
  "original_gldm_smalldependenceemphasis" DOUBLE,
  "original_gldm_smalldependencehighgraylevelemphasis" DOUBLE,
  "original_gldm_smalldependencelowgraylevelemphasis" DOUBLE,
  "original_glrlm_graylevelnonuniformity" DOUBLE,
  "original_glrlm_graylevelnonuniformitynormalized" DOUBLE,
  "original_glrlm_graylevelvariance" DOUBLE,
  "original_glrlm_highgraylevelrunemphasis" DOUBLE,
  "original_glrlm_longrunemphasis" DOUBLE,
  "original_glrlm_longrunhighgraylevelemphasis" DOUBLE,
  "original_glrlm_longrunlowgraylevelemphasis" DOUBLE,
  "original_glrlm_lowgraylevelrunemphasis" DOUBLE,
  "original_glrlm_runentropy" DOUBLE,
  "original_glrlm_runlengthnonuniformity" DOUBLE,
  "original_glrlm_runlengthnonuniformitynormalized" DOUBLE,
  "original_glrlm_runpercentage" DOUBLE,
  "original_glrlm_runvariance" DOUBLE,
  "original_glrlm_shortrunemphasis" DOUBLE,
  "original_glrlm_shortrunhighgraylevelemphasis" DOUBLE,
  "original_glrlm_shortrunlowgraylevelemphasis" DOUBLE,
  "original_glszm_graylevelnonuniformity" DOUBLE,
  "original_glszm_graylevelnonuniformitynormalized" DOUBLE,
  "original_glszm_graylevelvariance" DOUBLE,
  "original_glszm_highgraylevelzoneemphasis" DOUBLE,
  "original_glszm_largeareaemphasis" DOUBLE,
  "original_glszm_largeareahighgraylevelemphasis" DOUBLE,
  "original_glszm_largearealowgraylevelemphasis" DOUBLE,
  "original_glszm_lowgraylevelzoneemphasis" DOUBLE,
  "original_glszm_sizezonenonuniformity" DOUBLE,
  "original_glszm_sizezonenonuniformitynormalized" DOUBLE,
  "original_glszm_smallareaemphasis" DOUBLE,
  "original_glszm_smallareahighgraylevelemphasis" DOUBLE
);

Segmentation Annotations Nifti

@kaggle.madhava20217_duke_breast_cancer_mri_nifti_pre_and_post_1_only.segmentation_annotations_nifti
  • 22.89 KB
  • 922 rows
  • 7 columns

CREATE TABLE segmentation_annotations_nifti (
  "patient_id" VARCHAR,
  "start_row" BIGINT,
  "end_row" BIGINT,
  "start_column" BIGINT,
  "end_column" BIGINT,
  "start_slice" BIGINT,
  "end_slice" BIGINT
);
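
As a rough illustration of how the tables might be combined, the sketch below joins the class labels to the bounding boxes on patient_id and computes each box's size in voxels. It makes several assumptions: it uses Python's built-in sqlite3 module rather than the platform's own query engine, the rows are made-up placeholders (the IDs `P001` and `P002` and all values are hypothetical, not taken from the dataset), the two tables share patient_id values, and the end indices are treated as exclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recreate the two schemas listed above in an in-memory database.
conn.executescript(
    """
    CREATE TABLE patient_class_labels (
      "patient_id" VARCHAR, "er" BIGINT, "pr" BIGINT,
      "her2" BIGINT, "mol_subtype" BIGINT
    );
    CREATE TABLE segmentation_annotations_nifti (
      "patient_id" VARCHAR,
      "start_row" BIGINT, "end_row" BIGINT,
      "start_column" BIGINT, "end_column" BIGINT,
      "start_slice" BIGINT, "end_slice" BIGINT
    );
    """
)

# Placeholder rows for illustration only; they are not real dataset values.
conn.executemany(
    "INSERT INTO patient_class_labels VALUES (?, ?, ?, ?, ?)",
    [("P001", 1, 1, 0, 0), ("P002", 0, 0, 1, 2)],
)
conn.executemany(
    "INSERT INTO segmentation_annotations_nifti VALUES (?, ?, ?, ?, ?, ?, ?)",
    [("P001", 10, 30, 40, 70, 5, 9), ("P002", 0, 8, 0, 8, 2, 4)],
)

# Join labels to boxes and compute the box volume in voxels
# (assumes exclusive end indices).
query = """
SELECT l.patient_id,
       l.mol_subtype,
       (a.end_row - a.start_row)
         * (a.end_column - a.start_column)
         * (a.end_slice - a.start_slice) AS box_voxels
FROM patient_class_labels AS l
JOIN segmentation_annotations_nifti AS a
  ON a.patient_id = l.patient_id
ORDER BY l.patient_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same SELECT statement should carry over to other SQL engines with little change, since it relies only on a plain inner join and integer arithmetic.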