This dataset is a processed version of the Duke University Breast Cancer MRI dataset, converted from DICOM to NIFTI.
Updated 23 October 2023
- Changed the saved format: now uses img.gz instead of .nii.gz
- Fixed reorientation of DICOM to NIFTI (now in 1:1 correspondence with the originally supplied annotation boxes)
- Segmentation masks now align more closely with the tumours
- Pyradiomics extraction amended
Preprocessing steps involved:
- Processed individual DICOM slices using SimpleITK. The resultant file format uses an img.gz version that is supported by Pyradiomics. The layout is [slice, height, width] in the numpy array obtained for each sequence. This also eliminated the need for altering the bounding boxes.
- Selected Pre and Post-Contrast (Post_1 sequence) for each patient.
- Applied Otsu thresholding to the post-1 sequences for automated segmentation of the 3D volume. The supplied lesion bounding boxes were then used to discard any segmentation outside the box.
- Used the post-1 sequence and the lesion segmentation masks to extract features from the MRI sequences with Pyradiomics. Mask checking was enabled in Pyradiomics.
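The thresholding and bounding-box steps above can be illustrated with a minimal sketch. This is not the code used to build the dataset: it replaces SimpleITK with a plain NumPy implementation of Otsu's method so that it runs without extra dependencies, and the function names, the bin count, and the bounding-box format (half-open index ranges in [slice, height, width] order) are all assumptions made for illustration.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold that maximises between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weight_lo = np.cumsum(hist)               # voxels in bins up to and including each bin
    weight_hi = weight_lo[-1] - weight_lo     # voxels in the remaining bins
    cum_sum = np.cumsum(hist * centers)
    # Class means; np.maximum guards against division by zero for empty classes.
    mean_lo = cum_sum / np.maximum(weight_lo, 1)
    mean_hi = (cum_sum[-1] - cum_sum) / np.maximum(weight_hi, 1)
    between = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
    return centers[np.argmax(between)]


def segment_in_box(volume, box):
    """Threshold the whole volume, then keep only voxels inside the box.

    volume: array laid out as [slice, height, width].
    box: (s0, s1, r0, r1, c0, c1), half-open index ranges (hypothetical format).
    """
    mask = volume > otsu_threshold(volume.ravel())
    keep = np.zeros(volume.shape, dtype=bool)
    s0, s1, r0, r1, c0, c1 = box
    keep[s0:s1, r0:r1, c0:c1] = True
    return (mask & keep).astype(np.uint8)


# Synthetic example: one bright "lesion" inside the box, one bright blob outside it.
volume = np.full((4, 8, 8), 10.0)
volume[1:3, 2:5, 2:5] = 200.0   # inside the box
volume[0, 6:8, 6:8] = 200.0     # outside the box, should be discarded
lesion_mask = segment_in_box(volume, box=(1, 3, 1, 6, 1, 6))
print(int(lesion_mask.sum()))
```

The threshold is computed on the full volume and the box is applied afterwards, which matches the order described in the steps above. Computing the threshold only inside the box would be a different design and could give a different mask.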
Reference for the original dataset:
Saha, A., Harowicz, M.R., Grimm, L.J., Kim, C.E., Ghate, S.V., Walsh, R. and Mazurowski, M.A., 2018. A machine learning approach to radiogenomics of breast cancer: a study of 922 subjects and 529 DCE-MRI features. British Journal of Cancer, 119(4), pp.508-516.
A free version of this paper is available here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6134102/.