Baselight

Breast Cancer Prediction

@kaggle.fatemehmehrparvar_breast_cancer_prediction

About this Dataset

Description

Research Hypothesis: This study hypothesizes that there are significant associations between patients' diagnostic characteristics (age, menopause status, tumor size, presence of invasive nodes, affected breast, metastasis status, breast quadrant, and history of breast conditions) and their breast cancer diagnosis result.

Data Collection and Description: The dataset of 213 patient observations was obtained from the University of Calabar Teaching Hospital cancer registry over 24 months (January 2019–August 2021). The data includes eleven features: patient serial number (S/N), year of diagnosis, age, menopause status, tumor size in cm, number of invasive nodes, breast (left or right) affected, metastasis (yes or no), quadrant of the breast affected, history of breast disease, and diagnosis result (benign or malignant).

Notable Findings: Upon preliminary examination, the data shows variations in diagnosis results across different patient features. A noticeable trend is the higher prevalence of malignant results among patients with larger tumor sizes and the presence of invasive nodes. Additionally, postmenopausal women seem to have a higher rate of malignant diagnoses.

Interpretation and Usage: The data can be analyzed using statistical and machine learning techniques to determine the strength and significance of associations between patient characteristics and breast cancer diagnosis. This can contribute to predictive modeling for the early detection and diagnosis of breast cancer.

However, the interpretation must consider potential limitations, such as missing data or bias in data collection. Furthermore, the data reflects patients from a single hospital, limiting the generalizability of the findings to wider populations.

The data could be valuable for healthcare professionals, researchers, or policymakers interested in understanding breast cancer diagnosis factors and improving healthcare strategies for breast cancer. It could also be used in patient education about risk factors associated with breast cancer.
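As one illustration of the kind of statistical analysis mentioned above, the sketch below cross-tabulates a binary feature against a binary outcome and computes Pearson's chi-square statistic for the resulting table. It is a minimal example, not the method used by the dataset's authors. The function names, the dictionary keys, and the 0/1 coding of the outcome are assumptions made for illustration, and the counts in the usage note are placeholder numbers, not values from this dataset.

```python
from collections import Counter


def contingency_table(rows, feature, outcome):
    """Build a 2x2 table of counts for two columns assumed to be coded 0/1.

    table[f][o] is the number of rows with feature == f and outcome == o.
    """
    counts = Counter((row[feature], row[outcome]) for row in rows)
    return [[counts[(f, o)] for o in (0, 1)] for f in (0, 1)]


def chi_square(table):
    """Pearson's chi-square statistic for a table of observed counts.

    Assumes every row total and column total is non-zero, so that all
    expected counts are positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

For example, `chi_square([[10, 20], [20, 10]])` evaluates a table of placeholder counts; a larger statistic indicates a stronger departure from independence. With one degree of freedom, values above roughly 3.84 correspond to p < 0.05.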

About Dataset

  • S/N = Unique identification for each patient.

  • Year = The year the diagnosis was made.

  • Age = Age of the patient at the time of diagnosis.

  • Menopause = Whether the patient is pre- or postmenopausal at the time of diagnosis. 0 means the patient has reached menopause; 1 means the patient has not yet reached menopause.

  • Tumor size = The size, in centimeters, of the excised tumor.

  • Involved nodes = The number of axillary lymph nodes that contain metastatic cancer, coded as a binary variable for presence or absence: 1 means present, 0 means absent.

  • Breast = Whether the tumor occurs on the left or right side.

  • Metastatic = Whether the cancer has spread to other parts of the body or other organs, coded as a binary variable: 1 means the cancer has spread, 0 means it has not spread yet.

  • Breast quadrant = The quadrant of the breast affected; the gland is divided into 4 sections with the nipple as the central point.

  • History = Whether the patient has any history or family history of cancer: 1 means there is a history of cancer, 0 means no history.

  • Diagnosis result = The diagnosis outcome for each patient (benign or malignant).
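The numeric codes described in the list above can be turned into readable labels before analysis. The following is a minimal sketch that covers only the Menopause, Involved nodes, and History columns. The dictionary keys are assumed to match the column names as written in the list; the headers in the actual file may differ, and `decode_record` is a hypothetical helper, not part of any library.

```python
# Assumed column names, taken from the list above; adjust to the real headers.
CODES = {
    "Menopause": {0: "postmenopausal", 1: "premenopausal"},
    "Involved nodes": {0: "absent", 1: "present"},
    "History": {0: "no history", 1: "history of cancer"},
}


def decode_record(record):
    """Return a copy of a row (a dict) with coded columns replaced by labels.

    Values that are missing, non-numeric, or outside the known codes are
    left unchanged so that no information is silently dropped.
    """
    decoded = dict(record)
    for column, labels in CODES.items():
        raw = decoded.get(column)
        try:
            decoded[column] = labels[int(raw)]
        except (KeyError, TypeError, ValueError):
            # Absent column, blank cell, or unexpected code: keep the row as-is.
            pass
    return decoded
```

Rows read with `csv.DictReader` arrive as strings, which is why the helper converts each value with `int` before looking it up.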
