Cardiovascular disease prediction
Scikit-learn · PyTorch · Cleveland Clinic dataset
First real ML research project. A binary classifier over the Cleveland Clinic dataset, predicting cardiovascular disease from a small set of patient features. The interesting work was less the model and more the surrounding craft: train/test splitting, feature engineering, calibration, and learning to write evaluation that doesn't lie to you. Final model reached 96% test accuracy.
The project
A binary classifier over the Cleveland Clinic dataset, predicting cardiovascular disease from a small set of patient features (age, sex, chest pain type, resting BP, cholesterol, fasting blood sugar, ST depression, etc.). My first real ML research project.
What it taught me
The lesson was less the model than the surrounding craft: train/test splitting, feature engineering, calibration, ablations, and learning to write evaluation that doesn't lie to you. The final 96% test accuracy was nice, but what I actually walked away with was a healthier suspicion of any single number.
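One way to make that suspicion concrete is to put an interval around an accuracy figure instead of quoting it bare. The sketch below is not code from the project. It is a minimal standard-library illustration using the Wilson score interval, and the function name `wilson_interval` and the test-set sizes in the loop are hypothetical, chosen only to show how the same 96% reads very differently depending on how many held-out patients it was measured on.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")

    p = correct / total                      # observed accuracy
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom  # interval midpoint, pulled toward 0.5
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical test-set sizes (assumptions, not the project's real split).
    for n in (50, 500, 5000):
        correct = round(0.96 * n)
        lo, hi = wilson_interval(correct, n)
        print(f"n={n:>5}: accuracy={correct / n:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

On a small held-out set the interval is wide enough that a headline accuracy says little on its own, which is one reason to report it alongside cross-validated spread and calibration metrics rather than as a single number.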
Stack
- Scikit-learn for the classical baselines (logistic regression, SVM, random forest).
- PyTorch for a small MLP comparison.
- Pandas / NumPy / Matplotlib for the rest.
- Category: Earlier work
- Period: 2023
- Supervisor: Kyungdong University
- Dataset: Cleveland Clinic Cardiovascular Disease