Conference paper at IEEE IT2026 on interpretable ML for diabetes screening

Our team presented a paper titled “Interpretable ML for Diabetes and Prediabetes Screening Using Self-Reported Health Indicators” by S. Lazic, S. Cakic, I. Rubezic Lukic, N. Popovic, and T. Popovic at the 30th Annual Conference on Information Technology (IT 2026). The work was part of our mentoring activities and our efforts to support the development of young researchers.

The paper was presented at the conference by Ms. Sanja Lazic (MSc candidate).

ABSTRACT – Early identification of type 2 diabetes (T2D) and prediabetes enables timely interventions, yet screening often relies on self-reported data rather than laboratory testing. This work compares lightweight Machine Learning (ML) models: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Multilayer Perceptron (MLP) trained on 21 self-reported indicators from the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset for three-class classification (no diabetes, prediabetes, diabetes). We propose a screening-oriented evaluation where a probability threshold is selected to achieve a target sensitivity (recall) of 0.80. LightGBM achieves balanced accuracy of 0.52 and precision of 0.33 at the target sensitivity, with 38% of cases flagged. Tree SHapley Additive exPlanations (TreeSHAP) highlight general health status, age category, body mass index (BMI), and hypertension as dominant predictors. A FastAPI web application provides individual risk estimates and instance-level explanations. The pipeline demonstrates feasibility of interpretable, calibrated screening from non-laboratory data.
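
The screening-oriented evaluation described in the abstract, picking a probability threshold that reaches a target sensitivity and then reporting precision, balanced accuracy, and the share of cases flagged, can be illustrated with a minimal sketch. This is not the paper's code. It assumes a binary view (1 = at risk, 0 = not at risk), the function names `threshold_for_sensitivity` and `screening_metrics` are hypothetical, and the toy labels and scores are invented purely for illustration.

```python
import math


def threshold_for_sensitivity(y_true, scores, target=0.80):
    """Return the highest observed score threshold whose recall is >= target.

    y_true: 0/1 labels (1 = at risk). scores: predicted probabilities.
    A case is flagged when its score is >= the returned threshold.
    """
    positives = sorted((s for y, s in zip(y_true, scores) if y == 1), reverse=True)
    if not positives:
        raise ValueError("at least one positive case is required")
    # Minimum number of positives that must be flagged to reach the target.
    # The small epsilon guards against floating-point overshoot in the product.
    k = max(1, math.ceil(target * len(positives) - 1e-12))
    return positives[k - 1]


def screening_metrics(y_true, scores, threshold):
    """Compute screening metrics for the rule 'flag if score >= threshold'."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "flagged_fraction": (tp + fp) / (tp + fp + tn + fn),
    }


# Toy data (invented for illustration, not taken from BRFSS or the paper).
y_toy = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
s_toy = [0.9, 0.8, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.1, 0.05]

t_toy = threshold_for_sensitivity(y_toy, s_toy, target=0.80)
m_toy = screening_metrics(y_toy, s_toy, t_toy)
print(t_toy, m_toy)
```

In practice the threshold would normally be chosen on a validation split and the metrics reported on a separate test split, so that the reported sensitivity is not optimistically biased.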