Title: Application of a machine learning-based tumor risk prediction system to assist precise diagnosis of prostate cancer and biopsy decision-making
Abstract:
Objective: To develop a machine learning-based model that predicts prostate biopsy outcomes and to build a tumor risk prediction system on it, with the aim of enabling precise diagnosis of prostate cancer and supporting biopsy decision-making.
Methods: Clinical data were retrospectively collected from 2,007 patients with suspected prostate cancer who underwent fusion-guided targeted plus systematic prostate biopsy at our hospital between January 2018 and December 2024. Variables included age, digital rectal examination (DRE) findings, serum total PSA (tPSA), free PSA (fPSA), free-to-total PSA ratio (f/t), lower urinary tract symptoms, laboratory evidence of urinary tract infection, ultrasound-measured prostate volume (PV), PSA density (PSAD), multiparametric MRI PI-RADS score, and pathological diagnosis. Based on pathology, two binary outcomes were defined: prostate cancer (PCa) vs. non-cancerous lesions (nPCa), and clinically significant prostate cancer (csPCa) vs. all other lesions. Several machine learning algorithms, including Random Forest, XGBoost, support vector machine (SVM), LightGBM, and neural networks, were used to build models and compare their performance. The best-performing algorithm was selected to develop a prostate cancer risk prediction system. The system was then used to predict prostate cancer risk in 100 patients presenting for initial diagnosis in 2025, and its utility was evaluated against their biopsy results.
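The two derived predictors follow their standard definitions: PSAD is total PSA divided by prostate volume, and the f/t ratio is free PSA divided by total PSA. A minimal Python sketch of deriving these features and the two binary outcome labels is shown below. The record layout and field names (`Patient`, `isup_grade`, etc.) are hypothetical, and it assumes csPCa means ISUP grade group 2 or higher, a common convention that the abstract does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    age: int
    total_psa: float            # tPSA, ng/mL
    free_psa: float             # fPSA, ng/mL
    prostate_volume: float      # PV, mL (ultrasound-measured)
    pirads: int                 # PI-RADS score, 1-5
    isup_grade: Optional[int]   # None if biopsy showed no cancer


def derived_features(p: Patient) -> dict:
    """Compute the f/t ratio and PSA density (PSAD) from raw measurements."""
    if p.total_psa <= 0 or p.prostate_volume <= 0:
        raise ValueError("total PSA and prostate volume must be positive")
    return {
        "ft_ratio": p.free_psa / p.total_psa,       # fPSA / tPSA
        "psad": p.total_psa / p.prostate_volume,    # ng/mL per mL of gland
    }


def labels(p: Patient) -> dict:
    """Binary outcomes: PCa vs. nPCa, and csPCa vs. all other lesions.

    ASSUMPTION: csPCa is defined as ISUP grade group >= 2; the source
    does not specify its definition.
    """
    is_pca = p.isup_grade is not None
    is_cspca = is_pca and p.isup_grade >= 2
    return {"pca": int(is_pca), "cspca": int(is_cspca)}


# Illustrative patient, not study data.
example = Patient(age=68, total_psa=8.0, free_psa=1.2,
                  prostate_volume=40.0, pirads=4, isup_grade=2)
print(derived_features(example))
print(labels(example))
```

Keeping the label logic in one function makes it easy to swap in a different csPCa definition if the study used one.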
Results: The overall PCa detection rate was 37.07% (744/2,007), and the csPCa detection rate was 25.71% (516/2,007). Across the machine learning models, PI-RADS score, PV, and PSAD were the most important variables. For PCa diagnosis, Random Forest had the highest recall (0.80), the neural network had the highest accuracy (0.816), and SVM had the largest AUC (0.834). For csPCa diagnosis, Random Forest performed best overall (recall 0.85, accuracy 0.853, AUC 0.906). Given the clinical priorities of prostate cancer diagnosis and treatment, the Random Forest model was used to build the tumor risk prediction system (software name: Prostate Cancer Risk Calculator; registration no. 2024SR0283311). Applied to 100 patients presenting for initial diagnosis in 2025, the system achieved a prediction accuracy of 79% for PCa and 87% for csPCa. Using a predicted risk >0.5 as the indication for biopsy could potentially spare 83% of patients with suspected prostate cancer an unnecessary biopsy.
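The reported metrics and the >0.5 biopsy rule can be made concrete with a small, dependency-free sketch. It computes recall and accuracy from a confusion matrix at a given threshold, computes AUC as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count 0.5), and counts which patients the threshold rule would not refer for biopsy. The function names and toy data are hypothetical, and the abstract does not say exactly how the 83% figure was computed, so `biopsy_triage` reports its counts separately rather than reproducing that number.

```python
from typing import List, Sequence, Tuple


def confusion(y_true: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); a case is predicted positive if score > threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = s > threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def recall(y_true, scores, threshold=0.5) -> float:
    tp, _, _, fn = confusion(y_true, scores, threshold)
    return tp / (tp + fn) if tp + fn else float("nan")


def accuracy(y_true, scores, threshold=0.5) -> float:
    tp, _, tn, _ = confusion(y_true, scores, threshold)
    return (tp + tn) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic; O(P*N), fine for small samples."""
    pos: List[float] = [s for y, s in zip(y_true, scores) if y]
    neg: List[float] = [s for y, s in zip(y_true, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def biopsy_triage(y_true, scores, threshold=0.5) -> Tuple[int, int, int]:
    """Patients with score <= threshold are not referred for biopsy.

    Returns (not referred, of those without cancer, of those with cancer missed).
    """
    not_referred = [y for y, s in zip(y_true, scores) if s <= threshold]
    spared = sum(1 for y in not_referred if not y)
    return len(not_referred), spared, len(not_referred) - spared


# Toy labels (1 = cancer on biopsy) and predicted risks; not study data.
y_toy = [1, 0, 1, 0, 0, 1, 0, 0]
scores_toy = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.3, 0.05]
print(confusion(y_toy, scores_toy), recall(y_toy, scores_toy),
      accuracy(y_toy, scores_toy), auc(y_toy, scores_toy),
      biopsy_triage(y_toy, scores_toy))
```

Reporting missed cancers next to spared biopsies keeps the trade-off of any fixed threshold visible; in practice a library such as scikit-learn would be used for the metrics.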
Conclusion: A prediction model based on the Random Forest algorithm can substantially improve the accuracy of prostate cancer diagnosis and support prostate biopsy decision-making. Applying the tumor risk prediction system built on this model to patients at initial diagnosis can improve the detection of prostate cancer, particularly clinically significant cancer, while avoiding unnecessary biopsies.

