Probability-Calibrated Ensemble Methods for Automotive CRM Lead Scoring
Abstract
Accurately predicting sales conversion in automotive CRM systems is critical for optimizing marketing spend and sales-team efficiency. This study presents a calibrated ensemble framework that combines XGBoost, Gradient Boosting, and Random Forest classifiers to predict lead conversion probability in automotive dealership operations. Using 62,859 real-world leads collected between July 2024 and July 2025, we developed a systematic pipeline comprising behavioral feature engineering, statistical feature selection, ensemble modeling, and probability calibration via Platt scaling. The calibrated ensemble achieved an AUC of 0.841, a Brier score of 0.146, and a 19% improvement in top-decile precision over a baseline logistic regression. The framework segments leads into four actionable priority tiers, directly supporting sales resource allocation and marketing campaign optimization. The results confirm that probability calibration is essential for automotive CRM applications in which predicted scores inform operational decisions.
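The abstract lists the pipeline stages without implementation detail. The following is a minimal sketch, under stated assumptions, of how the scoring end of such a pipeline could fit together. The three base models' probabilities are averaged with equal weights (an assumption; the abstract does not say how the ensemble is combined). The averaged score is then recalibrated with Platt scaling, fitted here by Newton's method against Platt's smoothed targets. Finally, leads are evaluated with the Brier score and top-decile precision and mapped to four tiers. The tier cut-offs and names (`Hot`, `Warm`, `Cool`, `Cold`), the helper names, and the example probabilities are hypothetical and are not taken from the study. In scikit-learn, `CalibratedClassifierCV(method="sigmoid")` offers a library implementation of sigmoid (Platt) calibration.

```python
import math
from typing import Sequence


def soft_vote(model_probs: Sequence[Sequence[float]]) -> list[float]:
    """Equal-weight average of per-lead probabilities across base models (assumed scheme)."""
    return [sum(p) / len(model_probs) for p in zip(*model_probs)]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              max_iter: int = 100) -> tuple[float, float]:
    """Fit P(y=1 | s) = sigmoid(a*s + b) by Newton's method with Platt's smoothed targets."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos, t_neg = (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]

    def loss(a: float, b: float) -> float:
        # Cross-entropy against the smoothed targets, with z = a*s + b.
        return sum(t * _softplus(-(a * s + b)) + (1.0 - t) * _softplus(a * s + b)
                   for s, t in zip(scores, targets))

    a, b = 0.0, math.log((n_pos + 1.0) / (n_neg + 1.0))  # start at the smoothed base rate
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, t in zip(scores, targets):
            p = _sigmoid(a * s + b)
            r, w = p - t, p * (1.0 - p)
            g_a, g_b = g_a + r * s, g_b + r  # gradient of the loss
            h_aa, h_ab, h_bb = h_aa + w * s * s, h_ab + w * s, h_bb + w  # Hessian
        det = h_aa * h_bb - h_ab * h_ab
        if abs(g_a) + abs(g_b) < 1e-10 or det <= 1e-12:
            break
        # Newton direction d = -H^{-1} g for the 2x2 Hessian.
        d_a = -(h_bb * g_a - h_ab * g_b) / det
        d_b = -(h_aa * g_b - h_ab * g_a) / det
        # Backtracking: halve the step until the loss no longer increases.
        step, current = 1.0, loss(a, b)
        while step > 1e-8 and loss(a + step * d_a, b + step * d_b) > current:
            step /= 2.0
        a, b = a + step * d_a, b + step * d_b
    return a, b


def calibrate(scores: Sequence[float], a: float, b: float) -> list[float]:
    return [_sigmoid(a * s + b) for s in scores]


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def top_decile_precision(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Share of converted leads among the 10% of leads with the highest scores."""
    k = max(1, len(probs) // 10)
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Hypothetical tier cut-offs and names; the article does not publish its boundaries.
TIER_CUTOFFS = ((0.60, "Hot"), (0.40, "Warm"), (0.20, "Cool"))


def assign_tier(p: float) -> str:
    for cutoff, name in TIER_CUTOFFS:
        if p >= cutoff:
            return name
    return "Cold"


if __name__ == "__main__":
    # Made-up probabilities for ten leads from three hypothetical base models.
    xgb = [0.92, 0.81, 0.74, 0.66, 0.52, 0.45, 0.38, 0.27, 0.15, 0.08]
    gbm = [0.88, 0.79, 0.70, 0.61, 0.55, 0.41, 0.35, 0.22, 0.18, 0.05]
    rf = [0.85, 0.83, 0.72, 0.64, 0.50, 0.47, 0.33, 0.25, 0.12, 0.10]
    labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    raw = soft_vote([xgb, gbm, rf])
    a, b = fit_platt(raw, labels)  # in practice, fit on a held-out calibration split
    cal = calibrate(raw, a, b)
    print(f"Platt a={a:.3f}, b={b:.3f}")
    print(f"Brier raw={brier_score(raw, labels):.4f}, calibrated={brier_score(cal, labels):.4f}")
    print(f"Top-decile precision={top_decile_precision(cal, labels):.2f}")
    print([assign_tier(p) for p in cal])
```

When the fitted slope `a` is positive, calibration is a monotone transform of the ensemble score. It therefore leaves rankings unchanged, and with them AUC and top-decile precision. What it does change are the probability values, which the Brier score and the tier cut-offs depend on. For brevity, the toy example fits and evaluates on the same ten leads. In practice, the Platt parameters would be fitted on leads held out from base-model training.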
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.