Publications

2026

mloptimizer: Genetic algorithm-based hyperparameter optimization for machine learning models in python
A. Caparrini, J. Arroyo
SoftwareX, 2026
DOI: 10.1016/j.softx.2026.102567

Abstract

mloptimizer is a Python library that provides genetic algorithm-based hyperparameter optimization for scikit-learn-compatible machine learning models. It is designed to integrate seamlessly with the scikit-learn API and supports custom fitness functions, allowing users to optimize for any performance metric. It was published in SoftwareX as part of the scikit-learn-contrib ecosystem.
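The idea of genetic hyperparameter search with a user-supplied fitness function can be illustrated with a minimal sketch. It uses only the Python standard library and does not use the mloptimizer API; the function `genetic_search`, its parameters, and the toy fitness function are all hypothetical names chosen for illustration. In practice the fitness would typically be a cross-validated score of a model.

```python
import random


def genetic_search(fitness, space, pop_size=20, generations=15,
                   mutation_rate=0.2, seed=0):
    """Maximize fitness(params) over integer ranges using a simple GA.

    `space` maps each hyperparameter name to an inclusive (low, high) range.
    This is an illustrative sketch, not the mloptimizer implementation.
    """
    rng = random.Random(seed)
    names = list(space)

    def random_individual():
        return {n: rng.randint(*space[n]) for n in names}

    def crossover(a, b):
        # Uniform crossover: each gene is copied from one parent at random.
        return {n: (a[n] if rng.random() < 0.5 else b[n]) for n in names}

    def mutate(ind):
        # Resample each gene within its range with probability mutation_rate.
        return {n: (rng.randint(*space[n]) if rng.random() < mutation_rate else v)
                for n, v in ind.items()}

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(*rng.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children    # parents survive (elitism)
    return max(population, key=fitness)


def toy_fitness(params):
    # Hypothetical stand-in for a model score; peaks at depth 7, 120 estimators.
    return (-((params["max_depth"] - 7) ** 2)
            - ((params["n_estimators"] - 120) ** 2) / 100.0)


search_space = {"max_depth": (1, 20), "n_estimators": (10, 300)}
best = genetic_search(toy_fitness, search_space)
print(best, toy_fitness(best))
```

Because the fitness is just a callable, any metric can be plugged in without changing the search loop.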

2024

Profit-sensitive machine learning classification with explanations in credit risk: The case of small businesses in peer-to-peer lending
M.J. Ariza-Garzón, J. Arroyo, M.J. Segovia-Vargas, A. Caparrini
Electronic Commerce Research and Applications, 2024
DOI: 10.1016/j.elerap.2024.101428

Abstract

This paper proposes a comprehensive profit-sensitive approach for credit risk modeling in P2P lending for small businesses, one of the most financially complex segments. We go beyond traditional and cost-sensitive approaches by including the financial costs and incomes through profits and introducing the profit information at three points of the modeling process: the estimation of the learning function of the classification algorithm, the hyperparameter optimization, and the decision function. The profit-sensitive approaches achieve a higher level of profitability than the profit-insensitive approach by granting mostly lower-risk, lower-amount loans. Explainability tools help us to discover the key features of such loans.
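One of the three points mentioned above, the decision function, can be sketched as choosing the approval threshold that maximizes profit on held-out loans instead of accuracy. The example below is a simplified illustration and not the paper's method: the functions `total_profit` and `best_threshold` are hypothetical, and all probabilities, gains, and losses are made-up numbers.

```python
def total_profit(threshold, p_default, defaulted, gain, loss):
    """Profit from granting every loan whose predicted default
    probability is strictly below `threshold`."""
    profit = 0.0
    for p, d, g, l in zip(p_default, defaulted, gain, loss):
        if p < threshold:
            # A granted loan earns `g` if repaid and costs `l` if it defaults.
            profit += -l if d else g
    return profit


def best_threshold(p_default, defaulted, gain, loss):
    """Scan thresholds 0.00, 0.01, ..., 1.00 and return the most profitable.

    Ties are resolved in favor of the lowest threshold.
    """
    candidates = [i / 100 for i in range(101)]
    return max(candidates,
               key=lambda t: total_profit(t, p_default, defaulted, gain, loss))


# Made-up validation data: predicted default probability, observed outcome,
# income if the loan is repaid, and loss if it defaults.
p_default = [0.05, 0.10, 0.30, 0.45, 0.60, 0.80]
defaulted = [False, False, False, True, False, True]
gain = [100, 120, 150, 200, 300, 400]
loss = [1000, 1200, 1500, 2000, 3000, 4000]

threshold = best_threshold(p_default, defaulted, gain, loss)
print(threshold, total_profit(threshold, p_default, defaulted, gain, loss))
```

With asymmetric gains and losses like these, the profit-maximizing threshold tends to favor lower-risk loans, which is consistent with the behavior described in the abstract.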

S&P 500 stock selection using machine learning classifiers: A look into the changing role of factors
A. Caparrini, J. Arroyo, J. Escayola Mansilla
Research in International Business and Finance (RIBAF), 2024
DOI: 10.1016/j.ribaf.2024.102336

Abstract

This study examines the profitability of using machine learning algorithms to select a subset of stocks from the S&P 500 using factors as features. We use tree-based algorithms (Decision Tree, Random Forest, and XGBoost) for their white-box model capabilities, which allow the extraction of feature importances. We defined a backtest to train the models with recent data and rebalance the portfolio. Despite incurring more risk, the assets selected by the machine learning models outperform the index. Furthermore, we show that the feature importances that determine the best-performing assets change over time. Models that track the evolution of factor importance can thus provide profitability insights while preserving explainability.

2020

Explainability of a Machine Learning Granting Scoring Model in Peer-to-Peer Lending
M.J. Ariza-Garzón, J. Arroyo, A. Caparrini, M.J. Segovia-Vargas
IEEE Access, vol. 8, pp. 64873–64890, 2020
DOI: 10.1109/ACCESS.2020.2984412

Abstract

Peer-to-peer (P2P) lending demands effective and explainable credit risk models. Typical machine learning algorithms offer high prediction performance, but most of them lack explanatory power. However, this deficiency can be solved with the help of the explainability tools proposed in the last few years, such as the SHAP values. In this work, we assess the well-known logistic regression model and several machine learning algorithms for granting scoring in P2P lending. The comparison reveals that the machine learning alternative is superior in terms of not only classification performance but also explainability. More precisely, the SHAP values reveal that machine learning algorithms can reflect dispersion, nonlinearity and structural breaks in the relationships between each feature and the target variable. Our results demonstrate that it is possible to have machine learning credit scoring models that are both accurate and transparent.

Automatic subgenre classification in an electronic dance music taxonomy
A. Caparrini, J. Arroyo, L. Pérez-Molina, J. Sánchez-Hernández
Journal of New Music Research, vol. 49, no. 3, pp. 269–284, 2020
DOI: 10.1080/09298215.2020.1761399

Abstract

Electronic dance music (EDM) is a genre where thousands of new songs are released every week. The list of EDM subgenres considered is long, but it also evolves according to trends and musical tastes. With this in view, we have retrieved two sets of over 2000 songs separated by more than a year. Songs belong to the top 100 list of an EDM website taxonomy of more than 20 subgenres that changed in the period considered. We test the effectiveness of automatic classification on these sets and delve into the results to determine which subgenres perform better and worse, how the performance of some subgenres changes between the two sets, and how some subgenres are often confused with one another. We illustrate confusion among subgenres by a graph and interpret it as a taxonomic map of EDM.
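One way a confusion graph of this kind might be derived from a classifier's confusion matrix is sketched below. The abstract does not specify how the paper's graph was built, so the function `confusion_graph`, the `min_rate` cutoff, the symmetric rate definition, the placeholder subgenre names, and the counts are all assumptions made for illustration.

```python
def confusion_graph(labels, matrix, min_rate=0.1):
    """Build undirected edges between classes that are often confused.

    matrix[i][j] counts songs of true class i predicted as class j.
    An edge (a, b) is kept when the share of songs of either class that
    was predicted as the other reaches `min_rate`.
    """
    edges = {}
    for i, a in enumerate(labels):
        for j in range(i + 1, len(labels)):
            b = labels[j]
            confused = matrix[i][j] + matrix[j][i]
            total = sum(matrix[i]) + sum(matrix[j])
            rate = confused / total if total else 0.0
            if rate >= min_rate:
                edges[(a, b)] = rate
    return edges


# Made-up counts for three placeholder subgenres (100 songs each).
labels = ["subgenre_a", "subgenre_b", "subgenre_c"]
matrix = [
    [80, 15, 5],
    [20, 75, 5],
    [2, 3, 95],
]

edges = confusion_graph(labels, matrix)
for (a, b), rate in edges.items():
    print(f"{a} -- {b}: {rate:.3f}")
```

Subgenres joined by heavy edges are those the classifier struggles to separate, which is what allows the graph to be read as a map of how close the subgenres are to each other.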