Publications
Publications in reverse chronological order.
2024
- RIBAF
  S&P 500 stock selection using machine learning classifiers: A look into the changing role of factors
  Antonio Caparrini López, Javier Arroyo, and Jordi Escayola Mansilla
  Research in International Business and Finance, 2024
This study examines the profitability of using machine learning algorithms to select a subset of stocks from the S&P 500 using factors as features. We use tree-based algorithms (Decision Tree, Random Forest, and XGBoost) for their white-box capabilities, which allow feature importances to be extracted. We define a backtest that trains the models on recent data and rebalances the portfolio. Despite incurring more risk, the portfolio of assets selected by machine learning outperforms the index. Furthermore, we show that the feature importances that determine the best-performing assets change over time. Models that track how the importance of factors evolves can provide profitability insights while remaining explainable.
@article{Caparrini2024,
  title = {S\&P 500 stock selection using machine learning classifiers: A look into the changing role of factors},
  journal = {Research in International Business and Finance},
  pages = {102336},
  year = {2024},
  issn = {0275-5319},
  doi = {10.1016/j.ribaf.2024.102336},
  author = {Caparrini López, Antonio and Arroyo, Javier and Escayola Mansilla, Jordi},
  keywords = {Stock selection, Machine learning, XGBoost, Factor investing, Backtesting, Explainability}
}
2020
- IEEE Access
  Explainability of a Machine Learning Granting Scoring Model in Peer-to-Peer Lending
  Miller Janny Ariza-Garzón, Javier Arroyo, Antonio Caparrini, and Maria-Jesus Segovia-Vargas
  IEEE Access, 2020
Peer-to-peer (P2P) lending demands effective and explainable credit risk models. Typical machine learning algorithms offer high prediction performance, but most of them lack explanatory power. However, this deficiency can be solved with the help of the explainability tools proposed in the last few years, such as the SHAP values. In this work, we assess the well-known logistic regression model and several machine learning algorithms for granting scoring in P2P lending. The comparison reveals that the machine learning alternative is superior in terms of not only classification performance but also explainability. More precisely, the SHAP values reveal that machine learning algorithms can reflect dispersion, nonlinearity and structural breaks in the relationships between each feature and the target variable. Our results demonstrate that it is possible to have machine learning credit scoring models that are both accurate and transparent. Such models provide the trust that the industry, regulators and end-users demand in P2P lending and may lead to a wider adoption of machine learning in this and other risk assessment applications where explainability is required.
@article{Ariza2020,
  author = {Ariza-Garzón, Miller Janny and Arroyo, Javier and Caparrini, Antonio and Segovia-Vargas, Maria-Jesus},
  journal = {IEEE Access},
  title = {Explainability of a Machine Learning Granting Scoring Model in Peer-to-Peer Lending},
  year = {2020},
  volume = {8},
  pages = {64873-64890},
  keywords = {Machine learning;Logistics;Decision trees;Peer-to-peer computing;Machine learning algorithms;Analytical models;Neural networks;Credit risk;P2P lending;explainability;Shapley values;boosting;logistic regression},
  doi = {10.1109/ACCESS.2020.2984412},
}
- JNMS
  Automatic subgenre classification in an electronic dance music taxonomy
  Antonio Caparrini, Javier Arroyo, Laura Pérez-Molina, and Jaime Sánchez-Hernández
  Journal of New Music Research, 2020
Electronic dance music (EDM) is a genre where thousands of new songs are released every week. The list of EDM subgenres considered is long, but it also evolves according to trends and musical tastes. With this in view, we have retrieved two sets of over 2000 songs separated by more than a year. Songs belong to the top 100 list of an EDM website taxonomy of more than 20 subgenres that changed in the period considered. We test the effectiveness of automatic classification on these sets and delve into the results to determine, for example, which subgenres perform better and worse, how the performance of some subgenres changes between the two sets, or how some subgenres are often confused with one another. We illustrate confusion among subgenres by a graph and interpret it as a taxonomic map of EDM. We also assess the deterioration of the performance of the classifier of the first set when used to classify the second one. Finally, we study how the new subgenres that appear in the second set relate to the old ones with the help of the classifier of the first set. As a result, this work illustrates the main challenges that EDM poses to automatic classification and provides insights into where the limits of this approach lie.
@article{Caparrini2020,
  author = {Caparrini, Antonio and Arroyo, Javier and Pérez-Molina, Laura and Sánchez-Hernández, Jaime},
  title = {Automatic subgenre classification in an electronic dance music taxonomy},
  journal = {Journal of New Music Research},
  volume = {49},
  number = {3},
  pages = {269-284},
  year = {2020},
  publisher = {Routledge},
  doi = {10.1080/09298215.2020.1761399},
  eprint = {https://doi.org/10.1080/09298215.2020.1761399},
}