Data-Driven Pharmaceutical Pricing in Developing Markets: A Machine Learning Approach
Abstract
Pharmaceutical pricing in developing markets is often affected by limited transparency, affordability constraints, uneven product availability, and inefficient pricing structures. These challenges complicate evidence-based pricing decisions for pharmaceutical companies, healthcare providers, and policymakers. This study develops a machine learning framework for predicting pharmaceutical prices in the Pakistani market using product-level variables, including company name, pack size, pre-discount price, discount percentage, and product availability. The data were drawn from a publicly available Kaggle dataset comprising 1,630 pharmaceutical product records. Before model development, exploratory data analysis and preprocessing were conducted to examine the price distribution, high-value observations, pack-size variability, missing values, and categorical features. The final post-discount price was used as the target variable.
The study compares the predictive performance of individual machine learning models, including XGBoost, Random Forest, Linear Regression, and Feedforward Neural Networks, with ensemble learning approaches based on stacking and blending. The dataset was divided into a model development subset and an unseen holdout sample. Model performance was evaluated using Mean Absolute Error, Root Mean Squared Error, and Symmetric Mean Absolute Percentage Error. The results indicate that XGBoost performed strongly among the individual models due to its ability to capture nonlinear relationships and feature interactions. Among the ensemble approaches, the blending model achieved slightly better predictive performance than stacking, suggesting its potential for improving prediction stability and generalizability.
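For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how they can be computed. The abstract does not state which variant of Symmetric Mean Absolute Percentage Error was used; the form below (mean of 2|y - ŷ| / (|y| + |ŷ|), expressed as a percentage) is one common definition and is an assumption here. The function names and the sample prices are illustrative only and are not taken from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute deviation."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more heavily."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant, range 0 to 200).

    Pairs where both values are zero contribute zero error.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        if denom > 0:
            total += 2.0 * abs(t - p) / denom
    return 100.0 * total / len(y_true)


# Hypothetical post-discount prices and model predictions.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

print(f"MAE:   {mae(actual, predicted):.3f}")
print(f"RMSE:  {rmse(actual, predicted):.3f}")
print(f"sMAPE: {smape(actual, predicted):.3f}%")
```

Because sMAPE is scale-free, it allows errors on low-priced and high-priced products to be compared on the same footing, whereas MAE and RMSE are expressed in currency units.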
In addition to predictive accuracy, the study examined feature importance to enhance model interpretability. Product availability, discount percentage, and pack size were identified as important factors influencing pharmaceutical price prediction. These findings suggest that machine learning can support more transparent and data-driven pricing decisions in developing markets. The study contributes to the pharmaceutical pricing literature by comparing individual and ensemble machine learning methods in a developing-market context and linking predictive modeling with practical pricing insights for companies and policymakers.
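The abstract does not specify how feature importance was computed. As an illustration only, the sketch below uses permutation importance, a common model-agnostic technique: each feature column is shuffled in turn, and the resulting increase in prediction error is taken as that feature's importance. The toy pricing function, the feature ordering, and all values are hypothetical and stand in for a trained model.

```python
import random


def mean_abs_error(model, rows, targets):
    """Average absolute error of model predictions over the rows."""
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MAE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_abs_error(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted model; ignores the third feature."""
    pack_size, discount_pct, _unused = row
    return 2.0 * pack_size - 0.5 * discount_pct


# Hypothetical rows: [pack_size, discount_pct, unused_feature]
rows = [
    [10.0, 0.0, 1.0],
    [20.0, 5.0, 7.0],
    [30.0, 10.0, 3.0],
    [60.0, 15.0, 9.0],
    [100.0, 2.0, 4.0],
    [120.0, 8.0, 6.0],
]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets)
for name, score in zip(["pack_size", "discount_pct", "unused"], scores):
    print(f"{name}: {score:.3f}")
```

A feature the model never uses receives an importance of zero under this scheme, which makes the ranking straightforward to interpret. Tree-based models such as XGBoost and Random Forest also expose built-in importance scores, which may rank features differently.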
Keywords: pharmaceutical pricing; machine learning; ensemble learning; XGBoost; Random Forest; pharmaceutical market; Pakistan.


