Evaluating large-scale propensity score performance through real-world and synthetic data experiments