Baseball research submitted to the 2025 SMT Data Challenge by Tejas Rama and me.
Brief summary of what we did:
- Engineered derived features (velocity, horizontal break, pre-pitch movement vectors) from 2D/3D MiLB player tracking data using the R tidyverse, arrow, and sportyR packages.
- Built logistic regression models to predict pitch type (fastball vs. offspeed) from fielder position, handedness, and movement, identifying 13 statistically significant positional tipping effects across 7 defensive positions.
- Designed faceted visualizations with ggplot2 showing fielders’ movement patterns by team and pitch type.
- Proposed implications for advance scouting.
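
To illustrate the feature-engineering and modeling steps summarized above, here is a minimal sketch in R. It is not our submitted code: the data are simulated, and the column names (`pitch_id`, `time_s`, `x`, `y`, `is_fastball`), the one-second pre-pitch window, and the built-in "tip" effect are all assumptions made for illustration. It derives a pre-pitch movement vector for one fielder per pitch, then fits a logistic regression of pitch type on that movement.

```r
library(dplyr)

set.seed(42)  # reproducible SIMULATED data; not the challenge data

# Hypothetical setup: one fielder, two tracked positions per pitch
# (1 second before release and at release).
n <- 400
is_fastball <- rbinom(n, 1, 0.5)
x0 <- rnorm(n, 0, 2)
y0 <- rnorm(n, 150, 2)
# Assumed tipping effect, built in on purpose: a small lateral shift
# whose direction depends on pitch type.
x1 <- x0 + rnorm(n, mean = ifelse(is_fastball == 1, 0.5, -0.5), sd = 1)
y1 <- y0 + rnorm(n, 0, 1)

# Long-format tracking table, similar in shape to frame-level tracking data.
tracking <- tibble(
  pitch_id    = rep(seq_len(n), times = 2),
  time_s      = rep(c(-1, 0), each = n),
  x           = c(x0, x1),
  y           = c(y0, y1),
  is_fastball = rep(is_fastball, times = 2)
)

# Derived features: pre-pitch movement vector (dx, dy) and average speed.
features <- tracking %>%
  arrange(pitch_id, time_s) %>%
  group_by(pitch_id, is_fastball) %>%
  summarise(
    dx    = last(x) - first(x),
    dy    = last(y) - first(y),
    speed = sqrt(dx^2 + dy^2) / (last(time_s) - first(time_s)),
    .groups = "drop"
  )

# Logistic regression: fastball (1) vs. offspeed (0) on the movement vector.
fit <- glm(is_fastball ~ dx + dy, data = features, family = binomial())
print(summary(fit)$coefficients)
```

In a sketch like this, a coefficient on `dx` or `dy` that is clearly distinguishable from zero would be the kind of positional tipping effect described above. The real analysis also used fielder position and handedness as predictors.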