Volleyball Rally Outcome Prediction

For my APS360 (Applied Fundamentals of Deep Learning) final project, I trained a model to predict which team wins a volleyball rally from the sequence of actions within it: serve, reception, set, attack, and block.

Because a rally is inherently sequential (each contact constrains the next), I framed it as a sequence classification problem and built a two-layer LSTM with learned embeddings for each action type. I trained it on the VREN dataset of ~1,500 annotated rallies from professional and NCAA Division I games, with a feedforward network (MLP) as the baseline to beat.
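
For illustration, a model of this shape could be sketched in PyTorch as follows. This is not the project code: the framework, the class name RallyLSTM, the embedding and hidden sizes, the separate team embedding, and the single-logit output ("team A wins") are all assumptions made for the example.

```python
import torch
import torch.nn as nn

# Hypothetical action vocabulary; index 0 is reserved for padding.
ACTIONS = ["<pad>", "serve", "reception", "set", "attack", "block"]


class RallyLSTM(nn.Module):
    """Two-layer LSTM over embedded contacts; returns one logit per rally."""

    def __init__(self, n_actions=len(ACTIONS), embed_dim=16, hidden_dim=32):
        super().__init__()
        # Learned embedding per action type (padding rows stay at zero).
        self.action_embed = nn.Embedding(n_actions, embed_dim, padding_idx=0)
        # Assumption: the team making each contact (0 = A, 1 = B) gets its
        # own small embedding, concatenated with the action embedding.
        self.team_embed = nn.Embedding(2, 4)
        self.lstm = nn.LSTM(embed_dim + 4, hidden_dim,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, action_ids, team_ids, lengths):
        # action_ids, team_ids: (batch, max_len) int64; lengths: (batch,)
        x = torch.cat([self.action_embed(action_ids),
                       self.team_embed(team_ids)], dim=-1)
        # Packing makes the LSTM stop at each rally's true length.
        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)
        # h_n: (num_layers, batch, hidden_dim); use the top layer's last state.
        return self.head(h_n[-1]).squeeze(-1)


# Usage: one rally where A serves, then B receives, sets, and attacks.
model = RallyLSTM()
actions = torch.tensor([[1, 2, 3, 4]])
teams = torch.tensor([[0, 1, 1, 1]])
logit = model(actions, teams, torch.tensor([4]))
prob_team_a_wins = torch.sigmoid(logit)
```

Training a model like this against 0/1 labels would typically use nn.BCEWithLogitsLoss on the raw logit.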

The results:

The LSTM hit 90.7% test accuracy and a 0.953 AUC, roughly 21 points of accuracy above the MLP baseline (69.9%).
A team-swap data augmentation (mirroring team A and team B) was the single most impactful change: it shrank the train–validation gap from ~15 points to ~2 and fixed the overfitting that plagued my first version, all while using ~3× fewer parameters.
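
The team-swap idea can be sketched in a few lines of plain Python. The rally representation here (a list of (team, action) pairs plus a winner label) and the function names are hypothetical, chosen only to show the mirroring; the actual data format may differ.

```python
from typing import List, Tuple

# Hypothetical representation: ([(team, action), ...], winning_team)
Rally = Tuple[List[Tuple[str, str]], str]

SWAP = {"A": "B", "B": "A"}


def team_swap(rally: Rally) -> Rally:
    """Mirror a rally: every contact changes sides, and so does the winner."""
    contacts, winner = rally
    swapped = [(SWAP[team], action) for team, action in contacts]
    return swapped, SWAP[winner]


def augment(rallies: List[Rally]) -> List[Rally]:
    """Return the original rallies followed by their mirrored copies."""
    return rallies + [team_swap(r) for r in rallies]


rally = ([("A", "serve"), ("B", "reception"), ("B", "set"), ("B", "attack")], "B")
mirrored = team_swap(rally)
# mirrored == ([("B", "serve"), ("A", "reception"), ("A", "set"), ("A", "attack")], "A")
```

Swapping twice returns the original rally, and the action order never changes, so the model sees the same volleyball pattern from both sides. In a setup like this the augmentation would normally be applied to the training split only, leaving validation and test rallies untouched.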

To check that the model learned real volleyball patterns rather than memorizing the dataset, I hand-annotated 30 rallies from the 2025 NCAA Division I Men's Final (Long Beach State vs. UCLA), data the model had never seen. It held up at 80% accuracy and a 0.891 AUC, which I was really happy with given the different teams, annotation style, and tiny sample.
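
For reference, both metrics can be computed from predicted probabilities with a short pure-Python sketch. The function names and the toy numbers below are made up for the example and are not the project's data; AUC is computed here as the fraction of (positive, negative) pairs the model ranks correctly, with ties counted as half.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of rallies where the thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, probs):
    """Pairwise AUC: P(score of a positive > score of a negative)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = team A won the rally, probs = predicted P(team A wins).
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.4, 0.35, 0.8, 0.1]
print(accuracy(labels, probs))  # 0.8
print(roc_auc(labels, probs))   # 5 of 6 pairs ranked correctly
```

On a 30-rally set, 80% accuracy corresponds to 24 correct predictions, and the normal-approximation standard error is about 7 points, so the held-out estimate is noisy.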

This was a solo project, and it was a great deep dive into sequence modeling, regularization in small-data regimes, and how much careful data work matters compared to raw model size.