Towards Safer Strength Training with Deep Learning for Rep Failure Prediction
TL;DR: Alongside Caterina Mammola, we explored whether strength training could be made safer. We show that a deep learning model can predict, with reasonable accuracy, when a rep in a bicep curl set is likely to reach failure. This approach could be extended to exercises where muscular failure is riskier, helping reduce injury during unsupervised workouts.
In exercises like the bench press, reaching failure without a spotter can be dangerous. A system that can recognise the final safe rep could make solo strength training safer, and in lower-risk settings it could also push people to train harder by showing that they still have reps in reserve.
We created a novel dataset of over 3,200 bicep curl reps performed to failure and trained a Hierarchical LSTM that outperforms a linear baseline by nearly 30% at identifying the final safe rep. The model analyses joint motion patterns such as range of motion and velocity across reps, inspired by Remaining Useful Life (RUL) prediction from engineering.



To complement the paper, here are a few figures and videos that did not make it into the final version.
The project had four main stages: record sets to failure, extract pose from video, segment the motion into individual reps, and train a sequential model to estimate how close each rep is to failure.
Our approach treats rep failure as a boundary detection problem. We reframe it through the lens of Remaining Useful Life (RUL) prediction, adapting pose-based LSTMs to identify subtle biomechanical cues that precede failure.
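Under the RUL framing, each rep in a set performed to failure can be labelled with the number of reps remaining before failure. A minimal sketch of one common labelling scheme: the cap (a piecewise-linear target borrowed from the RUL literature) and its value are illustrative assumptions, not details from the paper.

```python
import numpy as np

def rul_targets(n_reps, cap=5):
    """Reps-remaining labels for a set performed to failure.

    The final rep gets 0 and earlier reps count down, capped at
    `cap` (a piecewise-linear target, as in RUL prediction).
    The cap value here is illustrative.
    """
    remaining = np.arange(n_reps - 1, -1, -1)
    return np.minimum(remaining, cap)

print(rul_targets(8))  # labels for an 8-rep set
```

Capping the target reflects the intuition that early reps all look equally "fresh": the interesting signal only appears as failure approaches.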
We first run a pose estimation model on the video to extract the elbow angle over time.
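Given three keypoints per frame (shoulder, elbow, wrist), the elbow angle falls out of simple vector geometry. A minimal sketch assuming 2D keypoints; the pose model and keypoint format are not specified here.

```python
import numpy as np

def elbow_angle(shoulder, elbow, wrist):
    """Interior angle at the elbow, in degrees, from 2D keypoints."""
    u = np.asarray(shoulder, float) - np.asarray(elbow, float)
    v = np.asarray(wrist, float) - np.asarray(elbow, float)
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against floating-point values just outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

print(elbow_angle((0, 1), (0, 0), (1, 0)))  # a right angle at the elbow
```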
We then smooth the elbow angle signal to remove noise and identify the points that define the start and end of each rep.
While no amount of parameter tuning made the segmentation fully reliable, especially for sets with long pauses or irregular reps, this rule-based approach worked well in practice.
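A rule-based segmenter along these lines can be sketched with off-the-shelf smoothing and peak detection, assuming the angle trace peaks at full arm extension so each rep runs peak-to-peak. The window, distance, and prominence values below are illustrative placeholders, not the parameters we actually used.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter

def segment_reps(angle, fps=30, min_rep_s=1.0, prominence=20.0):
    """Split an elbow-angle trace (degrees) into (start, end) reps.

    Assumes the angle is largest at full extension, so consecutive
    extension peaks bound one rep each. Thresholds are illustrative.
    """
    smooth = savgol_filter(angle, window_length=11, polyorder=3)
    peaks, _ = find_peaks(smooth,
                          distance=int(min_rep_s * fps),
                          prominence=prominence)
    return list(zip(peaks[:-1], peaks[1:]))

# Toy trace: five 2-second "curls" sampled at 30 fps.
t = np.arange(0, 10, 1 / 30)
angle = 90 + 70 * np.cos(2 * np.pi * t / 2)
print(len(segment_reps(angle)))
```

The `distance` and `prominence` arguments are exactly the kind of parameters that break down on long pauses or partial reps, which matches what we observed.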
To get participants to perform the reps, we used highly sophisticated marketing techniques.
(In the small print, we explain that each set of reps counts as a ticket in a £40 prize draw, and that more sets mean more tickets.)
One of the most difficult parts of the project was not the model itself, but building the dataset. Unlike many student ML projects, there was no suitable public dataset waiting for us. We needed videos of people performing consistent bicep curl sets all the way to failure, recorded cleanly enough for pose estimation to work reliably.
We also needed ethics approval before collecting data from participants, which took longer than we expected and compressed the project timeline quite a bit. By the time everything was approved, the amount of time left to collect data, build the pipeline, train the model, and write the paper was suddenly much smaller than planned.
To actually gather the data, we set up a small recording station in Appleton Tower with adjustable dumbbells, consent forms, phones for recording, and a simple process for moving people through quickly. What we expected to be a slow trickle of participants turned into queues, with some people returning multiple times throughout the day to improve their chances in the prize draw.
We gathered 254 unique sets from 66 participants, totalling 3,272 reps. A sizeable dataset, though still small compared to typical benchmarks in the literature.
The grid shows a small subset of reps from different sets, alongside the features extracted from them.
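The hierarchical model can be sketched as two stacked LSTMs: a frame-level LSTM encodes each rep into a fixed vector, and a rep-level LSTM runs across those vectors to predict reps remaining. Layer sizes, the feature count, and the regression head below are illustrative; the paper's architecture may differ in detail.

```python
import torch
import torch.nn as nn

class HierarchicalLSTM(nn.Module):
    """Frame-level LSTM per rep, rep-level LSTM across the set,
    and a linear head predicting reps-remaining for each rep.
    All sizes are illustrative placeholders."""

    def __init__(self, n_features=3, frame_hidden=32, rep_hidden=32):
        super().__init__()
        self.frame_lstm = nn.LSTM(n_features, frame_hidden, batch_first=True)
        self.rep_lstm = nn.LSTM(frame_hidden, rep_hidden, batch_first=True)
        self.head = nn.Linear(rep_hidden, 1)

    def forward(self, x):
        # x: (sets, reps, frames, features)
        s, r, f, d = x.shape
        _, (h, _) = self.frame_lstm(x.reshape(s * r, f, d))
        rep_emb = h[-1].reshape(s, r, -1)  # one embedding per rep
        out, _ = self.rep_lstm(rep_emb)    # context across the whole set
        return self.head(out).squeeze(-1)  # (sets, reps) predictions

model = HierarchicalLSTM()
preds = model(torch.randn(2, 5, 40, 3))  # 2 sets, 5 reps, 40 frames each
print(preds.shape)
```

The two levels mirror the data: fatigue cues live both within a single rep's motion and in how that motion drifts across the set.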
What followed was a large amount of iteration on deep learning approaches to improve performance. We also developed a linear baseline using the same features for comparison, which gave us a clearer sense of how much value the sequential model was actually adding.
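The linear baseline amounts to fitting a linear classifier on the same per-rep summary features. A toy sketch with synthetic data standing in for features like range of motion and velocity; neither the features nor the labels here are our real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for per-rep features such as range of
# motion, peak velocity, and rep duration (not our real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Toy label: 1 marks the "final safe rep", driven by two features.
y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)

baseline = LogisticRegression().fit(X, y)
print(f"train accuracy: {baseline.score(X, y):.2f}")
```

Because the baseline sees each rep's features without any sequential context, the gap between it and the LSTM tells you how much of the signal lives in the trajectory across reps rather than in any single rep.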
The final model showed encouraging signal, but the dataset size still imposed a clear ceiling on performance. If we were to continue this work, the next steps would be collecting more data, improving consistency in capture conditions, and expanding beyond bicep curls to exercises where rep failure has more serious safety implications.
Even so, the project answered the question we started with: there is enough signal in human motion to make rep failure prediction plausible!
Sincerely,
Tomas