Predictive Failure Modes
This project tested both my machine learning skills and my determination to see
it through to the end.
Project Details
After acquiring a dataset, I embarked on a thorough exploration to identify which features were most
meaningful, both generally and for predicting asset failure. I employed various methods, including
statistical analysis, decision trees, visualization, and Principal Component Analysis (PCA). This
multifaceted approach yielded numerous valuable insights.
One major finding was how scarce specific failure modes were; another was the set of
detailed operating conditions (speeds, temperatures, and times) associated with each failure mode. By
filtering the data down to failure events and bringing them to the forefront, I was able to produce
clear and impactful visuals. This revealed significant patterns that had been obscured by the
non-failure data and gave a much clearer picture of the factors leading to asset failure.
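As a rough illustration of the filtering step described above, the sketch below keeps only failure rows and averages the operating conditions per failure mode. The records, column names (`failure_mode`, `speed_rpm`, `temp_k`, `runtime_min`), and values are hypothetical placeholders, not the project's actual dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; column names and values are illustrative only.
# A failure_mode of None marks a non-failure event.
records = [
    {"failure_mode": None, "speed_rpm": 1500, "temp_k": 300, "runtime_min": 20},
    {"failure_mode": "overheat", "speed_rpm": 1350, "temp_k": 303, "runtime_min": 150},
    {"failure_mode": "overheat", "speed_rpm": 1400, "temp_k": 305, "runtime_min": 170},
    {"failure_mode": "tool_wear", "speed_rpm": 1450, "temp_k": 299, "runtime_min": 210},
    {"failure_mode": None, "speed_rpm": 1600, "temp_k": 298, "runtime_min": 35},
    {"failure_mode": None, "speed_rpm": 1550, "temp_k": 301, "runtime_min": 60},
]


def summarize_failures(rows, features):
    """Drop non-failure rows, then average each feature per failure mode."""
    grouped = defaultdict(list)
    for row in rows:
        if row["failure_mode"] is not None:
            grouped[row["failure_mode"]].append(row)
    return {
        mode: {
            "count": len(group),
            **{f: mean(r[f] for r in group) for f in features},
        }
        for mode, group in grouped.items()
    }


summary = summarize_failures(records, ["speed_rpm", "temp_k", "runtime_min"])
for mode, stats in summary.items():
    print(mode, stats)
```

Summaries like this make it easy to see both how few examples each failure mode has and which operating conditions it tends to occur under, before any plotting is done.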
Once I had a thorough understanding of the dataset, I began building a predictive
model. Initially, I encountered concerning results: my model's recall dropped over multiple
epochs. This decline was due to the unbalanced nature of the dataset and the equal weighting of
failure and non-failure events, which let the model favor the far more common non-failure class.
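To see why recall is the metric that exposes this problem, consider a minimal sketch with made-up labels (3 failures in 100 samples, an assumed ratio chosen only for illustration). A model that always predicts "no failure" scores high accuracy while catching none of the failures.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually caught."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 1 = failure, 0 = non-failure.
y_true = [1] * 3 + [0] * 97
y_pred = [0] * 100  # a degenerate model that never predicts failure

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

With every sample weighted equally, missing all three failures costs very little overall, which is why accuracy stays high while recall collapses.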
To address this issue, I first tried adjusting the class weights to account for the
imbalance, but this approach felt "hacky": it meant deriving the weights by hand from a
formula, which I found unsatisfactory. Instead, I decided to augment the dataset using a
tabular Generative Adversarial Network (GAN). By generating synthetic data similar to my existing
dataset, I was able to extend the dataset and improve its balance.
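The write-up does not say which weighting formula was tried. One common choice, shown here purely as an assumption, is the inverse-frequency heuristic that scikit-learn calls "balanced": weight = n_samples / (n_classes * class_count). The label counts below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 = failure, 0 = non-failure.
labels = [0] * 97 + [1] * 3
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary of this shape can be passed as the `class_weight` argument to Keras `Model.fit`, so each rare failure sample counts for far more in the loss than a non-failure sample. The weights depend entirely on the class counts at hand, which is part of what makes this approach feel manual compared with generating additional failure examples.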
Using Keras Tuner, I then tuned the model's hyperparameters and ultimately achieved a recall of
approximately 95%. This approach not only enhanced the model's performance but also provided a more
robust solution to the dataset imbalance problem.