Optimizing TensorFlow Training Time for Better Performance

Maximize your accelerator utilization for reduced training time to minimize costs and improve metrics across the board

Sascha Heyer
5 min read · Aug 20, 2021

With the right methods and a bit of profiling, we can achieve significantly higher training throughput. The topics discussed in this guide are focused on TensorFlow. But no worries: almost all of these optimizations and methods also exist for PyTorch. Let’s begin.
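As a quick taste of the profiling part, here is a minimal sketch of capturing a profile with the Keras TensorBoard callback. The model and dataset are throwaway placeholders, not code from this article; the profiling setup is the part that matters.

```python
import tensorflow as tf

# Placeholder model and data; only the profiling setup is the point here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

dataset = tf.data.Dataset.from_tensor_slices((
    tf.random.normal([1024, 32]),
    tf.random.uniform([1024], maxval=10, dtype=tf.int64),
)).batch(64)

# Profile batches 5-10 so startup overhead is not part of the trace.
tensorboard_cb = tf.keras.callbacks.TensorBoard(
    log_dir="logs/profile", profile_batch=(5, 10)
)

model.fit(dataset, epochs=2, callbacks=[tensorboard_cb])
```

Opening the resulting trace in TensorBoard’s Profile tab shows whether each training step is spending its time on the accelerator or waiting for the input pipeline.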

Numbers don’t lie. The following graph is an example of what is possible. Without optimization, the training processes around 60 images per second, compared to almost 2,000 images per second when optimized. These optimizations can be applied to any kind of deep learning problem.

Training optimization possibilities based on an image example — Author: Sascha Heyer

This is a road you have to follow from left to right. You can’t simply run your training operations on GPUs and TPUs in float16; an efficient data pipeline is also required to keep the accelerators busy. If you plan to optimize your training based on this guide, take the steps from left to right.
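To make the “data pipeline first” point concrete, here is a minimal sketch assuming a TFRecord-based image dataset (the bucket path and the parsing logic are placeholders, not the article’s actual code): mixed precision is switched on globally, but it only pays off because the tf.data pipeline keeps the accelerator fed.

```python
import tensorflow as tf

# Assumption: training data lives in TFRecord files; the path and feature
# spec below are illustrative only.
tf.keras.mixed_precision.set_global_policy("mixed_float16")  # compute in float16

def parse_example(serialized):
    features = tf.io.parse_single_example(
        serialized,
        {
            "image": tf.io.FixedLenFeature([], tf.string),
            "label": tf.io.FixedLenFeature([], tf.int64),
        },
    )
    image = tf.io.decode_jpeg(features["image"], channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, features["label"]

files = tf.io.gfile.glob("gs://my-bucket/train-*.tfrecord")  # placeholder path
dataset = (
    tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)  # parallel decoding
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)  # overlap preprocessing with the training step
)
```

One design note: with the mixed_float16 policy, the model’s final layer should usually produce float32 outputs (for example by setting dtype="float32" on the output layer) so the loss stays numerically stable.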

We can optimize both training and inference. However, this article focuses solely on optimizing training time performance. If you are interested in how to optimize inference, let me know in the comments below or via a social media channel of your preference.
