Optimizing Image Optimization with Machine Learning

At wao.io we are, among other things, concerned about the time customers wait for your website. One key factor in speeding things up is the amount of data that needs to be transferred – the best bytes are those that need not be sent at all. Because images make up a large fraction of the transmitted data, we transcode images before they are delivered to browsers. To make sure that the transcoding does not affect the image quality, we employ an Automatic Image Quality Assessment (AIQA) procedure.

To further improve the efficiency of the transcoding, we combine it with a machine learning pipeline and see a significant decrease in the amount of transmitted data.

We measure the perceived difference between the original image and its compressed version using an enhanced structural dissimilarity index (DSSIM). Each image quality preset of wao.io (balanced, small size, best quality) corresponds to an acceptable DSSIM threshold.
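wao.io's enhanced DSSIM implementation is not public, but the basic idea can be sketched with a simplified, single-window SSIM. Production metrics compute SSIM over local sliding windows and average the result; this global version only illustrates the principle:

```python
import numpy as np

def global_ssim(a: np.ndarray, b: np.ndarray) -> float:
    """Global (single-window) SSIM between two grayscale images in [0, 255].

    Simplified for illustration: real SSIM/DSSIM implementations use
    local sliding windows rather than whole-image statistics.
    """
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2  # stabilizing constants
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

def dssim(a: np.ndarray, b: np.ndarray) -> float:
    """Structural dissimilarity: 0 for identical images, larger = more different."""
    return (1.0 - global_ssim(a, b)) / 2.0
```

Identical images yield a DSSIM of 0; a compressed version with visible artifacts yields a larger value, which is compared against the preset's threshold.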

Our transcoders are controlled by quality parameters, which determine how many bytes are used to encode the image features. Larger values of the transcoder's quality parameter result in smaller dissimilarity, but at the same time in a larger file, as the sketch below illustrates.

Figure: Typical dependency of image file size (bytes) and perceived image difference (DSSIM) on the transcoder quality parameter.

So, for every image, we need to know the optimal transcoder parameters that give us the smallest file size while staying below the DSSIM threshold. The easiest way is to try all possible parameter values for each image. However, this simply takes too long.
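To make the brute-force idea concrete, here is a minimal sketch; `transcode` and `measure_dssim` are hypothetical stand-ins for a real encoder and quality metric, not wao.io's API:

```python
def exhaustive_best_quality(transcode, measure_dssim, threshold,
                            qualities=range(1, 101)):
    """Try every quality parameter and return (quality, size) of the
    smallest transcoded file whose DSSIM stays within the threshold,
    or None if no parameter value is acceptable.

    transcode(q)     -> compressed bytes at quality q (hypothetical)
    measure_dssim(q) -> DSSIM of that result vs. the original (hypothetical)
    """
    best = None
    for q in qualities:
        size = len(transcode(q))
        if measure_dssim(q) <= threshold:
            if best is None or size < best[1]:
                best = (q, size)
    return best
```

With a typical 1–100 quality scale this costs up to 100 transcoding runs per image, which is exactly why this approach is too slow in production.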

Improving the image optimization by adding machine learning

Next, we could use one fixed parameter value for all images and deliver the transcoded image if the resulting DSSIM is below the threshold, and the original otherwise. This will be our benchmark. The downside of this simple approach is that some savings are lost: for some images the DSSIM at the pre-selected value exceeds the threshold, so the original must be delivered, while for others the pre-selected value is too conservative, yielding a much larger file than necessary. Nevertheless, we realize 59.8% of the potential savings. The potential savings depend on the DSSIM threshold; for wao.io's balanced preset they amount to 56% of the original image file sizes, i.e. we can reduce the file sizes by about 33% compared to the originals.
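The fixed-parameter benchmark boils down to a single transcoding run plus a fallback. A sketch, again with hypothetical `transcode` and `measure_dssim` stand-ins:

```python
def deliver_with_fixed_quality(original: bytes, transcode, measure_dssim,
                               fixed_q: int, threshold: float) -> bytes:
    """Benchmark strategy: transcode once at a fixed quality parameter
    and deliver the result only if it stays within the DSSIM budget;
    otherwise fall back to the untouched original image.

    transcode(data, q) and measure_dssim(a, b) are hypothetical
    stand-ins for a real encoder and quality metric.
    """
    candidate = transcode(original, fixed_q)
    if measure_dssim(original, candidate) <= threshold:
        return candidate
    return original
```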

Others have used a similar approach, employing binary search to scan the possible JPEG quality range, and achieved an overall data reduction of 30%.
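Binary search works here because DSSIM typically decreases monotonically as the quality parameter rises. A sketch of that idea (not the cited implementation), using a hypothetical `measure_dssim(q)` that transcodes at quality `q` and measures the result:

```python
def binary_search_quality(measure_dssim, threshold, lo=1, hi=100):
    """Find the lowest quality parameter whose DSSIM still stays within
    the threshold, assuming DSSIM falls monotonically as quality rises.
    Needs only O(log n) transcoding runs instead of trying every value.
    Returns None if even the highest quality is too lossy.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure_dssim(mid) <= threshold:
            best = mid       # good enough: try a lower quality (smaller file)
            hi = mid - 1
        else:
            lo = mid + 1     # too lossy: need a higher quality
    return best
```

For a 1–100 range this takes at most seven transcoding runs per image – far fewer than the exhaustive scan, but still several times more than a single fixed-parameter pass.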

Finally, we fitted two common machine-learning (ML) models using scikit-learn:

  • a flat decision tree (DT)
  • a random forest

We used only the image dimensions and bytes/pixel as features because these are readily available; from them, the models predict the transcoder's quality parameter directly. For comparison, we also include a 1-step Newton approach, which transcodes once at a fixed parameter and uses the measured DSSIM to refine the parameter for a second transcoding run. The achieved savings of all four methods are listed in the table below.

Method             Achieved savings [%]
fixed parameter    59.8
1-step Newton      79.2
flat DT            86.9
random forest      89.0
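The ML variant can be sketched with scikit-learn. Everything below is synthetic and for illustration only – the features match the article (width, height, bytes/pixel of the original file), but the training data, target relation, and tree depth are made-up assumptions, not wao.io's actual model:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Synthetic training set: features are image width, height and bytes/pixel
# of the original file (all cheap to obtain); the target is the optimal
# quality parameter found by an exact search on past images.  The numbers
# and the target relation are invented for this sketch.
n = 500
width = rng.integers(100, 4000, n)
height = rng.integers(100, 4000, n)
bytes_per_pixel = rng.uniform(0.1, 4.0, n)
X = np.column_stack([width, height, bytes_per_pixel])
# Pretend the optimal quality grows with bytes/pixel, plus some noise.
y = np.clip(40 + 12 * bytes_per_pixel + rng.normal(0, 2, n), 1, 100)

model = DecisionTreeRegressor(max_depth=3)  # a "flat" (shallow) tree
model.fit(X, y)

# Predict a quality parameter for a new image from its cheap features.
predicted_q = model.predict([[1920, 1080, 1.5]])[0]
```

Because the prediction comes straight from the features, only a single transcoding run at `predicted_q` is needed – which is where the CPU-time advantage over the search-based approaches comes from.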

As you can see, the simple fixed-parameter benchmark achieves almost 60% of the potential savings. The 1-step Newton approach gets us to about 80%, but requires twice as much CPU time. The two ML models outperform the 1-step Newton approach by roughly 10 percentage points while requiring only one transcoding run, which again halves the necessary CPU time. Interestingly, the flat DT yields almost the same achieved savings as the random forest, but with much lower complexity.
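One plausible reading of the 1-step Newton approach – a hypothetical reconstruction, since the article does not spell it out – is a single Newton root-finding step on DSSIM(q) − threshold, using an assumed average slope of the DSSIM curve:

```python
def one_step_newton(measure_dssim, threshold, q0=75, slope=-0.002):
    """One Newton-style refinement of a fixed starting quality q0.

    Transcode once at q0 and measure the DSSIM, then apply a single
    Newton update using an assumed average slope dDSSIM/dq (the values
    of q0 and slope here are made up):

        q1 = q0 - (DSSIM(q0) - threshold) / slope

    The final transcoding at q1 is the second and last encoder run,
    consistent with "twice the CPU time" of a fixed-parameter pass.
    This is a hypothetical sketch, not wao.io's exact algorithm.
    """
    d0 = measure_dssim(q0)
    q1 = q0 - (d0 - threshold) / slope
    return int(min(100, max(1, round(q1))))
```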

At the moment, wao.io is using the 1-step Newton approach reaching about 80% of the potential savings, which corresponds to reducing the delivered image data by 44% on average without affecting the perceived quality. In the future, we will switch to a more advanced method – and keep you posted.

This article was written by Felix Bahr and Eike von Seggern