TWM: a framework for creating highly compressible videos targeted to computer vision tasks

We present a simple yet effective framework – Transmitting What Matters (TWM) – to generate highly compressible videos containing only the information relevant to specific computer vision tasks, such as faces for facial expression recognition or license plates for optical character recognition. TWM exploits knowledge of the final computer vision task to compose video frames containing only the necessary data. The composed frames are compressed and can be stored or transmitted to powerful servers, where extensive and time-consuming tasks are performed. Experiments explore the trade-off between distortion and bitrate across a wide range of compression levels, as well as the impact of compression artifacts on the accuracy of the target vision task. We show that, for two computer vision tasks implemented by different methods, the amount of data to be stored or transmitted can be reduced dramatically without compromising accuracy. At a PSNR-YUV quality above 41 dB, the bitrate was reduced by up to a factor of four, while a detection task was affected by only ~1 pixel and a classification task by 1 to 2 percentage points.
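To make the idea concrete, the sketch below illustrates the general principle described above, not the authors' implementation: a detector for the task-relevant objects (here an OpenCV Haar-cascade face detector and the file names are assumptions for illustration) keeps only the detected regions in each frame and blanks everything else, so a standard codec spends almost no bits outside the regions that matter to the downstream task.

    # Minimal sketch of the TWM principle (assumed detector and file names,
    # not the paper's pipeline): keep only task-relevant regions per frame.
    import cv2
    import numpy as np

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    cap = cv2.VideoCapture("input.mp4")                # hypothetical source video
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter("twm_faces.mp4",
                          cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        composed = np.zeros_like(frame)                # flat background compresses well
        for (x, y, fw, fh) in faces:
            composed[y:y + fh, x:x + fw] = frame[y:y + fh, x:x + fw]
        out.write(composed)

    cap.release()
    out.release()

The composed video can then be encoded at the desired quality level and stored or sent to a server that runs the expensive recognition task on the preserved regions only.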