Skip to content

Image Metric Performance

In this section, we measure the performance of the image metrics. All computations are performed on the same instance with a constant hardware configuration. No CUDA acceleration is used.

Input Validation

To validate the performance of the image metrics, we measured the mean batch time for a large image dataset. A total of 5000 RGB images with a resolution of 512x512 were randomly generated for the purposes of this experiment. We split the computation into batches of 50 images at a time, and measured the mean batch time, total time, and peak memory usage for each metric.

Metric Mean Batch Time (s) Total Time (s) Mean Peak Memory (GiB) Max Peak Memory (GiB)
BRISQUE 2.02 ± 1.17 201.91 1.14 ± 0.00 1.15
Brightness 0.27 ± 0.01 27.25 1.11 ± 0.00 1.11
CLIPIQA 13.65 ± 0.53 1364.88 4.03 ± 0.03 4.05
Colorfulness 0.20 ± 0.01 20.48 1.12 ± 0.00 1.12
DOM 8.51 ± 0.57 850.74 1.09 ± 0.00 1.09
EME 0.19 ± 0.00 19.39 1.12 ± 0.00 1.12
ExposureBrightness 0.19 ± 0.00 18.51 1.12 ± 0.00 1.12
MSSSIM 3.74 ± 0.88 374.33 3.09 ± 0.09 3.27
PSNR 0.38 ± 0.01 37.79 0.64 ± 0.00 0.64
SSIM 2.21 ± 0.46 220.83 3.10 ± 0.09 3.26
Tenengrad 0.21 ± 0.00 20.82 1.12 ± 0.00 1.12
TenengradRelative 0.92 ± 0.01 91.55 1.12 ± 0.00 1.12

Synthesis Validation

In these experiments we measure the performance of the image metrics on synthetic data. We generated random reference and target embeddings of size 50000x2048 to mimic the embedding size of the Inception V3 model on two sets of 50000 images. We measure the total execution time of the metrics executed individually on the random embeddings, as well as the peak memory usage of the metrics. The measurments were repeated 5 times for each metric and then averaged.

Metric Total Time (s) Peak Memory (GiB)
Authenticity 145.84 ± 4.95 24.20 ± 0.06
Coverage 151.65 ± 4.20 24.15 ± 0.03
Density 149.03 ± 4.07 24.24 ± 0.06
FrechetDistance 16.05 ± 0.46 3.34 ± 0.01
GIQA 42.66 ± 2.29 2.75 ± 0.00
ImprovedPrecision 154.78 ± 8.08 24.44 ± 0.08
ImprovedRecall 148.06 ± 5.40 24.43 ± 0.10
MultiScaleIntrinsicDistance 570.26 ± 53.03 2.54 ± 0.00
PrecisionRecallDistribution 81.97 ± 9.84 4.37 ± 0.04

Note: K-NN-based metrics such as ImprovedPrecision and ImprovedRecall share intermediate computations. In these experiments, each metric was computed independently.\ However, in practice, you can significantly reduce execution time by reusing these shared computations via the context argument.