Waterloo Exploration Database: New Challenges
for Image Quality Assessment Models


The great content diversity of real-world digital images poses a grand challenge to image quality assessment (IQA) models, which are traditionally designed and validated on a handful of commonly used IQA databases with very limited content variation. To test the generalization capability and to facilitate the wide usage of IQA techniques in real-world applications, we establish a large-scale database named the Waterloo Exploration Database, which in its current state contains 4,744 pristine natural images and 94,880 distorted images created from them. Instead of collecting the mean opinion score for each image via subjective testing, which is extremely difficult if not impossible, we present three alternative test criteria to evaluate the performance of IQA models, namely the pristine/distorted image discriminability test (D-test), the listwise ranking consistency test (L-test), and the pairwise preference consistency test (P-test). We compare 20 well-known IQA models using the proposed criteria, which not only provide a stronger test in a more challenging testing environment for existing models, but also demonstrate the additional benefits of using the proposed database. For example, in the P-test, even for the best-performing no-reference IQA model, more than 6 million failure cases against the model are "discovered" automatically out of over 1 billion test pairs. Furthermore, we discuss how the new database may be exploited using innovative approaches in the future, to reveal the weaknesses of existing IQA models, to provide insights on how to improve the models, and to shed light on how the next-generation IQA models may be developed.
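
The abstract names the three test criteria without defining them here, so the following is only a minimal sketch of one plausible reading of each. The function names, the fixed threshold, and the convention that a higher model score means better quality are all assumptions made for illustration, not the definitions used in the paper.

```python
# Illustrative sketch only: simplified stand-ins for the D-test, L-test and
# P-test. All names and conventions here are hypothetical assumptions.
# Assumed convention: a higher model score means better predicted quality.

def d_test(pristine_scores, distorted_scores, threshold):
    """Discriminability: can a score threshold separate pristine from distorted?"""
    # Fraction of pristine images scored above the threshold.
    hit_p = sum(s > threshold for s in pristine_scores) / len(pristine_scores)
    # Fraction of distorted images scored at or below the threshold.
    hit_d = sum(s <= threshold for s in distorted_scores) / len(distorted_scores)
    # Average the two so the much larger distorted class does not dominate.
    return 0.5 * (hit_p + hit_d)

def l_test(scores_by_level):
    """Listwise consistency for one image and one distortion type.

    scores_by_level[i] is the model score at distortion level i (0 = mildest).
    Returns the Spearman correlation between the expected order (quality falls
    as the level rises) and the model's order, using the no-ties formula.
    """
    n = len(scores_by_level)
    order = sorted(range(n), key=lambda i: scores_by_level[i], reverse=True)
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r  # rank 0 = highest predicted quality
    d2 = sum((rank[i] - i) ** 2 for i in range(n))
    return 1 - 6 * d2 / (n * (n * n - 1))

def p_test(pairs):
    """Pairwise consistency: pairs are (score_of_better, score_of_worse)."""
    concordant = sum(better > worse for better, worse in pairs)
    return concordant / len(pairs)

if __name__ == "__main__":
    print(d_test([0.9, 0.8, 0.4], [0.3, 0.5, 0.2, 0.1], threshold=0.45))
    print(l_test([0.9, 0.5, 0.8, 0.3, 0.1]))
    print(p_test([(0.9, 0.2), (0.4, 0.6), (0.7, 0.1), (0.5, 0.3)]))
```

Under this reading, each discordant pair in `p_test` is a "failure case" in the sense of the abstract. Going by the abstract's own figures, more than 6 million failures out of over 1 billion pairs is a failure rate of roughly 0.6%, which shows how a small rate can still yield a large pool of counterexamples at this scale.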

	author    = {Ma, Kede and Duanmu, Zhengfang and Wu, Qingbo and Wang, Zhou and Yong, Hongwei and Li, Hongliang and Zhang, Lei}, 
	title     = {{Waterloo Exploration Database}: New Challenges for Image Quality Assessment Models}, 
	journal   = {IEEE Transactions on Image Processing},
	volume    = {26},
	number    = {2},
	pages     = {1004--1016},
	year      = {2017}}

- Evaluated Algorithms
Algorithm   Reference
PSNR        Peak signal-to-noise ratio
SSIM        Wang et al. Image quality assessment: from error visibility to structural similarity. TIP. 2004.
MS-SSIM     Wang et al. Multi-scale structural similarity for image quality assessment. Asilomar. 2003.
VIF         Sheikh et al. Image information and visual quality. TIP. 2006.
FSIM        Zhang et al. A feature similarity index for image quality assessment. TIP. 2011.
GMSD        Xue et al. Gradient magnitude similarity deviation: a highly efficient perceptual image quality index. TIP. 2014.
WANG05      Wang et al. Reduced-reference image quality assessment using a wavelet-domain natural image statistic model. HVEI. 2005.
RRED        Soundararajan et al. RRED indices: Reduced reference entropic differencing for image quality assessment. TIP. 2012.
BIQI        Moorthy et al. A two-step framework for constructing blind image quality indices. SPL. 2010.
BLINDS-II   Saad et al. Blind image quality assessment: a natural scene statistics approach in the DCT domain. TIP. 2012.
BRISQUE     Mittal et al. No-reference image quality assessment in the spatial domain. TIP. 2012.
CORNIA      Ye et al. Unsupervised feature learning framework for no-reference image quality assessment. CVPR. 2012.
DIIVINE     Moorthy et al. Blind image quality assessment: from scene statistics to perceptual quality. TIP. 2011.
IL-NIQE     Zhang et al. A feature-enriched completely blind image quality evaluator. TIP. 2015.
LPSI        Wu et al. A highly efficient method for blind image quality assessment. ICIP. 2015.
M3          Xue et al. Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features. TIP. 2014.
NFERM       Gu et al. Using free energy principle for blind image quality assessment. TMM. 2015.
NIQE        Mittal et al. Making a completely blind image quality analyzer. SPL. 2013.
QAC         Xue et al. Learning without human scores for blind image quality assessment. CVPR. 2013.
TCLT        Wu et al. Blind image quality assessment based on multichannel features fusion and label transfer. TCSVT. 2016.

- Performance Comparison