MMSP Challenge on
Structure-Guided Image Inpainting


  • September 23, 2021. Our session is scheduled for October 8th at 16:30 Finland time (UTC+3), which is 21:30 Beijing time (UTC+8) and 06:30 Seattle time (UTC-7) on the same day.
  • September 9, 2021. Please refer to the submission guidelines to submit your results.
  • August 13, 2021. You may refer to the FAQ page for more information about the challenge.
  • July 19, 2021. The deadline to submit the final testing result has been extended to Sept. 14, 2021.
  • June 29, 2021. The final testing data will be available on July 1st. Please fill in the registration form to acquire the testing data for final submission. If you encounter any problems, please contact huyy AT pku dot edu dot cn.
  • June 19, 2021. The validation data is now available at link.

Invited Keynote Speaker

Tianfan Xue

Dr. Tianfan Xue is a researcher on the computational photography team at Google Research. He received his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology in 2017. He also served as the web chair of CVPR 2020. His research focuses on increasing the accessibility of computational photography networks to millions of users: making them fast, robust, and less data-hungry.

Why simulated data is important and how to use them in image processing

Deep neural networks have demonstrated impressive performance in many tasks in computer vision, speech recognition, and natural language processing. One of the main reasons for their success is the use of large training datasets with manual labels. However, for many image processing tasks, such as denoising or reflection removal, it is often hard to obtain ground-truth outputs at scale. Researchers therefore face a dilemma: either use a small training set of real-world inputs and ground-truth outputs, or use a large simulated dataset that may differ from real images.

In this talk, we will illustrate that, with a properly designed simulation pipeline, networks trained on simulated data generalize well to real images. We will demonstrate the power of simulated data on three image processing tasks: dual-view reflection removal, raw image denoising, and flare removal. We will also discuss design principles for building a better simulation pipeline.

Challenge description

This challenge aims to consolidate and strengthen research efforts on image inpainting using structural guidance. It has two tracks: image restoration (IR) and image editing (IE). In the IR track, we mask out random areas of an image and provide the edge maps within those areas to help restore the image. In the IE track, we carefully select objects in an image to be removed and invite users to draw sketches reflecting their ideal layout for the missing region. In both tracks, participants are challenged to inpaint the incomplete image using the structural guidance. The major difference between the two tracks is that in the IR track we want to recover the missing areas so that the completed image approximates the original, whereas in the IE track the goal is a visually plausible result that satisfies the structural constraints as much as possible.


Training data is now available at link. The training set consists of around 1,500 complete high-definition natural images.
Tip: to generate randomly masked training images, participants may visit this link for random mask samples produced by NVIDIA.
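The pairing of a masked image with edge-map guidance can be sketched in a few lines. The function below is only an illustration of the data setup, not the challenge's actual pipeline: the challenge does not specify its edge extractor, so a simple gradient-magnitude edge map (with an assumed threshold of 0.1) stands in for it.

```python
import numpy as np

def make_training_pair(image, mask):
    """Build an (incomplete image, edge-map guidance) training pair.

    image : float array in [0, 1], shape (H, W)
    mask  : bool array, shape (H, W); True marks the missing region.
    """
    incomplete = image.copy()
    incomplete[mask] = 0.0  # zero out the missing areas

    # Gradient-magnitude edges as a stand-in for the challenge's
    # (unspecified) edge extractor; 0.1 is an assumed threshold.
    gy, gx = np.gradient(image)
    edges = (np.hypot(gx, gy) > 0.1).astype(np.float32)
    guidance = np.where(mask, edges, 0.0)  # keep edges only inside the hole
    return incomplete, guidance

# Toy example: a step image with a vertical edge, masked in the center.
img = np.zeros((8, 8), dtype=np.float32)
img[:, 4:] = 1.0
m = np.zeros((8, 8), dtype=bool)
m[2:6, 2:6] = True
inc, guide = make_training_pair(img, m)
```

The model then receives `inc` and `guide` as input and is trained to reproduce `img` inside the masked region.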

The validation set has two subsets corresponding to the two tracks, IR and IE. For each validation image we provide a mask indicating the "missing areas" and a sketch map providing the "structural guidance". The validation data can be downloaded at link.

The test set is similar in scale to the validation set; for each test image we provide the incomplete image together with a mask and a sketch map. The ground truth of the test images is not public.

Evaluation criteria

We set different criteria for the two tracks.

For the IR track, we will use a combination of PSNR and SSIM, calculated within and around the missing areas between the original and the completed images.
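Restricting a fidelity metric to the missing region amounts to evaluating only the pixels flagged by the mask. The sketch below shows the PSNR half of this idea; the exact region ("within and around the missing areas") and the weighting between PSNR and SSIM are not specified by the challenge, so this is an assumed, simplified form.

```python
import numpy as np

def masked_psnr(gt, pred, mask, max_val=1.0):
    """PSNR over only the pixels where mask is True.

    gt, pred : float arrays in [0, max_val], same shape
    mask     : bool array selecting the evaluation region
    """
    mse = np.mean((gt[mask] - pred[mask]) ** 2)
    if mse == 0:
        return np.inf  # identical within the region
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a constant error of 0.1 inside the evaluated region.
gt = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)
mask = np.ones((4, 4), dtype=bool)
score = masked_psnr(gt, pred, mask)  # ≈ 20 dB
```

The SSIM term would be computed analogously over the same region (e.g., with a windowed implementation such as scikit-image's `structural_similarity`).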

For the IE track, we will use a combination of objective and subjective evaluations.

For objective evaluation, we use FID and a structure similarity metric, SS. FID measures the realism of the generated images, while SS measures the similarity between the input user sketch and the edge map extracted by XDoG from the missing region. For subjective evaluation, we use MOS. A team of about 15–20 subjects will evaluate the completed images: each subject is shown two images at a time, generated by two different submissions, and asked to select the better one. The preference ratio of a method is defined as the percentage of comparisons involving that method in which it was selected.
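The preference-ratio definition above can be made concrete with a short tally over pairwise votes. This is a sketch of the stated definition, with an assumed record format of (method A, method B, winner) tuples; the actual collection interface is not specified.

```python
from collections import Counter

def preference_ratios(comparisons):
    """Preference ratio per method from pairwise selections.

    comparisons : list of (method_a, method_b, winner) tuples, where
    winner is one of the two methods. A method's ratio is the fraction
    of its appearances in which it was selected.
    """
    appearances = Counter()
    wins = Counter()
    for a, b, winner in comparisons:
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {m: wins[m] / appearances[m] for m in appearances}

# Toy example: method "A" is preferred in two of three comparisons.
votes = [("A", "B", "A"), ("A", "B", "A"), ("A", "B", "B")]
ratios = preference_ratios(votes)  # A: 2/3, B: 1/3
```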

Submission deadline

  • June 19, 2021: Release of validation data
  • July 1, 2021: Release of final testing data
  • July 19, 2021: Test image submission deadline (original; since extended)
  • Sept. 14, 2021: Test image submission deadline (extended)
  • Oct. 5, 2021: Evaluation results announcement
  • Oct. 6-8, 2021: IEEE MMSP 2021

Submission guidelines

By Sept. 14, 2021, participants should submit the test results (i.e., the completed images).

By Sept. 15, 2021, participants should submit their (testing) code and models for verification purposes.

We will provide an uploading service for participants.


Organizers

Prof. Dong Liu
University of Science and Technology of China
Email: dongeliu AT ustc dot edu dot cn
Homepage: Link
Prof. Zhangyang (Atlas) Wang
University of Texas at Austin
Email: atlaswang AT utexas dot edu
Homepage: Link
Dr. Shuai Yang
Nanyang Technological University
Email: shuai.yang AT ntu dot edu dot sg
Homepage: Link
Yueyu Hu
New York University
Email: yyhu AT nyu dot edu
Homepage: Link