Workshop on Multimodal Content Moderation (MMCM)

at the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Date: June 18, 2023

Venue: Vancouver Convention Centre, Vancouver, Canada


Welcome to the 1st IEEE Workshop on Multimodal Content Moderation (MMCM) being held in conjunction with CVPR 2023!

Content moderation (CM) is a rapidly growing need in today’s world, with a high societal impact, where automated CM systems can discover discrimination, violent acts, hate/toxicity, and much more, on a variety of signals (visual, text/OCR, speech, audio, language, generated content, etc.). Leaving or providing unsafe content on social platforms and devices can cause a variety of harmful consequences, including brand damage to institutions and public figures, erosion of trust in science and government, marginalization of minorities, geo-political conflicts, suicidal thoughts, and more. Besides user-generated content, content generated by powerful AI models such as DALL-E and GPT presents additional challenges to CM systems.

With the prevalence of multimedia social networking and online gaming, the problem of sensitive content detection and moderation is by nature multimodal. Moreover, content moderation is contextual and culturally multifaceted; for example, different cultures have different conventions about gestures. This requires CM approaches to be not only multimodal, but also context-aware and culturally sensitive.


Mevan Babakar

News and Information Credibility Lead, Google

Renée DiResta

Research Manager, Stanford University

Hany Farid

Professor, UC Berkeley

Tarleton Gillespie

Senior Principal Researcher, Microsoft Research

Dmitriy Karpman

CTO and Co-founder, Hive

Matt Lease

Professor, UT Austin

Mohammad Norouzi

Imagen Co-founder

Pietro Perona

Caltech, Amazon


This workshop intends to draw more visibility and interest to this challenging field, and to establish a platform that fosters in-depth idea exchange and collaboration. Authors are invited to submit original and innovative papers. We aim for a broad scope; topics of interest include, but are not limited to:

  • Multi-modal content moderation in image, video, audio/speech, text;
  • Context aware content moderation;
  • Datasets/benchmarks/metrics for content moderation;
  • Annotations for content moderation with ambiguous policies, perspectivism, noisy or disagreeing labels;
  • Content moderation for synthetic/generated data (image, video, audio, text); utilizing synthetic datasets;
  • Dealing with limited data for content moderation;
  • Continual & adversarial learning in content moderation services;
  • Explainability and interpretability of models;
  • Challenges of at-scale real-time content moderation needs vs. human-in-the-loop moderation;
  • Detecting misinformation;
  • Detecting/mitigating biases in content moderation;
  • Analyses of failures in content moderation.

Submission Link:

Authors are required to submit full papers by the paper submission deadline. These are hard deadlines due to the tight timeline; no extensions will be given. Please note that due to the tight timeline to have accepted papers included in the CVPR proceedings, no supplemental materials or rebuttal will be accepted.

Papers are limited to eight pages, including figures and tables, in the CVPR style. Additional pages containing only cited references are allowed. Papers with more than eight pages (excluding references) or violating formatting specifications will be rejected without review. For more information on the submission instructions, templates, and policies (double-blind review, dual submissions, plagiarism, etc.), please consult the CVPR 2023 Author Guidelines webpage. Please abide by CVPR policies regarding conflicts, plagiarism, double-blind review, dual submissions, and attendance.

Accepted papers will be included in the CVPR proceedings, on IEEE Xplore, and on the CVF website. Authors will be required to transfer to the IEEE the copyrights for any papers published in the conference proceedings. At least one author is expected to attend the workshop and present the paper.


Event                          Date
Paper Submission Deadline      March 22, 2023, 11:59:59 Pacific Time
Final Decisions to Authors     April 2, 2023, 11:59:59 Pacific Time
Camera Ready Deadline          April 8, 2023, 11:59:59 Pacific Time



Mei Chen

Principal Research Manager
Responsible & Open AI Research, Microsoft

Cristian Canton

Research Manager
Responsible AI, Meta

Davide Modolo

Research Manager
AWS AI Labs, Amazon

Maarten Sap

Assistant Professor
LTI, Carnegie Mellon University

Maria Zontak

Sr. Applied Scientist
Alexa Sensitive Content Intelligence, Amazon

Chris Bregler

Director / Principal Scientist
Google AI


Session                                     Start Time (PST)
Opening Remarks                             8:30AM
Invited Talks (2)                           8:45AM
Coffee Break                                9:45AM
Invited Talks (2)                           10:00AM
Panel Discussion with Invited Speakers      11:00AM
Lunch                                       12:00 noon
Invited Talks (2)                           1:00PM
Accepted Paper Presentations                2:00PM
Poster/Demo Session & Coffee Break          3:00PM
Invited Talks (2)                           3:30PM
Panel Discussion with Invited Speakers      4:30PM
Closing Remarks                             5:30PM


Lama Ahmad leads the Researcher Access Program at OpenAI, which facilitates collaborative research on key areas related to the responsible deployment of AI and mitigating risks associated with such systems. Most recently, she co-led the external red teaming effort for the DALL-E 2 deployment. A member of the Deployment Planning team at OpenAI, Lama works on conducting analyses to prepare for safe and successful deployment of increasingly advanced AI.
Mevan is the News and Information Credibility Lead at Google, working to tackle misinformation globally and to support journalists and publishers around the world. Previously she was deputy CEO of Full Fact, the UK’s independent fact checking charity, where she worked on the problems of mis/disinformation for seven years and founded Full Fact's automated fact checking team. Mevan was previously Interim CEO at Democracy Club, which empowers voters and everyday democracy in the UK. Mevan also sat on the board of the International Fact Checking Network, which oversees 300 fact checking organizations worldwide.
Renée DiResta is the Research Manager at the Stanford Internet Observatory. She investigates the spread of malign narratives across social networks and assists policymakers in understanding and responding to the problem. She has advised Congress, the State Department, and other academic, civic, and business organizations, and has studied disinformation and computational propaganda in the context of pseudoscience conspiracies, terrorism, and state-sponsored information warfare.
Hany Farid is a professor at the University of California, Berkeley with a joint appointment in electrical engineering & computer sciences and the School of Information. He is also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering, Vision Science Program, and is a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.
He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.
He is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship and is a fellow of the National Academy of Inventors.
Tarleton Gillespie is a Senior Principal Researcher at Microsoft Research New England, part of the Social Media Collective, Microsoft Research’s team of sociologists, anthropologists, and communication & media scholars studying the impact of sociotechnical systems on social and political life. Tarleton also retains an affiliated Associate Professor position with Cornell University, where he has been on the faculty for nearly two decades.
Tarleton’s current work investigates how social media platforms and other algorithmic information systems shape public discourse. His latest book, Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions that Shape Social Media (Yale University Press, 2018), examines how the content guidelines imposed by social media platforms set the terms for what counts as ‘appropriate’ user contributions, and asks how this private governance of cultural values has broader implications for freedom of expression and the character of public discourse. The book was a finalist for the 2019 PROSE Award from the Association of American Publishers (AAP).
Dmitriy Karpman is currently the Co-Founder and CTO at Hive. Dmitriy has also previously worked as a software engineering intern at Google and as the CTO and Co-Founder of Kiwi. Dmitriy has a wealth of experience in research, having worked as a research specialist at the Center for Geospatial Intelligence and as a research assistant at Washington University in St. Louis. Dmitriy received their Doctor of Philosophy in Computer Science from Stanford University. Dmitriy also holds a Bachelor of Science from the University of Missouri-Columbia in Computer Science, Mathematics, and Statistics.
Matthew Lease received degrees in Computer Science from Brown University (PhD, MSc) and the University of Washington (BSc). His research on information retrieval and crowdsourcing was recognized by three Early Career awards: from the Defense Advanced Research Projects Agency (DARPA), the National Science Foundation (NSF), and the Institute of Museum and Library Services (IMLS). More recent honors include Best Student Paper at the 2019 European Conference on Information Retrieval (ECIR) and Best Paper at the 2016 Association for the Advancement of Artificial Intelligence (AAAI) Human Computation and Crowdsourcing conference (HCOMP). From 2010 to 2013, Lease ran benchmarking challenges for the National Institute of Standards and Technology (NIST) Text Retrieval Conference (TREC). Lease's industry experience includes stints at Intel Research, computer game company HyperBole Studios, image compression startup LizardTech, crowdsourcing startup CrowdFlower, and Amazon.
Pamela Mishkin is interested in how to make language models safe and fair, from a technical and policy perspective. Previously she led product management at The Whistle, a small start-up building tech tools for international human rights groups. Before that, she researched economic policy at the Federal Reserve Bank of New York and worked with the Department for Digital, Culture, Media and Sport in the UK on online advertising policy. She holds a BA in Computer Science and Math from Williams College and an MPhil in Technology Policy from the University of Cambridge (Herchel-Smith Fellow).
Mohammad Norouzi is the co-founder of Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. He was a staff research scientist at Google Brain in Toronto. He is interested in developing simple and efficient machine learning algorithms that help solve challenging problems across a broad range of application domains including natural language processing and computer vision.
He joined the Google Brain team in Mountain View in January 2016 and moved to Toronto in January 2018. He completed his PhD in computer science at the University of Toronto in December 2015. His advisor was David Fleet, and he was supported by a Google PhD fellowship in machine learning. His PhD thesis focused on scalable similarity search. He is from Iran, where he finished undergraduate studies at Sharif University of Technology.
Lilian is an Applied AI Research Manager at OpenAI. Lilian also has an ML tech blog, as she believes the best way to learn is by explaining a new concept clearly to others.
Professor Perona is currently interested in visual recognition, more specifically visual categorization. He is studying how machines can learn to recognize frogs, cars, faces and trees with minimal human supervision, and how machines can learn from human experts. His project 'Visipedia' has produced two smart device apps (iNaturalist and Merlin Bird ID) that anyone can use to recognize the species of plants and animals from a photograph.
In collaboration with Professors Anderson and Dickinson, Professor Perona is building vision systems and statistical techniques for measuring actions and activities in fruit flies and mice. This enables geneticists and neuroethologists to investigate the relationship between genes, brains, and behavior. Professor Perona is also interested in studying how humans perform visual tasks, such as searching and recognizing image content. One of his recent projects studies how to harness the visual ability of thousands of people on the web.


  • Christopher Clarke, PhD Student, University of Michigan
  • Gaurav Mittal, Senior Researcher, Microsoft
  • J.P. Lewis, Staff Research Scientist, Google Research
  • Jay Patravali, Data & Applied Scientist II, Microsoft
  • Jialin Yuan, PhD Student, Oregon State University
  • Jiarui Cai, Applied Scientist, AWS AI Labs
  • Lan Wang, PhD Student, Michigan State University
  • Mahmoud Khademi, Researcher 2, Microsoft
  • Mamshad Nayeem Rizve, PhD Student, University of Central Florida
  • Matthew Hall, Principal Applied Scientist, Microsoft
  • Reid Pryzant, Senior Researcher, Microsoft
  • Sandra Sajeev, Data Scientist 2, Microsoft
  • Sarah Laszlo, Staff Research Scientist, Google Research
  • Satarupa Guha, Applied Scientist II, Microsoft
  • Simon Baumgartner, Software Engineer, Google Research
  • Soumik Mandal, Applied Scientist, Amazon
  • Tobias Rohde, Applied Scientist II, Amazon
  • Xuhui Zhou, PhD Student, Carnegie Mellon University
  • Ye Yu, Senior Software Engineer, Microsoft
  • Zhen Gao, Applied Scientist II, Amazon


If you have any questions, please feel free to reach us using the contact information below.