Workshop on Multimodal Content Moderation (MMCM)

at the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Date: June 17, 2024

Venue: Seattle Convention Center, Seattle WA, USA


Welcome to the 2nd IEEE Workshop on Multimodal Content Moderation (MMCM) being held in conjunction with CVPR 2024!

Content moderation (CM) is a rapidly growing need in today’s world, with high societal impact: automated CM systems can detect discrimination, violent acts, hate/toxicity, and much more, across a variety of signals (visual, text/OCR, speech, audio, language, generated content, etc.). Leaving unsafe content accessible on social platforms and devices can cause a variety of harmful consequences, including brand damage to institutions and public figures, erosion of trust in science and government, marginalization of minorities, geopolitical conflicts, suicidal ideation, and more. Beyond user-generated content, content generated by powerful AI models such as DALL-E and GPT presents additional challenges to CM systems.

With the prevalence of multimedia social networking and online gaming, sensitive content detection and moderation is by nature a multimodal problem. Moreover, content moderation is contextual and culturally multifaceted; for example, different cultures have different conventions about gestures. This requires CM approaches to be not only multimodal, but also context-aware and culturally sensitive.


📢 Website is live!

📢 CMT submission site is open! Check out

📢 New Final Decision Date: April 10, 2024, 11:59:59 Pacific Time (Check Important Dates)


Sarah Gilbert

Research Director, Citizens and Technology Lab
Cornell University

Julia Hirschberg

Columbia University & Amazon Scholar

Adina Williams

Research Scientist
Facebook AI Research (FAIR), Meta


This workshop intends to draw more visibility and interest to this challenging field, and to establish a platform that fosters in-depth idea exchange and collaboration. Authors are invited to submit original and innovative papers. We aim for a broad scope; topics of interest include, but are not limited to:

  • Multimodal content moderation in image, video, audio/speech, and text;
  • Context-aware content moderation;
  • Datasets/benchmarks/metrics for content moderation;
  • Annotations for content moderation with ambiguous policies, perspectivism, and noisy or disagreeing labels;
  • Content moderation for synthetic/generated data (image, video, audio, text); utilizing synthetic datasets;
  • Dealing with limited data for content moderation;
  • Continual & adversarial learning in content moderation services;
  • Explainability and interpretability of models;
  • Challenges of at-scale real-time content moderation needs vs. human-in-the-loop moderation;
  • Detecting misinformation;
  • Detecting/mitigating biases in content moderation;
  • Analyses of failures in content moderation.

Submission Link:

Authors are required to submit full papers by the paper submission deadline. These are hard deadlines due to the tight timeline; no extensions will be given. Please note that due to the tight timeline to have accepted papers included in the CVPR proceedings, no supplemental materials or rebuttal will be accepted.

Papers are limited to eight pages, including figures and tables, in the CVPR style. Additional pages containing only cited references are allowed. Papers exceeding eight pages (excluding references) or violating the formatting specifications will be rejected without review. For more information on submission instructions, templates, and policies (double-blind review, dual submissions, plagiarism, etc.), please consult the CVPR 2024 Author Guidelines webpage. Please abide by CVPR policies regarding conflicts, plagiarism, double-blind review, dual submissions, and attendance.

Accepted papers will be included in the CVPR proceedings, on IEEE Xplore, and on the CVF website. Authors will be required to transfer copyright to the IEEE for any papers published in the conference proceedings. At least one author is expected to attend the workshop and present the paper.


Event                        Date
Paper Submission Deadline    March 22, 2024, 11:59:59 PM Pacific Time
Final Decisions to Authors   April 10, 2024 (updated from April 8, 2024)
Camera Ready Deadline        April 14, 2024, 11:59:59 PM Pacific Time



Mei Chen

Principal Research Manager
Responsible & Open AI Research, Microsoft

Cristian Canton

Research Manager
Responsible AI, Meta

Davide Modolo

Research Manager
AWS AI Labs, Amazon

Maarten Sap

Assistant Professor
LTI, Carnegie Mellon University

Maria Zontak

Sr. Applied Scientist
Alexa Sensitive Content Intelligence, Amazon

Matt Lease

Professor
UT Austin


Time (PDT) Event Title Speaker(s)
08:30 - 08:45 Opening Remarks and Logistics for the Day Mei Chen, Microsoft
08:45 - 09:45 Invited Talks
09:45 - 10:00 Coffee Break
10:00 - 11:00 Invited Talks
11:00 - 12:00 Panel Discussion
12:00 - 13:00 Lunch Break
13:00 - 14:00 Invited Talks
14:00 - 15:00 Accepted Paper Presentations
15:00 - 15:30 Coffee Break
15:30 - 16:30 Invited Talks
16:30 - 17:30 Panel Discussion
17:30 - 17:35 Closing Remarks Mei Chen, Microsoft


Sarah uses a sociotechnical lens to explore people’s experiences using online platforms. Her work has covered key areas such as motivations, informal learning, privacy and ethics, and online governance. To this end she uses a variety of methods, from qualitative ethnography and in-depth interviews to quantitative surveys and social network analysis. Currently, she is the Research Director at the Citizens and Technology (CAT) Lab, founded by Dr. J. Nathan Matias, and is leading the NSF-funded project Learning Code(s): Community-Centered Design of Automated Content Moderation, along with PIs Drs. Katie Shilton, Hal Daumé III, and Michelle Mazurek. Prior to that, she graduated from the University of British Columbia in 2018, advised by Drs. Caroline Haythornthwaite and Luanne Freund, and worked as a postdoctoral scholar at the University of Maryland on the NSF-funded PERVADE (Pervasive Data Ethics for Computational Research) project with Drs. Katie Shilton and Jessica Vitak.
Professor Hirschberg has analyzed the prosody of charismatic and deceptive speech and developed a computer system that is more successful than humans at detecting lies. She studies entrainment (the tendency of people to mirror back the spoken mannerisms of those speaking to them) in voice-response systems. Hirschberg also does research on emotional speech, code-switching, and text-to-speech synthesis. Hirschberg received a BA from Eckerd College in 1968 and a PhD in computer science from the University of Pennsylvania in 1985. She also received a PhD in history from the University of Michigan in 1976. She holds six patents for text-to-speech synthesis and audio browsing/retrieval. She is a fellow of the American Association for Artificial Intelligence, the International Speech Communication Association, the Association for Computing Machinery, and the Institute of Electrical and Electronics Engineers, and is a founding fellow of the Association for Computational Linguistics. She was elected to the National Academy of Engineering in 2017 and to the American Academy of Arts and Sciences in 2018.
Adina is a Research Scientist at Facebook AI Research in NYC (started October 2018). Previously, she earned her PhD at New York University in the Department of Linguistics, where she investigated the brain basis of syntactic and semantic processing. Her main research goal is to strengthen the connections between linguistics and cognitive science on the one hand and natural language processing and artificial intelligence on the other. She approaches this process from both directions: she brings linguistic and cognitive scientific insights about human language to bear on training, evaluating, and debiasing NLP systems, and applies statistical methods and corpus analytic tools from NLP to uncover new quantitative, cross-linguistic facts about particular human languages.


  • Christopher Clarke, PhD student, University of Michigan
  • Gaurav Mittal, Senior Researcher, Microsoft
  • J.P. Lewis, Staff Research Scientist, Google Research
  • Jay Patravali, Data & Applied Scientist II, Microsoft
  • Jialin Yuan, PhD Student, Oregon State University
  • Jiarui Cai, Applied Scientist, AWS AI Labs
  • Lan Wang, PhD Student, Michigan State University
  • Mahmoud Khademi, Researcher 2, Microsoft
  • Mamshad Nayeem Rizve, PhD Student, University of Central Florida
  • Matthew Hall, Principal Applied Scientist, Microsoft
  • Reid Pryzant, Senior Researcher, Microsoft
  • Rishi Madhok, Senior Applied Science Manager, Microsoft
  • Sandra Sajeev, Data Scientist 2, Microsoft
  • Sarah Laszlo, Staff Research Scientist, Google Research
  • Satarupa Guha, Applied Scientist II, Microsoft
  • Simon Baumgartner, Software Engineer, Google Research
  • Soumik Mandal, Applied Scientist, Amazon
  • Tobias Rohde, Applied Scientist II, Amazon
  • Xuhui Zhou, PhD student, Carnegie Mellon University
  • Ye Yu, Senior Software Engineer, Microsoft
  • Zhen Gao, Applied Scientist II, Amazon


If you have any questions, please feel free to reach us at the contact below.

Previous Years

Check out the proceedings from previous years!