USC Researchers Present Groundbreaking Research at EMNLP 2023
Los Angeles, California – December 6, 2023 – The University of Southern California (USC) is proud to announce that its faculty and students are presenting 23 research papers at the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), held in Singapore from December 6 to 10. EMNLP is one of the world’s largest conferences dedicated to natural language processing (NLP), the subfield of artificial intelligence (AI) concerned with how computers process and understand human language.
Research Spotlights
Context Counts When Moderating Content on Twitch
Content moderation on live-streaming platforms like Twitch and YouTube Live is a formidable challenge because messages arrive in real time. Research Assistant Dong-Ho Lee and Principal Scientist Jay Pujara of USC Viterbi’s Information Sciences Institute (ISI) have devised a method that substantially improves the performance of moderation models on live platforms by harnessing contextual information. Their research showed that supplying a model with context tailored to each comment, such as chat history, video content, and external knowledge, can improve moderation performance by 35%.
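To make the idea concrete, here is a minimal Python sketch, not the authors’ code, of how contextual signals might be packed into a single input for a moderation classifier. The input format and the `classify` helper are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class ChatComment:
    text: str               # the message being moderated
    history: list[str]      # recent chat messages preceding it
    video_context: str      # e.g., a transcript snippet of the stream

def build_context_input(comment: ChatComment, max_history: int = 5) -> str:
    """Concatenate video context and chat history with the target comment."""
    history = " | ".join(comment.history[-max_history:])
    return (
        f"[VIDEO] {comment.video_context} "
        f"[HISTORY] {history} "
        f"[COMMENT] {comment.text}"
    )

def classify(model_input: str) -> str:
    """Hypothetical stand-in for a fine-tuned moderation model."""
    return "norm-violation" if "spoiler" in model_input.lower() else "ok"

comment = ChatComment(
    text="The killer is the butler, lol",
    history=["hype!", "no spoilers please", "first time watching this"],
    video_context="Streamer is hosting a watch party for a mystery film.",
)
print(classify(build_context_input(comment)))  # context reveals the violation
```

In the paper’s setting a trained classifier would replace the keyword heuristic; the point is that the moderation decision is conditioned on the surrounding stream rather than the lone comment.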
Fact or Fiction?
Large language models (LLMs) like ChatGPT struggle to respond appropriately to unanswerable questions and unverifiable claims. Traditionally, researchers have trained models to handle such inputs by manually assembling negative examples, a time-consuming process. In their paper “SCENE: Self-Labeled Counterfactuals for Extrapolating to Negative Examples,” USC computer science researchers propose an approach that automatically generates subtly negative examples as synthetic data. The authors hope this work will inspire further exploration of synthetic data’s potential for building more effective and versatile machine learning models.
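As a rough illustration of the general recipe, not the SCENE implementation itself, the sketch below perturbs an answerable question into a counterfactual and keeps it as a negative example only if a model’s own confidence drops; the entity swap and confidence heuristic are toy stand-ins.

```python
def perturb_question(question: str, swaps: dict[str, str]) -> str:
    """Swap an entity so the question likely no longer matches the passage."""
    for old, new in swaps.items():
        if old in question:
            return question.replace(old, new)
    return question

def answer_confidence(question: str, passage: str) -> float:
    """Toy stand-in for a QA model's answerability score."""
    entities = [w.strip("?,.") for w in question.split()[1:] if w[:1].isupper()]
    return 1.0 if all(e in passage for e in entities) else 0.1

passage = "Marie Curie won the Nobel Prize in Physics in 1903."
question = "When did Marie Curie win the Nobel Prize in Physics?"

# Create a counterfactual (the swap table is illustrative).
negative = perturb_question(question, {"Marie Curie": "Niels Bohr"})

# Self-label: keep it as a negative only if the model can no longer answer it.
if answer_confidence(negative, passage) < 0.5:
    print("new unanswerable training example:", negative)
```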
AI Tools for Journalists: A Source-Recommendation Engine
Local news journalists famously face heavy workloads, low pay, and tight deadlines. To speed up the news-writing process, USC computer science graduate student Alexander Spangher and colleagues are developing AI tools for journalists, including a source-recommendation engine. Given a story topic, the service suggests relevant sources and provides their contact information, boosting journalists’ efficiency and productivity.
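Below is a minimal sketch of how such a service might rank sources for a topic, assuming a toy source database and simple TF-IDF retrieval; the data, scoring, and field names are illustrative, not the actual system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical source database: an expertise blurb plus contact details.
sources = [
    {"name": "Dr. A. Rivera", "bio": "epidemiologist studying influenza outbreaks",
     "contact": "a.rivera@example.edu"},
    {"name": "J. Chen", "bio": "city budget analyst and municipal finance expert",
     "contact": "j.chen@example.org"},
    {"name": "Prof. K. Osei", "bio": "housing policy and zoning law researcher",
     "contact": "k.osei@example.edu"},
]

topic = "proposed changes to the city housing budget"

# Embed source bios and the topic in one TF-IDF space, then rank by similarity.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([s["bio"] for s in sources] + [topic])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

for idx in scores.argsort()[::-1]:
    src = sources[idx]
    print(f"{src['name']} <{src['contact']}> (score={scores[idx]:.2f})")
```

A production system would use learned embeddings and a far richer source index, but the retrieve-and-rank shape would be similar.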
A Rose by Any Other Name
Assistant Professor of Computer Science Jieyu Zhao and her research team are investigating whether machine translations of names vary with race, ethnicity, and gender. Their analysis revealed that translation systems struggle to accurately translate names associated with women, particularly names linked to racial (Black) and ethnic (Hispanic) minorities. The findings underscore the need for further work in machine translation to ensure high-quality service for all users, regardless of gender, race, or ethnicity.
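One way to probe this kind of bias, sketched below under heavy assumptions (an identity function stands in for a real translation API, and the names and groups are purely illustrative), is to round-trip templated sentences and compare how often each group’s names survive translation:

```python
def translate(sentence: str, src: str, tgt: str) -> str:
    """Hypothetical stand-in for a real MT system; identity here so the demo runs."""
    return sentence

def name_preservation_rate(names: list[str], template: str) -> float:
    """Fraction of names that survive an en->es->en round trip intact."""
    kept = 0
    for name in names:
        spanish = translate(template.format(name=name), src="en", tgt="es")
        back = translate(spanish, src="es", tgt="en")
        kept += name in back
    return kept / len(names)

# Groups and names are illustrative placeholders, not the study's data.
groups = {
    "group_a": ["Emily", "James"],
    "group_b": ["Lakisha", "Guadalupe"],
}
template = "{name} signed the contract yesterday."
for group, names in groups.items():
    print(f"{group}: {name_preservation_rate(names, template):.0%} preserved")
```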
Complete List of Accepted USC Papers
The following 23 USC papers were accepted for presentation at EMNLP 2023, with authors listed beneath each title:
- A Causal View of Entity Bias in (Large) Language Models
  - Fei Wang, Wenjie Mo, Yiwei Wang, Wenxuan Zhou, Muhao Chen
- ALCAP: Alignment-Augmented Music Captioner
  - Zihao He, Weituo Hao, Wei-Tsung Lu, Changyou Chen, Kristina Lerman, Xuchen Song
- Analyzing Norm Violations in Live-Stream Chat
  - Jihyung Moon, Dong-Ho Lee, Hyundong Justin Cho, Woojeong Jin, Chan Young Park, Minwoo Kim, Jonathan May, Jay Pujara, Sungjoon Park
- Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems
  - Yixin Wan, Jieyu Zhao, Aman Chadha, Nanyun Peng, Kai-Wei Chang
- A Rose by Any Other Name Would not Smell as Sweet: Social Bias in Names Mistranslation
  - Sandra Sandoval, Jieyu Zhao, Marine Carpuat, Hal Daumé III
- Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering
  - Wang Zhu, Jesse Thomason, Robin Jia
- BRAINTEASER: Lateral Thinking Puzzles for Large Language Models
  - Yifan Jiang, Filip Ilievski, Kaixin Ma, Zhivar Sourati
- Challenges in Context-Aware Neural Machine Translation
  - Linghao Jin, Jacqueline He, Jonathan May, Xuezhe Ma
- Continual Dialogue State Tracking via Example-Guided Question Answering
  - Hyundong Justin Cho, Andrea Madotto, Zhaojiang Lin, Khyathi Chandu, Satwik Kottur, Jing Xu, Jonathan May, Chinnadhurai Sankar
- Dense Retrieval as Indirect Supervision for Large-space Decision Making
  - Nan Xu, Fei Wang, Mingtao Dong, Muhao Chen
- Estimating Large Language Model Capabilities without Labeled Test Data
  - Harvey Yiyun Fu, Qinyuan Ye, Albert Xu, Xiang Ren, Robin Jia
- Evaluating Large Language Models on Controlled Generation Tasks
  - Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma
- Exploring Distributional Shifts in Large Language Models for Code Analysis
  - Shushan Arakelyan, Rocktim Jyoti Das, Yi Mao, Xiang Ren
- How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
  - Qinyuan Ye, Harvey Yiyun Fu, Xiang Ren, Robin Jia
- Identifying Informational Sources in News Articles
  - Alexander Spangher, Nanyun Peng, Emilio Ferrara, Jonathan May
- Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
  - Ximing Lu, Faeze Brahman, Peter West, Jaehun Jung, Khyathi Chandu, Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi
- Learn Your Tokens: Word-Pooled Tokenization for Language Modeling
  - Avijit Thawani, Saurabh Ghanekar, Xiaoyuan Zhu, Jay Pujara
- Look-back Decoding for Open-Ended Text Generation
  - Nan Xu, Chunting Zhou, Asli Celikyilmaz, Xuezhe Ma
- Making Large Language Models Better Data Creators
  - Dong-Ho Lee, Jay Pujara, Mohit Sewak, Ryen W White, Sujay Kumar Jauhar
- Remember What You Did So You Know What To Do Next
  - Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, Ralph Weischedel
- Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
  - Yuliang Cai, Jesse Thomason, Mohammad Rostami
- Temporal Knowledge Graph Forecasting Without Knowledge Using In-Context Learning
  - Dong-Ho Lee, Kian Ahrabian, Woojeong Jin, Fred Morstatter, Jay Pujara
- We’re Afraid Language Models Aren’t Modeling Ambiguity
  - Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
Conclusion
USC’s strong presence at EMNLP 2023 underscores the university’s commitment to advancing research in natural language processing and to developing solutions for real-world problems. The accepted papers span a diverse range of topics, showcasing the breadth of USC’s expertise in the field. This research has the potential to shape the trajectory of AI technologies and to improve human-computer interaction across many domains.