AI systems & model behavior researcher focused on evaluation,
failure modes, and decision-making in high-stakes environments
(elections, conflict, online harm).
Lead global systems for synthetic media verification
Identify failure modes in high-stakes contexts
Develop evaluation frameworks for real-world AI performance
Design systems for decision-making under uncertainty
Translate model behavior into real-world decisions
Advise industry, policy, and civil society on safe AI deployment
Deepfake Rapid Response Force
Real-time decision-support system for verifying high-risk synthetic media across global information ecosystems (launched 2023).
Detection + expert workflows
Real-time, high-stakes decisions
Revealed failure under real-world pressure
Problem: AI detection tools are unreliable in real-world, time-sensitive contexts. They are not integrated into decision workflows and struggle in hybrid environments where authentic and synthetic content coexist.
What I did: Designed and deployed workflows integrating AI detection tools, expert analysis, and rapid-response coordination among journalists and forensic experts across multiple global regions.
Outcome: Enabled real-time verification and response decisions across diverse geopolitical contexts, improving how high-risk media is assessed and acted upon under uncertainty.
Key insights:
- A significant share of high-risk cases (~30%) involve authentic content, with AI used as an alibi, making verification more complex than simple detection
- Real-world environments are hybrid: synthetic and authentic content coexist, requiring systems designed for uncertainty rather than binary classification (a sketch of this triage logic follows below)
- Audio presents a particularly high risk, with detection lagging behind generation; risks from video are rapidly increasing
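To make that last design point concrete, here is a minimal sketch of uncertainty-aware triage, assuming a detector that outputs a synthetic-probability score. This is my illustration, not the deployed system; all names and thresholds are hypothetical.

```python
# Illustrative sketch of uncertainty-aware triage (hypothetical, not the
# deployed system). Names like `detector_score` are assumptions.

from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_LABEL = "auto_label"          # confident enough to act on
    EXPERT_REVIEW = "expert_review"    # ambiguous: send to forensic analysts
    RAPID_RESPONSE = "rapid_response"  # high stakes: escalate immediately


@dataclass
class Case:
    detector_score: float  # P(synthetic) from an AI detector, in [0, 1]
    high_stakes: bool      # e.g. election- or conflict-related content


def triage(case: Case, low: float = 0.1, high: float = 0.9) -> Route:
    """Route a case by uncertainty rather than a single binary threshold."""
    if case.high_stakes:
        # Detection scores alone are unreliable under time pressure;
        # high-stakes items always get human coordination.
        return Route.RAPID_RESPONSE
    if low < case.detector_score < high:
        # The ambiguous middle band is where binary classifiers fail;
        # this is exactly the hybrid-content region described above.
        return Route.EXPERT_REVIEW
    return Route.AUTO_LABEL


print(triage(Case(detector_score=0.55, high_stakes=False)))  # Route.EXPERT_REVIEW
```

The middle band is the design choice that matters: ambiguous scores are routed to people rather than forced into a binary label.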
TRIED Benchmark
Sociotechnical evaluation framework for assessing real-world effectiveness of AI detection systems (2025).
Problem: Traditional benchmarks measured accuracy but failed to capture real-world performance, impact, and usability.
What I did: Co-developed an evaluation framework incorporating real-world context, transparency, equity, and downstream outcomes.
Outcome: Shifted evaluation from model-centric metrics to system-level effectiveness, informing deployment and policy decisions.
Key insight: High model accuracy does not reliably translate to meaningful harm reduction or effective real-world use.
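A toy base-rate calculation (my own numbers, not results from the framework) shows why accuracy alone is a poor proxy for real-world value:

```python
# Toy base-rate illustration (assumed numbers, not TRIED results):
# why 95% sensitivity/specificity does not imply useful deployment behavior.

prevalence = 0.01      # assume 1% of flagged media is actually synthetic
sensitivity = 0.95     # P(flag | synthetic)
specificity = 0.95     # P(no flag | authentic)

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * (1 - specificity)

precision = true_pos / (true_pos + false_pos)
print(f"P(synthetic | flagged) = {precision:.2%}")  # ~16%: most flags are wrong
```

At 1% prevalence, a detector that is 95% accurate in both directions still produces flags that are wrong about 84% of the time, so reviewer workload and trust, not headline accuracy, determine whether harm is actually reduced.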
MNW Benchmark (Microsoft-Northwestern-WITNESS)
Benchmark dataset for evaluating AI-generated media detection across image, video, and audio in real-world conditions (2026).
Problem: Existing detection benchmarks lacked real-world diversity and failed to capture how models perform across modalities and evolving generation techniques.
What I did: Contributed to the development of a multi-modal benchmark dataset and evaluation framework in collaboration with Microsoft AI for Good Lab and academic partners.
Outcome: Enabled more robust, cross-modal evaluation of AI detection systems, supporting research and development of more generalizable models.
Key insight: Detection systems struggle to generalize across modalities, real-world content, and rapidly evolving generation methods; evaluation must prioritize breadth and adaptability over static benchmarks.
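As an illustration of what breadth-first evaluation looks like in practice (a sketch with made-up records; the benchmark's actual schema and metrics differ), the idea is to score per modality and per generation method so that generalization gaps stay visible instead of being averaged away:

```python
# Minimal sketch of breadth-first evaluation (illustrative only; the real
# benchmark's schema and metrics differ). Groups results by modality and
# generator so generalization failures show up instead of vanishing into
# one aggregate accuracy number.

from collections import defaultdict
from statistics import mean

# Each record: (modality, generator, model_was_correct)
results = [
    ("image", "gan_v1", True),
    ("image", "diffusion_v2", False),
    ("audio", "tts_v3", False),
    ("video", "diffusion_v2", True),
]

by_group: dict[tuple[str, str], list[bool]] = defaultdict(list)
for modality, generator, correct in results:
    by_group[(modality, generator)].append(correct)

for (modality, generator), outcomes in sorted(by_group.items()):
    print(f"{modality:>5} / {generator:<12} acc={mean(outcomes):.2f}")
```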
NCII AI Detection & Moderation Systems
Evaluation and analysis of AI detection and moderation systems for non-consensual AI-generated intimate imagery (NCII) (2026).
Problem: Detection systems appeared effective in isolation but often failed to reduce harm in real-world moderation workflows, particularly with NCII content.
What I did: Designed evaluation approaches incorporating consent, context, and survivor-centered outcomes into system performance.
Outcome: Identified critical gaps between detection capabilities and actual harm mitigation, with the goal of informing improvements in system design and policy.
Key insight: NCII exposes the limits of detection most starkly: harm cannot be reliably inferred from content alone without context and consent signals.
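In system terms, that insight means the decision function must take consent and context as first-class inputs; a hypothetical sketch (all signal names are illustrative assumptions, not a real moderation API):

```python
# Hypothetical sketch: a harm assessment that cannot be computed from a
# content score alone. All signal names here are illustrative assumptions.

from typing import Optional


def assess_ncii_risk(
    synthetic_score: float,          # detector output, in [0, 1]
    consent_signal: Optional[bool],  # e.g. from a survivor report
    context_flags: list[str],        # e.g. ["targeted_harassment"]
) -> str:
    if consent_signal is False or "targeted_harassment" in context_flags:
        # Harm is established by consent/context, regardless of whether
        # the content is synthetic or authentic.
        return "remove_and_support"
    if consent_signal is None:
        # A high detector score with no context is not enough to infer
        # harm (or its absence): route to a survivor-centered review.
        return "human_review"
    return "no_action"
```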
Adversarial Media & Redaction Systems
Evaluation of anonymization techniques against AI-driven reconstruction in high-risk media contexts (2026).
Problem: Common redaction methods (blurring, masking) were assumed to protect identity but were increasingly vulnerable to AI reconstruction.
What I did: Co-led cross-organizational research testing anonymization techniques against modern AI reconstruction models.
Outcome: Revealed systemic vulnerabilities in widely used safeguards and informed new guidance for privacy and evidentiary integrity.
Key insight: Techniques designed for earlier threat models no longer hold: privacy protection must anticipate adversarial AI capabilities.
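The evaluation loop itself is simple to state, even though the models involved are not. A skeleton, where `reconstruct` and `embed_face` stand in for an adversarial reconstruction model and a face-embedding model (both hypothetical placeholders, not real library calls):

```python
# Skeleton of the redaction-robustness test (illustrative; `reconstruct`
# and `embed_face` are hypothetical placeholders for an AI reconstruction
# model and a face-embedding model).

import numpy as np
from PIL import Image, ImageFilter


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def redaction_survives(original: Image.Image, radius: int,
                       reconstruct, embed_face,
                       threshold: float = 0.6) -> bool:
    """Blur, let the adversary reconstruct, and test re-identifiability."""
    blurred = original.filter(ImageFilter.GaussianBlur(radius=radius))
    recovered = reconstruct(blurred)  # adversarial reconstruction attempt
    similarity = cosine(embed_face(original), embed_face(recovered))
    # If the recovered face still matches the original embedding,
    # the redaction failed against this threat model.
    return similarity < threshold
```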
AI & Generative Media Working Group
Global, cross-sector initiative exploring the real-world implications of generative AI in media and storytelling.
Problem: Rapid advances in generative models outpace understanding of their implications for authorship, trust, and representation across the media ecosystem.
What I did: Co-led a working group of 300+ members (including students, documentarians, producers, and funders), facilitating cross-sector dialogue that connects emerging AI capabilities with their creative, societal, and governance implications.
Outcome: Created a shared space for sensemaking, knowledge exchange, and coordination, informing how practitioners and institutions approach generative AI media.
Key insight: Collective understanding and action are critical: AI systems are not just passively adopted, but actively shaped, contested, and reimagined by the communities that use them.
Using Generative AI for Human Rights Advocacy
Exploration and framework for the responsible use of generative AI tools in human rights advocacy and storytelling (2023).
Problem: Generative AI introduces new possibilities for creating and modifying audiovisual content, but also raises significant ethical risks around consent, authenticity, and potential harm in human rights contexts.
What I did: Researched and analyzed emerging generative AI tools (image, video, audio), and conducted experimentation and global consultations to identify practical use cases and risks for human rights organizations.
Outcome: Developed early guidance outlining where generative AI can support advocacy and where caution is needed, including key questions and considerations for responsible use.
Key insight: Generative AI can both strengthen and undermine human rights; responsible use depends on context, consent, and transparency.