Coherence.

An AI presentation coach that analyzes your voice, gestures, and expressions to deliver precise, real-time feedback on how you truly present.

Overview

Detail

Coherence is an AI presentation coach designed to help users refine their presentation skills. It integrates Deepgram to analyze users' speech patterns and TwelveLabs to analyze their physical gestures and visuals. To complete the pipeline, both the audio and video analyses are sent to Google Gemini, which compiles all of these factors into a detailed results page.
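The pipeline described above can be sketched as a simple orchestration: analyze the two modalities independently, then merge both results into a single prompt for synthesis. This is an illustrative sketch only; the function names, payload shapes, and metrics below are assumptions, not the actual Deepgram, TwelveLabs, or Gemini client APIs.

```python
# Hypothetical sketch of the Coherence analysis pipeline.
# All functions here are stand-ins; real implementations would call
# the Deepgram, TwelveLabs, and Gemini APIs respectively.

def analyze_audio(audio_path: str) -> dict:
    """Stand-in for a Deepgram transcription + speech-pattern analysis."""
    # A real call would upload the file and parse the API response.
    return {"transcript": "...", "filler_words": 4, "words_per_minute": 148}

def analyze_video(video_path: str) -> dict:
    """Stand-in for a TwelveLabs gesture/visual analysis."""
    return {"gestures": ["pointing"], "eye_contact_ratio": 0.72}

def build_feedback_prompt(audio: dict, video: dict) -> str:
    """Merge both analyses into one prompt for the synthesis model."""
    return (
        "Combine these presentation analyses into coaching feedback.\n"
        f"Audio analysis: {audio}\n"
        f"Video analysis: {video}"
    )

def run_pipeline(audio_path: str, video_path: str) -> str:
    audio = analyze_audio(audio_path)
    video = analyze_video(video_path)
    # In the real system, this prompt would be sent to Google Gemini,
    # whose response populates the detailed results page.
    return build_feedback_prompt(audio, video)
```

The key design point is that the two modality analyses are independent and can run in parallel; only the final synthesis step needs both results.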

Problem Space

Public speaking remains one of the most common phobias, affecting an estimated 75% of people, and presentation anxiety is particularly acute among college students who must deliver high-stakes academic presentations. While existing AI coaching tools like Yoodli and PowerPoint Coach can analyze speech patterns and count filler words, they miss the nonverbal channel, often estimated to carry 55% of a message, leaving a critical gap in feedback. Students and professionals alike rehearse presentations repeatedly but have no objective way to detect when their body language contradicts their verbal message, a dissonance that erodes audience trust and undermines even well-rehearsed pitches. Without comprehensive, multimodal feedback, speakers never truly know whether their delivery is coherent or whether subtle visual-verbal misalignments are sabotaging their credibility; an estimated 90% of presentation anxiety stems from this uncertainty about performance.

Research

We conducted secondary research analyzing published studies on communication anxiety, competitive analysis of existing AI coaching tools (Yoodli, Orai, PowerPoint Coach), and examined industry reports on nonverbal communication’s role in presentation effectiveness. Our research revealed that 75% of people experience glossophobia (fear of public speaking), making it the most common phobia, while communication studies consistently show that 55% of message interpretation comes from nonverbal cues rather than spoken words. Competitive analysis exposed a critical limitation: all existing AI presentation coaches focus exclusively on audio metrics (filler words, pacing, tone) while completely ignoring the visual dimension that accounts for over half of audience perception. With 90% of presentation anxiety stemming from lack of objective feedback and 20 million college students required to deliver academic presentations annually, the data pointed to a clear market need for multimodal analysis that could detect when visual signals contradict verbal messages.

Problem Definition

College students preparing for high-stakes academic presentations and professionals seeking to advance their careers both lack access to comprehensive, objective feedback on their delivery, a critical gap as presentation skills increasingly appear in job requirements across industries. Existing AI coaching tools analyze only speech patterns while ignoring the 55% of communication that is nonverbal, leaving users unable to detect critical visual-verbal dissonance: moments when their body language contradicts their spoken message and breaks audience trust. These contradictions (saying "passionate" with a flat expression, or "look here" without pointing) create subconscious credibility gaps that undermine even well-rehearsed presentations, directly impacting students' academic performance and professionals' ability to land jobs or advance in a market where presentation competency is now a baseline requirement. A successful solution would provide real-time, multimodal analysis that synchronizes what users say, how they look, and what they show. It would deliver specific, actionable coaching that raises users' coherence score and builds presentation confidence through measurable improvement, ultimately making presentation mastery accessible to anyone, regardless of prior experience.

Technical Architecture