SMOG Documentation

Vision & Listen

SMOG's intelligent Vision and Listen features help you interact with content through sight and sound.

Smart Features

  • Screen content recognition
  • Audio transcription & analysis
  • Visual context understanding
  • Real-time audio processing

Vision Feature

What Vision Can Do

  • Read screen text from any application
  • Identify UI elements like buttons and menus
  • Extract information from documents and websites
  • Understand context of what you're working on
  • Generate descriptions of visual content
  • Help with accessibility for visually impaired users
  • Provide smart suggestions based on screen content
  • Enhance Quick Reply with visual context

Using the Vision Feature

Activate Vision Mode

  • Use the shortcut ⌘ + Shift + V
  • Click "Enable Vision" in the Quick Reply interface
  • Say "SMOG, look at my screen" during voice commands

Vision in Action

Example: Email Reply

SMOG reads the email and suggests replies that match its content and tone.

Example: Document Summary

Ask SMOG to summarize a document, and it reads the content visible on screen to produce a summary.

Listen Feature

What Listen Can Do

  • Transcribe meetings in real time
  • Capture audio notes hands-free
  • Listen to system audio (with permission)
  • Identify speakers in conversations
  • Generate meeting summaries automatically
  • Extract action items from discussions
  • Translate speech in real time
  • Accept voice commands for hands-free control

Using the Listen Feature

Start Listening

  • Use the shortcut ⌘ + Shift + L
  • Click the microphone icon in the menu bar
  • Enable "Continuous Listening" in preferences

Listen Modes

Meeting Mode

Optimized for multiple speakers; generates structured summaries with action items.
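To give a rough idea of what pulling action items out of a multi-speaker transcript involves, the sketch below applies a simple keyword heuristic to speaker-labeled lines. It is not SMOG's actual method, which this page does not describe. The `Utterance` type, the cue phrases, and `extract_action_items` are hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Utterance:
    """One speaker-labeled line of a transcript (hypothetical structure)."""
    speaker: str
    text: str


# Hypothetical cue phrases that often signal a commitment or a request.
ACTION_CUES = re.compile(
    r"\b(i will|i'll|we need to|please|can you|action item|todo)\b",
    re.IGNORECASE,
)


def extract_action_items(transcript):
    """Return 'speaker: text' strings for utterances containing a cue phrase."""
    items = []
    for utterance in transcript:
        if ACTION_CUES.search(utterance.text):
            items.append(f"{utterance.speaker}: {utterance.text.strip()}")
    return items


transcript = [
    Utterance("Ana", "The launch went well overall."),
    Utterance("Ben", "I'll send the revised budget by Friday."),
    Utterance("Ana", "Can you also update the roadmap?"),
]

action_items = extract_action_items(transcript)
for item in action_items:
    print(item)
```

A keyword heuristic like this misses implicit commitments and flags some false positives, so treat it only as an illustration of the input and output shape.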

Note Taking

Suited to lectures, interviews, or personal voice notes, with smart formatting.

Combining Vision & Listen

Multimodal Intelligence

When both Vision and Listen are active, SMOG provides the most comprehensive understanding of your context.

Smart Replies

Analyzes both what you're looking at and what you're hearing to craft more relevant responses.

Context Awareness

Understands the full situation by combining visual and audio information.

Enhanced Accuracy

Cross-validates information from multiple sources for better results.
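One way to picture cross-validation between the two sources is to correct words from a transcript against text recognized on screen. The sketch below does this with fuzzy string matching from Python's standard `difflib` module. It is an illustrative assumption, not a description of how SMOG combines Vision and Listen; `correct_with_screen_text` and the `cutoff` value are hypothetical.

```python
import difflib


def correct_with_screen_text(transcript_words, screen_words, cutoff=0.8):
    """Replace transcript words with a close on-screen match, if one exists.

    Words already present on screen, or with no sufficiently similar
    on-screen word, are kept unchanged.
    """
    vocabulary = sorted(set(screen_words))
    corrected = []
    for word in transcript_words:
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best matches scoring at least `cutoff`.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


heard = ["send", "the", "Kubernetis", "report"]
on_screen = ["Kubernetes", "cluster", "report", "Q3"]

result = correct_with_screen_text(heard, on_screen)
print(result)
```

Here the misheard "Kubernetis" is close enough to the on-screen "Kubernetes" to be replaced, while ordinary words with no near match on screen pass through untouched.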

Privacy & Settings

Privacy Controls

  • All processing happens locally - your data never leaves your Mac
  • Selective permissions - choose which apps SMOG can see/hear
  • Temporary mode - enable Vision/Listen only when needed
  • Data retention - control how long transcripts and vision data are stored
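The data retention control can be thought of as an age-based filter over stored items. The following is a minimal sketch under that assumption. SMOG's storage format and retention options are not documented on this page, so `StoredItem`, `apply_retention`, and the 30-day window are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredItem:
    """A stored transcript or vision capture (hypothetical structure)."""
    kind: str          # "transcript" or "vision"
    created: datetime  # when the item was recorded


def apply_retention(items, max_age_days, now):
    """Keep only items created within the last `max_age_days` days."""
    cutoff = now - timedelta(days=max_age_days)
    return [item for item in items if item.created >= cutoff]


now = datetime(2024, 6, 30)
items = [
    StoredItem("transcript", datetime(2024, 6, 1)),
    StoredItem("vision", datetime(2024, 6, 28)),
    StoredItem("transcript", datetime(2024, 5, 1)),
]

kept = apply_retention(items, max_age_days=30, now=now)
print([(item.kind, item.created.date().isoformat()) for item in kept])
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the policy easy to test.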

Configure Vision & Listen Settings

Access Settings:

  1. Open SMOG Preferences (⌘ + ,)
  2. Go to the "Vision & Listen" tab
  3. Configure permissions, quality, and privacy settings

Key Settings:

  • Vision quality (Fast, Balanced, High Quality)
  • Audio processing (Real-time, Background, On-demand)
  • Privacy zones (exclude sensitive apps/areas)
  • Automatic features (smart suggestions, context hints)
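The key settings listed above can be modeled as a small typed structure. The sketch below is only an illustration of the options and their allowed values. SMOG does not document a configuration file or scripting API on this page, so the class names, field names, and default values are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class VisionQuality(Enum):
    FAST = "Fast"
    BALANCED = "Balanced"
    HIGH_QUALITY = "High Quality"


class AudioProcessing(Enum):
    REAL_TIME = "Real-time"
    BACKGROUND = "Background"
    ON_DEMAND = "On-demand"


@dataclass
class VisionListenSettings:
    """Hypothetical model of the "Vision & Listen" tab; defaults are assumed."""
    vision_quality: VisionQuality = VisionQuality.BALANCED
    audio_processing: AudioProcessing = AudioProcessing.ON_DEMAND
    excluded_apps: set = field(default_factory=set)  # privacy zones, by app name
    smart_suggestions: bool = True
    context_hints: bool = True

    def may_capture(self, app_name):
        """Return False for apps placed in a privacy zone."""
        return app_name not in self.excluded_apps


settings = VisionListenSettings(
    vision_quality=VisionQuality("High Quality"),  # look up by display label
    excluded_apps={"1Password", "Banking"},
)

print(settings.vision_quality.value)
print(settings.may_capture("Mail"))
print(settings.may_capture("1Password"))
```

Using enums for the quality and processing modes means an unsupported label such as "Ultra" fails immediately with a `ValueError` instead of being silently accepted.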

Troubleshooting Vision & Listen

Vision not working:

  • Check Screen Recording permissions in System Settings
  • Verify SMOG has Accessibility permissions
  • Try disabling and re-enabling Vision in SMOG preferences

Listen issues:

  • Confirm microphone permissions are granted
  • Check input device selection in SMOG settings
  • Test microphone in other apps to rule out hardware issues

Ready to Explore?

Vision & Listen make SMOG incredibly powerful. Start with one feature, then enable the other for the full experience.