macOS App
ClarifyLM
A macOS menu bar app that captures your screen and lets you ask questions about it. Entirely on-device, with no cloud dependency.
The Problem
AI assistants require you to manually copy-paste context. If you’re looking at code, a document, or a UI, you have to describe it or transcribe it before asking a question. That friction breaks flow.
Cloud-based tools add another problem: anything you share leaves your machine. For sensitive code, proprietary documents, or client work, that’s a non-starter.
The Solution
ClarifyLM captures what’s on your screen and lets you ask questions about it directly. Hit a global shortcut, capture a region, and start a conversation. The AI sees what you see.
Everything runs locally using Apple’s Foundation Models framework. No data leaves your machine, which makes it practical for code reviews, sensitive documents, and anything you would not paste into a cloud service.
Four Capture Modes
Text (OCR), area selection, single window, and fullscreen. All accessible through global keyboard shortcuts (⌥T, ⌥A, ⌥W, ⌥S).
On-Device AI
Powered by Apple Foundation Models. All inference runs locally on Apple Silicon with zero network requests.
Multi-Turn Chat
Threaded conversations with streaming responses, markdown rendering, syntax-highlighted code blocks, and suggested follow-up questions.
Persistent History
All conversations saved locally with SwiftData. Pick up where you left off across app launches.
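The SwiftData layer can be pictured as a pair of models, one per conversation and one per message. This is a minimal sketch under assumed names (`Conversation`, `Message`, and their fields are illustrative, not the app's actual schema):

```swift
import Foundation
import SwiftData

// Hypothetical persistence models; real names and fields may differ.
@Model
final class Conversation {
    var title: String
    var createdAt: Date
    // Deleting a conversation deletes its messages too.
    @Relationship(deleteRule: .cascade) var messages: [Message]

    init(title: String, createdAt: Date = .now, messages: [Message] = []) {
        self.title = title
        self.createdAt = createdAt
        self.messages = messages
    }
}

@Model
final class Message {
    var role: String       // "user" or "assistant"
    var content: String
    var timestamp: Date

    init(role: String, content: String, timestamp: Date = .now) {
        self.role = role
        self.content = content
        self.timestamp = timestamp
    }
}
```

With models like these, restoring history across launches is just a `FetchDescriptor` query sorted by `createdAt`.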
Engineering
Capture-to-AI pipeline. The app window must hide before capture so it does not screenshot itself; the app then runs the system capture tool, processes the result with OCR if needed, restores the window, and stages the content for the chat. All of that has to feel smooth. Fullscreen capture uses ScreenCaptureKit; area and window capture use macOS's screencapture utility. Getting the timing right across these APIs and NSApplication window management took careful coordination.
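The hide-capture-restore sequence for area capture can be sketched roughly as follows. This is an illustrative outline, not the app's actual code; the function name, the sleep duration, and the temp-file handling are assumptions:

```swift
import AppKit

// Hypothetical sketch of the area-capture step.
@MainActor
func captureArea() async throws -> NSImage? {
    // 1. Hide our own window so it doesn't appear in the screenshot.
    NSApp.hide(nil)
    // Give the window server a moment to actually remove the window.
    try await Task.sleep(nanoseconds: 200_000_000)

    // 2. Run the system capture tool for interactive region selection.
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString + ".png")
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/sbin/screencapture")
    process.arguments = ["-i", url.path]   // -i: interactive selection
    try process.run()
    process.waitUntilExit()

    // 3. Restore the window; the caller stages the image for the chat.
    NSApp.activate(ignoringOtherApps: true)
    return NSImage(contentsOf: url)
}
```

The fixed sleep is the fragile part in a sketch like this; in practice you would coordinate with window-ordering notifications rather than a timer.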
Swappable AI backend. I designed around a ChatBackend protocol so the AI provider can change without touching the rest of the app. The current implementation uses AppleFoundationBackend, but adding a cloud provider would be a single conformance. That paid off early when testing different model configurations.
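The protocol boundary described above might look something like this. `ChatBackend` and `AppleFoundationBackend` are named in the text; the method signature and `ChatMessage` type are assumptions for illustration:

```swift
// Minimal sketch of the backend abstraction.
struct ChatMessage {
    enum Role { case user, assistant }
    let role: Role
    let content: String
}

protocol ChatBackend {
    /// Streams the model's reply token by token.
    func respond(to prompt: String,
                 history: [ChatMessage]) -> AsyncThrowingStream<String, Error>
}

// The on-device provider is one conformance; a cloud provider
// would be another, with no changes elsewhere in the app.
struct AppleFoundationBackend: ChatBackend {
    func respond(to prompt: String,
                 history: [ChatMessage]) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            // ...drive the Foundation Models session here...
            continuation.finish()
        }
    }
}
```

Keeping the stream type in the protocol means the UI code is identical whether tokens come from local inference or a network.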
Streaming responses. The AI generates token by token, so I built the chat around AsyncThrowingStream for incremental display. The UI updates in real time as the model generates, with markdown rendering applied progressively.
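Incremental display over `AsyncThrowingStream` reduces to a simple consumption loop. A self-contained sketch (the token source here is a stand-in for the model; in the app the body of the loop re-renders the markdown view):

```swift
// Stand-in token source; the real stream comes from the AI backend.
func tokens() -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        for word in ["Hello", ", ", "world", "!"] {
            continuation.yield(word)
        }
        continuation.finish()
    }
}

func streamReply() async throws {
    var displayed = ""
    for try await token in tokens() {
        displayed += token
        // In the app: update the chat bubble and re-apply
        // markdown rendering on each partial string.
        print(displayed)
    }
}
```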
Service architecture. A singleton ChatService coordinates the UI layer, SwiftData persistence, and the AI backend. ScreenshotService and OCRService handle capture and text extraction independently, which keeps each concern testable and composable.
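The coordinating singleton might be shaped like this. `ChatService` is named in the text; the published property and method are illustrative assumptions:

```swift
import SwiftUI

// Hypothetical shape of the coordinating service.
@MainActor
final class ChatService: ObservableObject {
    static let shared = ChatService()          // single coordinator

    @Published var transcript: [String] = []   // simplified message store

    private init() {}

    func send(_ text: String) {
        transcript.append(text)
        // ...forward to the AI backend, persist via SwiftData,
        // and append the streamed reply as it arrives...
    }
}
```

Because `ScreenshotService` and `OCRService` are independent of this coordinator, each can be exercised in isolation: a capture test never needs a model loaded, and an OCR test never needs a window.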
Menu bar + full UI. The app lives in SwiftUI’s MenuBarExtra with a NavigationSplitView. Conversation list on the left, active chat on the right. Global shortcuts are registered through the HotKey package using Carbon events.
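The entry point combining `MenuBarExtra` with `NavigationSplitView` can be sketched like so; the view and app names are illustrative placeholders, not the app's actual identifiers:

```swift
import SwiftUI

// Placeholder views standing in for the real sidebar and chat.
struct ConversationListView: View {
    var body: some View { List { Text("Conversation") } }
}
struct ChatView: View {
    var body: some View { Text("Active chat") }
}

@main
struct ClarifyApp: App {
    var body: some Scene {
        MenuBarExtra("ClarifyLM", systemImage: "sparkles") {
            NavigationSplitView {
                ConversationListView()   // left: saved conversations
            } detail: {
                ChatView()               // right: active thread
            }
        }
        .menuBarExtraStyle(.window)      // full window UI, not a pull-down menu
    }
}
```

The `.window` style is what makes a full split-view layout viable from the menu bar; the default menu style would only allow a flat list of items.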
Technologies
Foundation Models
Apple’s on-device AI framework for private, local inference.
Vision
OCR and text recognition from screen captures.
ScreenCaptureKit
System-level screen and window capture.
SwiftData
Local persistence for conversations and messages.
HotKey
Global keyboard shortcuts via Carbon events.
Splash
Syntax highlighting for code blocks in AI responses.
DigitalOcean
Custom backend for user management and auth.
Paddle
Subscription payments and license validation.
Swift Markdown UI
Rich markdown rendering in chat responses.