ClarifyLM

A macOS menu bar app that captures your screen and lets you ask questions about it. Entirely on-device, with no cloud dependency.

Swift · SwiftUI · Foundation Models · Vision · ScreenCaptureKit

AI assistants require you to manually copy-paste context. If you’re looking at code, a document, or a UI, you have to describe it or transcribe it before asking a question. That friction breaks flow.

Cloud-based tools add another problem: anything you share leaves your machine. For sensitive code, proprietary documents, or client work, that’s a non-starter.

ClarifyLM captures what’s on your screen and lets you ask questions about it directly. Hit a global shortcut, capture a region, and start a conversation. The AI sees what you see.

Everything runs locally using Apple’s Foundation Models framework. No data leaves your machine, which makes it practical for code reviews, sensitive documents, and anything you would not paste into a cloud service.

Four Capture Modes

Text (OCR), area selection, single window, and fullscreen. All accessible through global keyboard shortcuts (⌥T, ⌥A, ⌥W, ⌥S).
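With the HotKey package (the library named in the tech stack below), registering shortcuts like these is a few lines each. A minimal sketch, with illustrative handler bodies; the app's actual registration code may differ:

```swift
import HotKey  // third-party package: soffes/HotKey

// Keep strong references — a HotKey is unregistered when deallocated.
let textCapture = HotKey(key: .t, modifiers: [.option])   // ⌥T
let areaCapture = HotKey(key: .a, modifiers: [.option])   // ⌥A

textCapture.keyDownHandler = { /* trigger OCR text capture */ }
areaCapture.keyDownHandler = { /* trigger area capture */ }
// ⌥W and ⌥S are registered the same way for window and fullscreen capture.
```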

On-Device AI

Powered by Apple Foundation Models. All inference runs locally on Apple Silicon with zero network requests.

Multi-Turn Chat

Threaded conversations with streaming responses, markdown rendering, syntax-highlighted code blocks, and suggested follow-up questions.

Persistent History

All conversations saved locally with SwiftData. Pick up where you left off across app launches.
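A SwiftData schema for this might look like the following sketch. The model and property names are assumptions, not the app's actual types:

```swift
import Foundation
import SwiftData

// Hypothetical persisted models for conversations and their messages.
@Model
final class Conversation {
    var title: String
    var createdAt: Date
    // Deleting a conversation removes its messages too.
    @Relationship(deleteRule: .cascade) var messages: [Message]

    init(title: String) {
        self.title = title
        self.createdAt = .now
        self.messages = []
    }
}

@Model
final class Message {
    var role: String   // "user" or "assistant"
    var text: String

    init(role: String, text: String) {
        self.role = role
        self.text = text
    }
}
```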

Capture-to-AI pipeline. The app must hide its window before capture so it does not screenshot itself, run the system capture tool, process the result with OCR if needed, then restore the window and stage the content for the chat. All of that has to feel smooth. Fullscreen capture uses ScreenCaptureKit; area and window capture use macOS's screencapture utility. Getting the timing right across these APIs and NSApplication window management took careful coordination.
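The hide-capture-restore sequence for area capture can be sketched roughly as follows. Function names and the settle delay are assumptions; only the `screencapture` invocation is standard macOS:

```swift
import AppKit

// Hypothetical sketch of the area-capture path; not the app's actual API.
@MainActor
func performAreaCapture(window: NSWindow) async throws -> URL {
    // 1. Hide the app window so it doesn't appear in its own screenshot.
    window.orderOut(nil)
    // Give the window server a moment to actually remove the window.
    try await Task.sleep(nanoseconds: 150_000_000)

    // 2. Run the system `screencapture` utility in interactive (-i) mode,
    //    which shows the familiar crosshair selection UI.
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString + ".png")
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/sbin/screencapture")
    process.arguments = ["-i", url.path]
    try process.run()
    process.waitUntilExit()

    // 3. Restore the window; the caller stages the image for chat.
    window.makeKeyAndOrderFront(nil)
    return url
}
```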

Swappable AI backend. I designed around a ChatBackend protocol so the AI provider can change without touching the rest of the app. The current implementation uses AppleFoundationBackend, but adding a cloud provider would be a single conformance. That paid off early when testing different model configurations.
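The seam might look like the sketch below. The protocol name comes from the text; the method signature and the stub conformance are assumptions:

```swift
import Foundation

// Hedged sketch of the swappable backend seam; real signatures may differ.
protocol ChatBackend {
    /// Stream a reply to `prompt`, token by token.
    func reply(to prompt: String) -> AsyncThrowingStream<String, Error>
}

// A stub conformance standing in where AppleFoundationBackend would go,
// showing that a new provider is a single type, nothing else changes.
struct EchoBackend: ChatBackend {
    func reply(to prompt: String) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            for word in prompt.split(separator: " ") {
                continuation.yield(String(word) + " ")  // fake "tokens"
            }
            continuation.finish()
        }
    }
}
```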

Streaming responses. The AI generates token by token, so I built the chat around AsyncThrowingStream for incremental display. The UI updates in real time as the model generates, with markdown rendering applied progressively.
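Consuming such a stream from the UI side reduces to appending each token to observable state. A minimal sketch, with assumed type and property names:

```swift
import SwiftUI

// Illustrative view model: SwiftUI re-renders (and re-parses markdown)
// on every append to the published draft.
@MainActor
final class ChatViewModel: ObservableObject {
    @Published var draft = ""   // partial reply, grows as tokens arrive

    func receive(_ tokens: AsyncThrowingStream<String, Error>) async {
        draft = ""
        do {
            for try await token in tokens {
                draft += token
            }
        } catch {
            draft += "\n\n*generation failed*"
        }
    }
}
```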

Service architecture. A singleton ChatService coordinates the UI layer, SwiftData persistence, and the AI backend. ScreenshotService and OCRService handle capture and text extraction independently, which keeps each concern testable and composable.
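The rough shape of that split is sketched below. The three type names come from the text; their members are assumptions for illustration:

```swift
import Foundation

// Each service owns exactly one concern.
final class ScreenshotService {
    func capture() -> Data? { nil /* capture concerns only */ }
}

final class OCRService {
    func extractText(from image: Data) -> String { "" /* extraction only */ }
}

// The coordinator: UI, persistence, and AI all talk through this.
final class ChatService {
    static let shared = ChatService()
    private let screenshots = ScreenshotService()
    private let ocr = OCRService()
    private init() {}   // singleton: all access goes through .shared

    func stage(_ image: Data) -> String {
        // Services compose without knowing about each other.
        ocr.extractText(from: image)
    }
}
```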

Menu bar + full UI. The app lives in SwiftUI’s MenuBarExtra with a NavigationSplitView. Conversation list on the left, active chat on the right. Global shortcuts are registered through the HotKey package using Carbon events.
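The window shell can be sketched in a few lines of SwiftUI. Placeholder views stand in for the real conversation list and chat:

```swift
import SwiftUI

// Minimal sketch of a menu-bar app hosting a split view.
@main
struct ClarifyLMSketch: App {
    var body: some Scene {
        MenuBarExtra("ClarifyLM", systemImage: "sparkles") {
            NavigationSplitView {
                List { Text("Conversation 1") }   // conversation list (left)
            } detail: {
                Text("Active chat")               // chat view (right)
            }
        }
        .menuBarExtraStyle(.window)   // full window UI, not a plain menu
    }
}
```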

Foundation Models

Apple’s on-device AI framework for private, local inference.

Vision

OCR and text recognition from screen captures.
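A typical Vision OCR pass over a capture looks like this sketch; the function name is illustrative, the Vision calls are the framework's standard API:

```swift
import Vision

// Run text recognition over a captured image and join the results.
func recognizeText(in image: CGImage) throws -> String {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // favor quality over speed
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    let lines = (request.results ?? []).compactMap {
        $0.topCandidates(1).first?.string  // best candidate per observation
    }
    return lines.joined(separator: "\n")
}
```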

ScreenCaptureKit

System-level screen and window capture.

SwiftData

Local persistence for conversations and messages.

HotKey

Global keyboard shortcuts via Carbon events.

Splash

Syntax highlighting for code blocks in AI responses.

DigitalOcean

Custom backend for user management and auth.

Paddle

Subscription payments and license validation.

Swift Markdown UI

Rich markdown rendering in chat responses.