AI that works your way,
not the cloud's way.
Rewrite, translate, shorten, formalise - with AI that either runs entirely on your machine (zero network, zero cost) or routes through Makro's own infrastructure (prompt and response content are not retained; only per-tier usage metadata is kept for 90 days for credit enforcement). Two honest routes. You choose which one your text takes.
You control where your text goes.
Most AI writing tools give you one route: theirs. Makro gives you three - pick per request, or set a global default. Your hardware, your bandwidth, your privacy posture.
Every action, tuned for its job.
Each rewrite type runs at an optimised model temperature - grammar gets low-temperature determinism, casual tones get higher creativity. Click any action to see the before/after.
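For the curious, here is a minimal Python sketch of what per-action temperature can look like. The action names and the numbers are hypothetical, picked only to show the low-for-grammar, higher-for-casual pattern; they are not Makro's actual settings.

```python
# Hypothetical action names and temperature values, for illustration only.
ACTION_TEMPERATURE = {
    "grammar": 0.1,    # low temperature: near-deterministic corrections
    "translate": 0.3,
    "shorten": 0.3,
    "formalise": 0.4,
    "casual": 0.8,     # higher temperature: more varied phrasing
}

DEFAULT_TEMPERATURE = 0.5  # assumed fallback for actions not listed above


def temperature_for(action: str) -> float:
    """Return the sampling temperature to use for a rewrite action."""
    return ACTION_TEMPERATURE.get(action, DEFAULT_TEMPERATURE)
```

The returned value would be passed as the temperature option of whichever model backend handles the request.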
Installing local AI, about 10 minutes.
Pick one - they both work. Ollama is CLI-first, LM Studio is GUI-first. Both are free to use (Ollama is also open-source), and Makro auto-detects whichever one is running.
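Auto-detection of this kind can be sketched as a probe of each tool's default local port. The Python below is a rough illustration, not Makro's code: it assumes Ollama's model-listing endpoint (/api/tags) and LM Studio's OpenAI-compatible one (/v1/models) as probe targets, and the function names are hypothetical.

```python
import urllib.request
from typing import Callable, Optional

# Default local ports for each tool; the paths are each tool's
# model-listing endpoint, used here only as a cheap "are you there?" check.
BACKENDS = {
    "ollama": "http://localhost:11434/api/tags",
    "lmstudio": "http://localhost:1234/v1/models",
}


def http_get_ok(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


def detect_backend(probe: Callable[[str], bool] = http_get_ok) -> Optional[str]:
    """Return the name of the first backend that responds, or None."""
    for name, url in BACKENDS.items():
        if probe(url):
            return name
    return None
```

Calling detect_backend() with nothing running returns None. The probe argument is injectable so the logic can be exercised without a live server.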
Run ollama pull llama3.1 in your terminal. Downloads ~4.7 GB. Takes 3-8 minutes depending on connection.
Set OLLAMA_ORIGINS=chrome-extension://* as an environment variable, then restart Ollama. This is Ollama's CORS setting.
Makro checks localhost:11434 and finds Ollama automatically. Smart Rewrite now routes to your machine.
In LM Studio, start the local server on localhost:1234. Enable CORS if prompted.
Pick the one that fits your machine.
Three categories. Smaller models are faster and run on modest hardware; larger models rewrite better but need more RAM. You can install multiple and switch per action.
For laptops, older machines, quick grammar fixes. Response in under 2 seconds.
For most people, most of the time. Strong quality, reasonable hardware, 1-3 second response.
For workstations, Mac Studios, heavy writers. Closest to cloud quality - 3-8 second response.
ollama pull <model-name> downloads any of these. In LM Studio, search the built-in catalogue. Makro picks up every model installed and lets you set a default or choose per action. Quantisation (q4, q8) is auto-selected; sizes above are approximate for q4.
What fits your machine.
If you know your RAM and GPU, this table tells you which model to pick and how fast it will respond. Numbers are approximate - they will vary with quantisation and thermal headroom.
The AI FAQ.
Ollama and LM Studio are free tools (Ollama is open-source) that run large language models on your own computer. They handle the model download and memory management, and serve an HTTP API that other apps can call.
Makro connects to that HTTP API on localhost:11434 (Ollama) or localhost:1234 (LM Studio). Your text goes Makro → localhost → back to Makro. Nothing leaves your machine.
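To make that round trip concrete, here is a minimal Python sketch of a rewrite request against Ollama's /api/generate endpoint. It shows the general shape of such a call rather than Makro's actual request; the prompt layout, default model, and temperature are assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_rewrite_request(text: str, instruction: str,
                          model: str = "llama3.1",
                          temperature: float = 0.3) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request for Ollama."""
    payload = {
        "model": model,
        # Assumed prompt layout: instruction first, then the user's text.
        "prompt": f"{instruction}\n\n{text}",
        "stream": False,  # ask for a single JSON reply instead of chunks
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rewrite(text: str, instruction: str) -> str:
    """Send the request to the local Ollama server and return the rewritten text."""
    req = build_rewrite_request(text, instruction)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running and llama3.1 pulled, rewrite("their going home", "Fix the grammar:") would return the model's corrected sentence. The only address contacted is localhost.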
Depends on which route you pick:
Route 01 (Local AI): Text never leaves your device. Zero network requests. Free on every plan.
Route 02 (Smart Rewrite): Text is sent over TLS 1.3 to Makro's proxy, which forwards to Cloudflare Workers AI. Prompt and response content are not retained; only usage metadata (endpoint, token count, timestamp) is kept 90 days to enforce per-tier credit limits. No external model vendor receives the content.
Route 03 (Sensitive Mode): Cloud requests are blocked before they are formed. Only local AI is available.
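The three routes, together with the per-request override and the global default, can be sketched as a small resolver. This Python is illustrative only: the names are hypothetical, and it assumes Sensitive Mode is a global setting that overrides any per-request choice.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    LOCAL = "local"                  # Route 01: Local AI
    SMART_REWRITE = "smart_rewrite"  # Route 02: Smart Rewrite (cloud)
    SENSITIVE = "sensitive"          # Route 03: Sensitive Mode (local only)


class CloudBlockedError(Exception):
    """Raised when a cloud request is attempted while Sensitive Mode is on."""


def resolve_route(global_default: Route,
                  per_request: Optional[Route] = None) -> Route:
    """Pick the route for one request, before any network request is built."""
    if global_default is Route.SENSITIVE:
        # Assumption: Sensitive Mode wins over a per-request cloud choice.
        if per_request is Route.SMART_REWRITE:
            raise CloudBlockedError("Sensitive Mode is on; only local AI is available")
        return Route.SENSITIVE
    return per_request if per_request is not None else global_default
```

Because the check runs before a request object exists, a blocked cloud call never reaches the point where text could be serialised for sending.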
Any model served by Ollama or LM Studio. Both tools support a huge catalogue - Llama, Mistral, Gemma, Phi, Qwen, DeepSeek, CodeLlama, and hundreds of others. If Ollama can run it, Makro can use it.
Makro's own Smart Rewrite service runs Llama 3.3 70B on Cloudflare Workers AI.
No - but more RAM means better results. Phi-3 (3.8B) runs on 4 GB RAM machines, including old Chromebooks. Mistral 7B and Llama 3.1 8B are comfortable on 8 GB.
Apple Silicon Macs (M1/M2/M3) and NVIDIA GPUs (with CUDA) dramatically speed things up. On an M2 MacBook Air with Llama 3.1 8B, rewrites come back in 1-2 seconds. See the hardware matrix above.
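As a rough rule of thumb in code, using only the RAM figures quoted in this answer: the Python sketch below maps available RAM to an Ollama model tag. The function name and the two-tier cut-offs are illustrative assumptions, and larger models are left out because no figures are given for them here.

```python
from typing import Optional

# (minimum RAM in GB, Ollama model tag), largest requirement first.
RAM_TIERS = [
    (8, "llama3.1"),  # Llama 3.1 8B: comfortable on 8 GB
    (4, "phi3"),      # Phi-3 (3.8B): runs on 4 GB
]


def suggest_model(ram_gb: float) -> Optional[str]:
    """Return the largest listed model that fits in `ram_gb`, or None."""
    for min_ram_gb, tag in RAM_TIERS:
        if ram_gb >= min_ram_gb:
            return tag
    return None
```

The suggested tag can be passed straight to ollama pull.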
Every time your text hits a third-party model vendor, it becomes their data under their terms. We do not trust a supply chain where we have to promise that their privacy policy stays acceptable over time.
Instead, we run open-source models on Cloudflare Workers AI. We control the request pipeline end-to-end and we promise zero content retention because we wrote the endpoints. No OpenAI, no Anthropic, no Google in the path.
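One way to picture zero content retention with 90-day usage metadata is a record type that has no field for text at all. The Python below is a sketch under that assumption; the class and field names are hypothetical and do not describe Makro's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class UsageRecord:
    """Usage metadata only: there is deliberately no prompt or response field."""
    endpoint: str
    token_count: int
    timestamp: datetime


def is_expired(record: UsageRecord, now: datetime) -> bool:
    """True once the record is older than the 90-day retention window."""
    return now - record.timestamp > RETENTION


def purge(records: List[UsageRecord], now: datetime) -> List[UsageRecord]:
    """Return only the records still inside the retention window."""
    return [r for r in records if not is_expired(r, now)]
```

Since the record cannot hold text, retaining content would require a schema change rather than a forgotten flag.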
Credits reset at the start of each month: 25 on Free, 500 on Pro, 10,000 on Premium. Unused credits do not carry over.
If you consistently run out of credits, or consistently leave most of them unused, you are probably on the wrong plan. Local AI via Ollama uses zero credits, so heavy users often keep a local model running for routine work and save cloud credits for higher-stakes rewrites.
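The monthly reset can be sketched in a few lines of Python. The tier allowances come from this page; the class, the method names, and the one-credit-per-cloud-rewrite cost are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

MONTHLY_CREDITS = {"free": 25, "pro": 500, "premium": 10_000}


@dataclass
class CreditAccount:
    tier: str
    used: int = 0
    period: Tuple[int, int] = (1970, 1)  # (year, month) currently being counted

    def _roll_over(self, today: date) -> None:
        """Start a fresh month if needed; unused credits do not carry over."""
        current = (today.year, today.month)
        if current != self.period:
            self.period = current
            self.used = 0

    def remaining(self, today: date) -> int:
        self._roll_over(today)
        return MONTHLY_CREDITS[self.tier] - self.used

    def spend(self, today: date, route: str) -> bool:
        """Try to pay for one rewrite. Local rewrites cost nothing."""
        if route == "local":
            return True
        if self.remaining(today) <= 0:
            return False
        self.used += 1  # assumed cost: one credit per cloud rewrite
        return True
```

A Free account that has spent all 25 credits in January is refused further cloud rewrites that month, can still rewrite locally, and is back to 25 on the first of February.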
Not yet. We are tracking interest on Pro and Premium plans. The tradeoff is privacy: the moment your text goes through OpenAI's or Anthropic's API, it is under their retention policy, not ours. Local AI via Ollama gives you the same "my own infrastructure" benefit without that risk.
If BYOK (bring your own key) is important to you, reply to the onboarding email and tell us which provider you want - we track requests.