AI rewriting - local or cloud - your call

AI that works your way,
not the cloud's way.

Rewrite, translate, shorten, formalise - with AI that either runs entirely on your machine (zero network, zero cost) or routes through Makro's own infrastructure (prompt and response content not retained; only per-tier usage metadata is kept for 90 days for credit enforcement). Two honest routes. You choose one.

Route 01 - Local
Ollama / LM Studio
Runs on your CPU or GPU. Zero credits used, zero network requests. Free on every plan.
0 network | 0 credits | 1-3s typical
Route 02 - Cloud
Makro Smart Rewrite
Runs on our Cloudflare Workers AI. Llama 3.3 70B. Content not retained; usage metadata kept 90 days.
1 credit / rewrite | ~800ms | Best quality
Route 03 - Off
Sensitive Mode
For healthcare, legal, financial work. Cloud AI blocked at the processing layer.
Local only | Clipboard auto-clear | Toolbar indicator
Routing

You control where your text goes.

Most AI writing tools give you one route: theirs. Makro gives you three - pick per request, or set a global default. Your hardware, your bandwidth, your privacy posture.

Your draft
"thx for reaching out, ill get back asap"
rewrite - pick a tone
Route 01 - Local
Local AI
Ollama or LM Studio on your machine. Text is processed on your CPU/GPU. No request ever leaves your machine.
Network: Zero
Credits: 0
Latency: 1-3s
Cost: Free
Route 02 - Cloud
Makro Smart Rewrite
TLS request to Makro's proxy, then to Cloudflare Workers AI. Content held in request memory and discarded; only usage metadata persisted.
Transport: TLS 1.3
Credits: 1 / rewrite
Latency: ~800ms
Model: Llama 3.3 70B
Route 03 - Blocked
Sensitive Mode
A software guard at the processing layer: the cloud AI request is never formed. Only Route 01 (Local) is available. Two toggles: block cloud AI, and auto-clear clipboard.
Cloud: Blocked
Clipboard: Auto-clear
Indicator: Toolbar
Override: Manual only
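The routing rules above can be sketched in a few lines. This is only an illustration of the described behaviour (per-request choice, global default, Sensitive Mode forcing Local), not Makro's actual code; the names `Route`, `Settings` and `resolve_route` are hypothetical, and the choice of cloud as the default is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    LOCAL = "local"  # Route 01: Ollama / LM Studio on your machine
    CLOUD = "cloud"  # Route 02: Makro Smart Rewrite


@dataclass
class Settings:
    default_route: Route = Route.CLOUD  # hypothetical global default
    sensitive_mode: bool = False        # Route 03: cloud AI blocked


def resolve_route(settings: Settings, requested: Optional[Route] = None) -> Route:
    """Pick the route for one rewrite request."""
    # Sensitive Mode wins over everything: only Local is available,
    # so a cloud request is never built in the first place.
    if settings.sensitive_mode:
        return Route.LOCAL
    # Otherwise a per-request choice overrides the global default.
    return requested if requested is not None else settings.default_route
```

With Sensitive Mode on, `resolve_route(Settings(sensitive_mode=True), Route.CLOUD)` still returns `Route.LOCAL`, which is the point of putting the guard ahead of any request construction.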
Setup

Installing local AI, about 10 minutes.

Pick one - they both work. Ollama is CLI-first, LM Studio is GUI-first. Both are free to download, Ollama is open-source, and Makro auto-detects whichever one is running.

Ollama
CLI-first, scriptable, runs as a background service. ollama.com
1. Download Ollama for your platform.
   macOS, Linux, or Windows. Installer is ~200 MB. No account required.
2. Pull a model.
   Run ollama pull llama3.1 in your terminal. Downloads ~4.7 GB. Takes 3-8 minutes depending on connection.
3. Allow browser extension access.
   Set OLLAMA_ORIGINS=chrome-extension://* as an environment variable, then restart Ollama. This is Ollama's CORS setting.
4. Open Makro. Auto-detected.
   Makro probes localhost:11434 and finds Ollama automatically. Smart Rewrite now routes to your machine.
~10 min total setup | ~4.7 GB one-time download | Free forever
~/zsh
# install ollama from ollama.com
$ ollama pull llama3.1
pulling manifest...
pulling 4.7 GB model...
+ success

# allow browser extension calls
$ export OLLAMA_ORIGINS="chrome-extension://*"
$ ollama serve
Listening on localhost:11434

# open Makro - auto-detected
+ Ollama connected
LM Studio
GUI-first, built-in model catalog, no terminal needed. lmstudio.ai
1. Download LM Studio.
   macOS, Linux (AppImage), or Windows. GUI app, no CLI required.
2. Search and download a model.
   LM Studio has a built-in catalogue. Search for Llama 3.1 or Mistral 7B. Click Download. ~4-5 GB.
3. Start the local server.
   Click "Local Server" tab → Start Server. Default port localhost:1234. Enable CORS if prompted.
4. Makro finds it automatically.
   No environment variables, no CLI commands. Works once the server is running.
~8 min total setup | GUI only | No terminal needed
LM Studio UI
# 1. install LM Studio from lmstudio.ai
+ installed

# 2. search "llama 3.1" in catalogue
downloading llama-3.1-8b.gguf
+ 4.7 GB downloaded

# 3. Local Server tab - Start
Serving on localhost:1234
+ ready

# 4. open Makro - auto-detected
+ LM Studio connected
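For the curious, auto-detection along the lines described in the two setup guides could look like the sketch below. It is an assumption about the approach, not Makro's implementation: it asks each default port for its model list (`/api/tags` on Ollama, the OpenAI-style `/v1/models` on LM Studio) and takes the first server that answers. The names `BACKENDS`, `http_get_json` and `detect_backend` are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional

# Default ports from the setup steps; the paths are each tool's model-list endpoint.
BACKENDS = [
    ("Ollama", "http://localhost:11434/api/tags"),
    ("LM Studio", "http://localhost:1234/v1/models"),
]


def http_get_json(url: str, timeout: float = 1.0) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def detect_backend(fetch: Callable[[str], dict] = http_get_json) -> Optional[str]:
    """Return the name of the first local server that answers, else None."""
    for name, url in BACKENDS:
        try:
            fetch(url)
            return name
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON reply: try the next one.
            continue
    return None
```

Calling `detect_backend()` with neither tool running simply returns `None`; the `fetch` parameter exists so the probe can be exercised without a live server.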
Models

Pick the one that fits your machine.

Three categories. Smaller models are faster and run on modest hardware; larger models rewrite better but need more RAM. You can install multiple and switch per action.

Fast and cheap | 3-4 GB RAM

For laptops, older machines, quick grammar fixes. Responses can come back in under 2 seconds on faster hardware; expect longer on older machines (see the hardware table).

Phi-3 | 3.8B | 2.3 GB
Microsoft's small model. Runs on 4 GB RAM. Ideal for grammar fixes and shortening.
Mistral | 7B | 4.1 GB
Fast and lightweight. Solid general-purpose choice for quick rewrites.
Balanced | Recommended

For most people, most of the time. Strong quality, reasonable hardware, 1-3 second response.

Llama 3.1 | 8B | 4.7 GB
Meta's flagship. Excellent instruction following for rewrites, translations, tone shifts.
Gemma 2 | 9B | 5.4 GB
Google's compact model. Strong at formal/casual tone shifts and structured outputs.
Qwen 2.5 | 7B | 4.4 GB
Alibaba's multilingual model. Best for translations and non-English rewrites.
Quality | 40 GB - heavy

For workstations, Mac Studios, heavy writers. Closest to cloud quality - 3-8 second response.

Llama 3.3 | 70B | 40 GB
Top-tier open-source quality. Same model Makro Smart Rewrite uses. Needs 48+ GB unified or system RAM.
Switching models: In Ollama, ollama pull <model-name> downloads any of these. In LM Studio, search the built-in catalogue. Makro picks up every model installed and lets you set a default or choose per action. Quantisation (q4, q8) is auto-selected; sizes above are approximate for q4.
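As a rough sketch of how installed models can be discovered and a default applied: Ollama's `GET /api/tags` reply lists models under `models[].name`, and LM Studio's OpenAI-style `GET /v1/models` reply lists them under `data[].id`. The helper names and the fallback rule (first installed model when the preferred one is missing) are assumptions for illustration, not Makro's documented behaviour.

```python
from typing import List, Optional


def ollama_model_names(tags_reply: dict) -> List[str]:
    """Model names from an Ollama GET /api/tags reply."""
    return [m["name"] for m in tags_reply.get("models", [])]


def lm_studio_model_ids(models_reply: dict) -> List[str]:
    """Model ids from an OpenAI-style GET /v1/models reply."""
    return [m["id"] for m in models_reply.get("data", [])]


def pick_model(installed: List[str], preferred: str) -> Optional[str]:
    """Use the preferred default if installed, else the first model found."""
    for name in installed:
        # Ollama names carry a tag, e.g. "llama3.1:latest"; match with or without it.
        if name == preferred or name.split(":")[0] == preferred:
            return name
    return installed[0] if installed else None
```

A per-action choice would just call `pick_model` with a different `preferred` value for that action.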
Hardware

What fits your machine.

If you know your RAM and GPU, this table tells you which model to pick and how fast it will respond. Numbers are approximate - they will vary with quantisation and thermal headroom.

Your machine | RAM | Best model | Response
Apple Silicon - M1/M2/M3 (MacBook Air, Pro, Mini) | 16-32 GB | Llama 3.1 8B (Mistral 7B alternative) | 1-2s
NVIDIA GPU (RTX 3060+, 6+ GB VRAM) | 16+ GB | Llama 3.1 8B (CUDA-accelerated) | <1s
High-RAM desktop (32+ GB, no dedicated GPU) | 32+ GB | Llama 3.1 8B (slower without GPU) | 3-5s
Mac Studio / Pro workstation (64+ GB unified memory) | 48+ GB | Llama 3.3 70B (cloud-tier quality) | 3-8s
Older laptop (8 GB system RAM) | 8 GB | Mistral 7B (or Phi-3 for speed) | 3-6s
Low-end / Chromebook (4 GB RAM) | 4 GB | Phi-3 3.8B (only realistic option) | 4-8s
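Read as a lookup, the table boils down to a RAM threshold per model. The sketch below is one simplified reading of it (it ignores GPU type and treats each RAM figure as a minimum); `RAM_TIERS` and `suggest_model` are hypothetical names, and the response times are left out because they depend on more than RAM.

```python
from typing import Optional

# (minimum RAM in GB, model) pairs read off the hardware table, largest first.
RAM_TIERS = [
    (48, "Llama 3.3 70B"),
    (16, "Llama 3.1 8B"),
    (8, "Mistral 7B"),
    (4, "Phi-3 3.8B"),
]


def suggest_model(ram_gb: float) -> Optional[str]:
    """Return the largest model whose RAM floor fits, or None below 4 GB."""
    for min_ram, model in RAM_TIERS:
        if ram_gb >= min_ram:
            return model
    return None
```

For example, `suggest_model(8)` returns `"Mistral 7B"` and `suggest_model(32)` returns `"Llama 3.1 8B"`, matching the "Older laptop" and "High-RAM desktop" rows.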
Questions

The AI FAQ.

Ollama and LM Studio are free tools that run large language models on your own computer (Ollama is open-source). They handle the model download and memory management, and serve an HTTP API that other apps can call.

Makro connects to that HTTP API on localhost:11434 (Ollama) or localhost:1234 (LM Studio). Your text goes Makro → localhost → back to Makro. Nothing leaves your machine.
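To make the localhost round trip concrete, here is a minimal sketch of a rewrite call against Ollama's `/api/generate` endpoint, which accepts `model`, `prompt` and `stream` and returns the text in a `response` field when streaming is off. The prompt wording and the function names are assumptions for illustration; Makro's actual prompts are not documented here.

```python
import json
import urllib.request


def build_ollama_request(model: str, text: str, tone: str) -> urllib.request.Request:
    """Build a POST to the local Ollama server; nothing is sent yet."""
    payload = {
        "model": model,
        # Hypothetical prompt wording, for illustration only.
        "prompt": f"Rewrite the following text in a {tone} tone:\n\n{text}",
        "stream": False,  # ask for one JSON reply instead of a token stream
    }
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rewrite_locally(model: str, text: str, tone: str) -> str:
    """Send the request to localhost and return the rewritten text."""
    req = build_ollama_request(model, text, tone)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["response"]
```

With Ollama running, `rewrite_locally("llama3.1", "thx for reaching out, ill get back asap", "formal")` would return the model's rewrite; the only host involved is `localhost`.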

Where your text goes depends on which route you pick:

Route 01 (Local AI): Text never leaves your device. Zero network requests. Free on every plan.

Route 02 (Smart Rewrite): Text is sent over TLS 1.3 to Makro's proxy, which forwards it to Cloudflare Workers AI. Prompt and response content are not retained; only usage metadata (endpoint, token count, timestamp) is kept for 90 days to enforce per-tier credit limits. No external model vendor receives the content.

Route 03 (Sensitive Mode): Cloud requests are blocked before they are formed. Only local AI is available.
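One way to picture the Route 02 retention rule: the stored record has only the three metadata fields named above, with no field for prompt or response text, and anything older than 90 days is dropped. This is a sketch under those stated assumptions, not Makro's schema; `UsageRecord`, `purge_expired` and the `/rewrite` endpoint name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class UsageRecord:
    # Only metadata: there is deliberately no field for prompt or response text.
    endpoint: str
    token_count: int
    timestamp: datetime


def purge_expired(records: List[UsageRecord], now: datetime) -> List[UsageRecord]:
    """Keep only records that are at most 90 days old."""
    return [r for r in records if now - r.timestamp <= RETENTION]
```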

Makro can use any model served by Ollama or LM Studio. Both tools support a huge catalogue - Llama, Mistral, Gemma, Phi, Qwen, DeepSeek, CodeLlama, and hundreds of others. If Ollama can run it, Makro can use it.

Makro's own Smart Rewrite service runs Llama 3.3 70B on Cloudflare Workers AI.

You do not need a powerful computer, but more RAM means better results. Phi-3 (3.8B) runs on 4 GB RAM machines, including old Chromebooks. Mistral 7B and Llama 3.1 8B are comfortable on 8 GB.

Apple Silicon Macs (M1/M2/M3) and NVIDIA GPUs (with CUDA) dramatically speed things up. On an M2 MacBook Air with Llama 3.1 8B, rewrites come back in 1-2 seconds. See the hardware matrix above.

Every time your text hits a third-party model vendor, it becomes their data under their terms. We do not trust a supply chain where we have to promise that their privacy policy stays acceptable over time.

Instead, we run open-source models on Cloudflare Workers AI. We control the request pipeline end-to-end and we promise zero content retention because we wrote the endpoints. No OpenAI, no Anthropic, no Google in the path.

Credits reset at the start of each month: 25/mo on Free, 500/mo on Pro, 10,000/mo on Premium. Unused credits do not carry over.

If you consistently run out on one plan but have credits left over on the next, you are probably on the wrong plan. Local AI via Ollama or LM Studio uses zero credits, so heavy users often keep a local model running for routine work and save cloud credits for higher-stakes rewrites.
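The credit rules can be sketched as a small counter that resets when the calendar month changes. The allowances are the ones quoted above; everything else (the class name, resetting lazily on first use in a new month) is an assumption for illustration rather than how Makro's billing is implemented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

MONTHLY_CREDITS = {"Free": 25, "Pro": 500, "Premium": 10_000}


@dataclass
class CreditAccount:
    plan: str
    used: int = 0
    period: Tuple[int, int] = (0, 0)  # (year, month) the counter belongs to

    def _reset_if_new_month(self, today: date) -> None:
        current = (today.year, today.month)
        if current != self.period:
            # New month: start from zero, unused credits do not carry over.
            self.period = current
            self.used = 0

    def remaining(self, today: date) -> int:
        self._reset_if_new_month(today)
        return MONTHLY_CREDITS[self.plan] - self.used

    def try_cloud_rewrite(self, today: date) -> bool:
        """Spend 1 credit for a cloud rewrite; return False if none are left."""
        if self.remaining(today) <= 0:
            return False
        self.used += 1
        return True
```

Local rewrites never touch this counter, which is why a local model for routine work stretches a small allowance a long way.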

Bring-your-own-key (BYOK) is not supported yet. We are tracking interest on Pro and Premium plans. The tradeoff is privacy: the moment your text goes through OpenAI's or Anthropic's API, it is under their retention policy, not ours. Local AI via Ollama gives you the same "my own infrastructure" benefit without that risk.

If BYOK is important to you, reply to the onboarding email and tell us which provider - we track requests.

AI that works your way.
Install in a minute.