AI rewriting - local or cloud - your call

AI that works your way,
not the cloud's way.

Rewrite, translate, shorten, formalise - with AI that either runs entirely on your machine (zero network, zero cost) or routes through Makro's own infrastructure (prompt and response content are not retained; only per-tier usage metadata is kept for 90 days to enforce credit limits). Two honest routes, plus a Sensitive Mode that locks out the cloud. You choose which one your text takes.

Route 01 - Local

Ollama / LM Studio

Runs on your CPU or GPU. Zero credits used, zero network requests. Free on every plan.

0 network · 0 credits · 1-3s typical

Route 02 - Cloud

Makro Smart Rewrite

Runs on Cloudflare Workers AI, reached through Makro's own proxy. Model: Llama 3.3 70B. Content is not retained; usage metadata is kept for 90 days.

1 credit / rewrite · ~800ms · Best quality

Route 03 - Off

Sensitive Mode

For healthcare, legal, and financial work. Cloud AI is blocked at the processing layer.

Local only · Clipboard auto-clear · Toolbar indicator

Routing

You control where your text goes.

Most AI writing tools give you one route: theirs. Makro gives you three - pick per request, or set a global default. Your hardware, your bandwidth, your privacy posture.

Your draft

"thx for reaching out, ill get back asap"


Route 01 - Local

Local AI

Ollama or LM Studio on your machine. Text is processed on your CPU/GPU, and no network request ever leaves the device.

Network: zero

Credits: 0

Latency: 1-3s

Cost: free

Route 02 - Cloud

Makro Smart Rewrite

A TLS request goes to Makro's proxy, which forwards it to Cloudflare Workers AI. Content is held in request memory and then discarded; only usage metadata is persisted.

Transport: TLS 1.3

Credits: 1 / rewrite

Latency: ~800ms

Model: Llama 3.3 70B

Route 03 - Blocked

Sensitive Mode

A software guard at the processing layer: the cloud AI request is never formed, so only Route 01 (Local) is available. Two toggles control it: block cloud AI, and auto-clear the clipboard.

Cloud: blocked

Clipboard: auto-clear

Indicator: toolbar

Override: manual only
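The routing rules above (a per-request choice, a global default, and a Sensitive Mode that stops a cloud request before it is formed) can be sketched as one small pure function. This is a minimal TypeScript sketch under stated assumptions, not Makro's actual code: `resolveRoute`, `RoutingSettings`, and the decision shape are hypothetical names chosen for illustration.

```typescript
// Hypothetical types: Makro's real settings model is not documented here.
type Route = "local" | "cloud";

interface RoutingSettings {
  defaultRoute: Route;    // global default chosen by the user
  sensitiveMode: boolean; // when true, cloud AI is blocked
}

type RouteDecision =
  | { kind: "send"; route: Route }
  | { kind: "blocked"; reason: string };

// Decide where one rewrite request goes. The Sensitive Mode check runs before
// any cloud request is built, so a blocked request never reaches the network.
function resolveRoute(settings: RoutingSettings, perRequest?: Route): RouteDecision {
  const wanted: Route = perRequest ?? settings.defaultRoute; // per-request choice wins
  if (settings.sensitiveMode && wanted === "cloud") {
    return { kind: "blocked", reason: "Sensitive Mode is on: only local AI is available" };
  }
  return { kind: "send", route: wanted };
}
```

Returning an explicit `blocked` result, rather than silently rerouting to local, keeps the user aware that the cloud route was refused; whether Makro itself falls back automatically is not stated on this page.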

12 rewrite actions

Every action, tuned for its job.

Each rewrite type runs at a tuned model temperature: grammar fixes get a low temperature for deterministic output, casual tones get a higher temperature for more varied phrasing. The sample below shows one action's before and after.

Smart Rewrite - live sample (12 actions + a custom instruction slot)

Input (before)

I wanted to reach out and let you know that we have received your application and our team is currently in the process of reviewing all of the materials you submitted.

Output (after) - 1 credit

We received your application and are reviewing your materials.

Action: Shorten · Cloud: prompt content not retained; usage metadata kept 90 days
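The per-action temperatures described above map naturally onto Ollama's `POST /api/generate` endpoint, whose JSON body accepts `model`, `prompt`, `system`, `stream`, and an `options.temperature` field. The sketch below builds such a body. The action names and temperature values are hypothetical: the page only says grammar runs low and casual tones run higher, and it does not list all 12 actions or their settings.

```typescript
// Hypothetical action names and temperatures, for illustration only.
const ACTION_TEMPERATURE = {
  grammar: 0.1,   // near-deterministic corrections
  translate: 0.2,
  shorten: 0.3,
  formalise: 0.4,
  casual: 0.8,    // more varied phrasing
} as const;

type Action = keyof typeof ACTION_TEMPERATURE;

// Request body for Ollama's POST /api/generate.
interface GenerateRequest {
  model: string;
  system: string;
  prompt: string;
  stream: boolean;
  options: { temperature: number };
}

function buildGenerateRequest(action: Action, text: string, model = "llama3.1"): GenerateRequest {
  return {
    model,
    system: `You are a writing assistant. Task: ${action}. Return only the rewritten text.`,
    prompt: text,
    stream: false, // ask for one JSON response instead of a token stream
    options: { temperature: ACTION_TEMPERATURE[action] },
  };
}
```

To send it, POST `JSON.stringify(buildGenerateRequest("shorten", draft))` to `http://localhost:11434/api/generate`; with `stream: false`, Ollama returns a single JSON object whose `response` field holds the generated text.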

Setup

Installing local AI, about 10 minutes.

Pick one - they both work. Ollama is CLI-first, LM Studio is GUI-first. Both are free (Ollama is also open-source), and Makro auto-detects whichever one is running.

Ollama

CLI-first, scriptable, runs as a background service. ollama.com

Download Ollama for your platform.

macOS, Linux, or Windows. Installer is ~200 MB. No account required.

Pull a model.

Run ollama pull llama3.1 in your terminal. Downloads ~4.7 GB. Takes 3-8 minutes depending on connection.

Allow browser extension access.

Set OLLAMA_ORIGINS=chrome-extension://* as an environment variable, then restart Ollama. This is Ollama's CORS setting.

Open Makro. Auto-detected.

Makro probes localhost:11434 and finds Ollama automatically. Smart Rewrite now routes to your machine.

~10 min total setup · ~4.7 GB one-time download · Free forever

~/zsh

    # install ollama from ollama.com
    $ ollama pull llama3.1
    pulling manifest...
    pulling 4.7 GB model...
    + success

    # allow browser extension calls
    # (quit the Ollama app first so the server restarts with this setting)
    $ export OLLAMA_ORIGINS="chrome-extension://*"
    $ ollama serve
    Listening on localhost:11434

    # open Makro - auto-detected
    + Ollama connected
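Auto-detection and model switching both rely on Ollama's HTTP API. Its `GET /api/tags` endpoint returns the locally installed models as `{ "models": [{ "name": ... }, ...] }`. Below is a minimal TypeScript sketch, assuming a fetch-compatible function is passed in; `parseOllamaModels` and `listOllamaModels` are hypothetical helper names, not part of Makro or Ollama.

```typescript
// Minimal slice of the Ollama GET /api/tags response that we rely on.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// Structural type for any fetch-like function (the global fetch satisfies it).
type Fetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Extract installed model names, skipping malformed entries.
function parseOllamaModels(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const models = (body as Partial<OllamaTagsResponse>).models;
  if (!Array.isArray(models)) return [];
  return models
    .map((m) => (m && typeof m.name === "string" ? m.name : null))
    .filter((name): name is string => name !== null);
}

// Ask a local Ollama server which models are installed.
async function listOllamaModels(
  fetcher: Fetcher,
  baseUrl = "http://localhost:11434",
): Promise<string[]> {
  const res = await fetcher(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error("Ollama returned an error status");
  return parseOllamaModels(await res.json());
}
```

In an extension you would call `listOllamaModels(fetch)`; per the setup step above, Ollama must be configured (via `OLLAMA_ORIGINS`) to accept the extension's origin. Injecting the fetcher keeps the parser testable without a running server.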

LM Studio

GUI-first, built-in model catalog, no terminal needed. lmstudio.ai

Download LM Studio.

macOS, Linux (AppImage), or Windows. GUI app, no CLI required.

Search and download a model.

LM Studio has a built-in catalogue. Search for Llama 3.1 or Mistral 7B. Click Download. ~4-5 GB.

Start the local server.

Click the "Local Server" tab → Start Server. Default port localhost:1234. Enable CORS if prompted.

Makro finds it automatically.

No environment variables, no CLI commands. Works once the server is running.

~8 min total setup · GUI only · No terminal needed

LM Studio UI

    # 1. install LM Studio from lmstudio.ai
    + installed
    # 2. search "llama 3.1" in the catalogue
    downloading llama-3.1-8b.gguf
    + 4.7 GB downloaded
    # 3. Local Server tab - Start Server on localhost:1234
    + ready
    # 4. open Makro - auto-detected
    + LM Studio connected
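LM Studio's local server speaks an OpenAI-compatible API, so a client can list available models with `GET /v1/models` (response shape `{ "object": "list", "data": [{ "id": ... }] }`) and send rewrites to `POST /v1/chat/completions`. This sketch shows both pieces; the helper names, the default temperature, and the system prompt wording are hypothetical, and the model id used in testing is only an example.

```typescript
// Minimal slice of the OpenAI-style model list returned by GET /v1/models.
interface ModelList {
  object?: string;
  data: { id: string }[];
}

// Extract model ids, skipping malformed entries.
function parseModelIds(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const data = (body as Partial<ModelList>).data;
  if (!Array.isArray(data)) return [];
  return data
    .map((m) => (m && typeof m.id === "string" ? m.id : null))
    .filter((id): id is string => id !== null);
}

// Body for POST /v1/chat/completions on LM Studio's local server.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  temperature: number;
  stream: boolean;
}

function buildChatRequest(model: string, instruction: string, text: string, temperature = 0.3): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: instruction }, // e.g. "Rewrite formally. Return only the text."
      { role: "user", content: text },
    ],
    temperature,
    stream: false,
  };
}
```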

Models

Pick the one that fits your machine.

Three categories. Smaller models are faster and run on modest hardware; larger models rewrite better but need more RAM. You can install multiple and switch per action.

Fast and cheap · 3-4 GB RAM

For laptops, older machines, and quick grammar fixes. Responses can come back in under 2 seconds on recent hardware; older machines take longer (see the hardware table below).

Phi-3 · 3.8B · 2.3 GB

Microsoft's small model. Runs on 4 GB RAM. Ideal for grammar fixes and shortening.

Mistral · 7B · 4.1 GB

Fast and lightweight. Solid general-purpose choice for quick rewrites.

Balanced · Recommended

For most people, most of the time. Strong quality, reasonable hardware, 1-3 second response.

Llama 3.1 · 8B · 4.7 GB

Meta's widely used 8B model. Excellent instruction following for rewrites, translations, and tone shifts.

Gemma 2 · 9B · 5.4 GB

Google's compact model. Strong at formal/casual tone shifts and structured outputs.

Qwen 2.5 · 7B · 4.4 GB

Alibaba's multilingual model. A strong choice for translations and non-English rewrites.

Quality · 40 GB · heavy

For workstations, Mac Studios, heavy writers. Closest to cloud quality - 3-8 second response.

Llama 3.3 · 70B · 40 GB

Top-tier open-source quality. Same model Makro Smart Rewrite uses. Needs 48+ GB unified or system RAM.

Switching models: In Ollama, ollama pull <model-name> downloads any of these. In LM Studio, search the built-in catalogue. Makro picks up every model installed and lets you set a default or choose per action. Quantisation (q4, q8) is auto-selected; sizes above are approximate for q4.
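The q4 sizes above follow a simple rule of thumb: file size ≈ parameter count × bits per weight ÷ 8. 4-bit formats also store per-block scale factors, so they typically land around 4.5 to 5 effective bits per weight; the default of 4.7 below is an assumption chosen to roughly match the listed sizes, not a published constant. Actual RAM needs run higher than the file size because the runtime also holds the conversation context (the KV cache).

```typescript
// Rough on-disk size of a quantised model, in gigabytes (10^9 bytes).
// bitsPerWeight = 4.7 is an assumed average for q4-style formats; use ~8 for q8.
function estimateModelSizeGB(paramsBillions: number, bitsPerWeight = 4.7): number {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1e9;
}

// Compare the estimate against a few catalogue figures quoted on this page.
const catalogue: [string, number, number][] = [
  ["Phi-3", 3.8, 2.3],
  ["Llama 3.1", 8, 4.7],
  ["Llama 3.3", 70, 40],
];
for (const [model, params, listedGB] of catalogue) {
  const est = estimateModelSizeGB(params);
  console.log(`${model}: ~${est.toFixed(1)} GB estimated, ${listedGB} GB listed`);
}
```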

Hardware

What fits your machine.

If you know your RAM and GPU, this table tells you which model to pick and how fast it will respond. Numbers are approximate - they will vary with quantisation and thermal headroom.

| Your machine | RAM | Best model | Response |
|---|---|---|---|
| Apple Silicon M1/M2/M3 (MacBook Air, Pro, Mini) | 16-32 GB | Llama 3.1 8B (Mistral 7B alternative) | 1-2s |
| NVIDIA GPU, RTX 3060+ (6+ GB VRAM) | 16+ GB | Llama 3.1 8B (CUDA-accelerated) | <1s |
| High-RAM desktop (32+ GB, no dedicated GPU) | 32+ GB | Llama 3.1 8B (slower without GPU) | 3-5s |
| Mac Studio / Pro workstation (64+ GB unified memory) | 48+ GB | Llama 3.3 70B (cloud-tier quality) | 3-8s |
| Older laptop (8 GB system RAM) | 8 GB | Mistral 7B (or Phi-3 for speed) | 3-6s |
| Low-end / Chromebook (4 GB RAM) | 4 GB | Phi-3 3.8B (only realistic option) | 4-8s |
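The hardware table reduces to a few RAM thresholds. The sketch below encodes them as a lookup; the cut-offs are read from the table rows, and the `unifiedMemory` flag (for Apple Silicon-style machines where the GPU shares system memory) is an assumption used to separate the Mac Studio row from the high-RAM desktop without a GPU. `suggestModel` is a hypothetical helper, not a Makro API.

```typescript
interface Suggestion {
  model: string;
  note: string;
}

// Thresholds taken from the hardware table (RAM in GB).
function suggestModel(ramGb: number, unifiedMemory = false): Suggestion {
  // 70B needs 48+ GB, and the table only recommends it on unified-memory workstations.
  if (ramGb >= 48 && unifiedMemory) {
    return { model: "Llama 3.3 70B", note: "cloud-tier quality, 3-8s" };
  }
  if (ramGb >= 16) {
    return { model: "Llama 3.1 8B", note: "balanced default; slower without a GPU" };
  }
  if (ramGb >= 8) {
    return { model: "Mistral 7B", note: "or Phi-3 for speed" };
  }
  return { model: "Phi-3 3.8B", note: "only realistic option" };
}
```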

Questions

The AI FAQ.

Ollama and LM Studio are free tools that run large language models on your own computer (Ollama is open-source; LM Studio is a free desktop app). They handle model downloads and memory management, and serve an HTTP API for other apps to call.

Makro connects to that HTTP API on localhost:11434 (Ollama) or localhost:1234 (LM Studio). Your text goes Makro → localhost → back to Makro. Nothing leaves your machine.
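Auto-detection can be as simple as probing both default ports and taking the first server that answers. This sketch assumes Ollama is checked first (via `GET /api/tags`) and LM Studio second (via its OpenAI-compatible `GET /v1/models`); the probe order, `detectLocalBackend`, and the fetcher type are hypothetical. A refused connection rejects the fetch promise, which the loop treats as "not running".

```typescript
// Structural type for a fetch-like function (the global fetch satisfies it).
type Fetcher = (url: string) => Promise<{ ok: boolean }>;

type BackendKind = "ollama" | "lmstudio";

interface LocalBackend {
  kind: BackendKind;
  baseUrl: string;
}

// Default ports from this page: Ollama 11434, LM Studio 1234.
// Checking Ollama first is an assumption.
const CANDIDATES: { kind: BackendKind; baseUrl: string; probePath: string }[] = [
  { kind: "ollama", baseUrl: "http://localhost:11434", probePath: "/api/tags" },
  { kind: "lmstudio", baseUrl: "http://localhost:1234", probePath: "/v1/models" },
];

// Return the first local server that answers, or null if none is running.
async function detectLocalBackend(fetcher: Fetcher): Promise<LocalBackend | null> {
  for (const c of CANDIDATES) {
    try {
      const res = await fetcher(c.baseUrl + c.probePath);
      if (res.ok) return { kind: c.kind, baseUrl: c.baseUrl };
    } catch {
      // Connection refused: nothing is listening on this port; try the next one.
    }
  }
  return null;
}
```

A production version would also put a short timeout on each probe so a hung port cannot stall detection.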

Whether your text leaves your device depends on which route you pick:

Route 01 (Local AI): Text never leaves your device. Zero network requests. Free on every plan.

Route 02 (Smart Rewrite): Text is sent over TLS 1.3 to Makro's proxy, which forwards to Cloudflare Workers AI. Prompt and response content are not retained; only usage metadata (endpoint, token count, timestamp) is kept 90 days to enforce per-tier credit limits. No external model vendor receives the content.

Route 03 (Sensitive Mode): Cloud requests are blocked before they are formed. Only local AI is available.
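Route 02's retention rule (no content; only endpoint, token count, and timestamp, kept 90 days) can be pictured as a record type with no field for text, plus a purge of anything older than the window. This is an illustrative sketch, not Makro's backend: `UsageRecord`, `toUsageRecord`, `pruneUsage`, and the `/rewrite` endpoint name are hypothetical.

```typescript
// The only fields persisted per cloud rewrite, per the policy above.
// There is deliberately no field for prompt or response text.
interface UsageRecord {
  endpoint: string;
  tokenCount: number;
  timestamp: number; // milliseconds since the Unix epoch
}

const RETENTION_MS = 90 * 24 * 60 * 60 * 1000; // 90 days

// Build the record from request facts; the text itself is never passed in.
function toUsageRecord(endpoint: string, tokenCount: number, nowMs: number): UsageRecord {
  return { endpoint, tokenCount, timestamp: nowMs };
}

// Drop records that have aged out of the 90-day window.
function pruneUsage(records: UsageRecord[], nowMs: number): UsageRecord[] {
  return records.filter((r) => nowMs - r.timestamp < RETENTION_MS);
}
```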

Makro can use any model served by Ollama or LM Studio. Both tools support a huge catalogue - Llama, Mistral, Gemma, Phi, Qwen, DeepSeek, CodeLlama, and hundreds of others. If Ollama or LM Studio can run it, Makro can use it.

Makro's own Smart Rewrite service runs Llama 3.3 70B on Cloudflare Workers AI.

You don't need a powerful machine, but more RAM means better results. Phi-3 (3.8B) runs on 4 GB RAM machines, including old Chromebooks. Mistral 7B and Llama 3.1 8B are comfortable on 8 GB.

Apple Silicon Macs (M1/M2/M3) and NVIDIA GPUs (with CUDA) dramatically speed things up. On an M2 MacBook Air with Llama 3.1 8B, rewrites come back in 1-2 seconds. See the hardware matrix above.

We don't route your text through third-party model vendors. Every time text reaches one, it becomes their data under their terms, and we don't want a supply chain where our privacy promise depends on their policy staying acceptable over time.

Instead, we run open-source models on Cloudflare Workers AI. We control the request pipeline end-to-end and we promise zero content retention because we wrote the endpoints. No OpenAI, no Anthropic, no Google in the path.

Credits reset at the start of each month: 25/mo on Free, 500/mo on Pro, 10,000/mo on Premium. Unused credits do not carry over.

If you consistently run out of credits before the month resets, the next plan up is probably a better fit. Local AI via Ollama uses zero credits, so heavy users often keep a local model running for routine work and save cloud credits for higher-stakes rewrites.
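The no-carry-over rule means a plan only needs to remember how many credits were used in the current month. A minimal sketch, assuming one credit per cloud rewrite (as stated above) and UTC calendar months (the page does not say which time zone "start of each month" uses); `CreditState` and `spendCredit` are hypothetical names.

```typescript
type Plan = "free" | "pro" | "premium";

// Monthly allowances from the FAQ.
const MONTHLY_CREDITS: Record<Plan, number> = { free: 25, pro: 500, premium: 10000 };

interface CreditState {
  plan: Plan;
  used: number;
  period: string; // "YYYY-MM"; UTC month boundaries are an assumption
}

function monthKey(d: Date): string {
  const m = d.getUTCMonth() + 1;
  return `${d.getUTCFullYear()}-${m < 10 ? "0" : ""}${m}`;
}

// Spend one credit for a cloud rewrite. Local rewrites never reach this.
function spendCredit(state: CreditState, now: Date): { ok: boolean; state: CreditState } {
  const period = monthKey(now);
  // A new month starts from zero usage: leftover credits are simply dropped.
  const current: CreditState = state.period === period ? state : { ...state, used: 0, period };
  if (current.used >= MONTHLY_CREDITS[current.plan]) {
    return { ok: false, state: current };
  }
  return { ok: true, state: { ...current, used: current.used + 1 } };
}
```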

Bring-your-own-key (BYOK) is not supported yet. We are tracking interest on Pro and Premium plans. The tradeoff is privacy: the moment your text goes through OpenAI's or Anthropic's API, it is under their retention policy, not ours. Local AI via Ollama gives you the same "my own infrastructure" benefit without that risk.

If BYOK is important to you, reply to the onboarding email and tell us which provider - we track requests.

AI that works your way.
Install in a minute.

Add to your browser - free
