Support Engineer

Operate the Grovli production stack — keep the AI pipeline, mobile API, and web frontend healthy for paying users.

remote · support · full-time · operations · on-call

We're hiring a support engineer to keep Grovli running reliably for the paying users who depend on it. The role is a hybrid of operations and support: you'll own production health, drive incident response, and partner with the engineering team on user-reported bugs that slip past automated tests.

About Grovli

Grovli is an AI meal-planning product on iOS and the web. The stack runs on Google Cloud — 8 Cloud Run services, MongoDB, Redis, Firestore vector DB, Vertex AI for Gemini + Imagen, observability via Grafana / Loki / Prometheus / Tempo. We operate as a small team where the difference between "this works" and "users notice latency" gets caught by a person reading dashboards, not a Tier-3 escalation chain.

What you'll do

  • Own production observability: Grafana dashboards for backend latency, AI-pipeline cost, meal-generation success rates, sync fastpath hit rate. When a metric drifts, you find the root cause — usually via Tempo traces, Loki logs, or direct Mongo aggregations.
  • Triage user-reported issues: paying users hit edge cases that the gauntlet (our automated test suite) doesn't cover. You reproduce, scope, and either patch (for config / data fixes) or hand a self-contained bug report to engineering with the failing trace.
  • Drive incident response: when the meal generator drifts, the matcher under-hits, or a wearable integration breaks, you're the person who takes ownership end-to-end — declare the incident, scope blast radius, coordinate the fix, write the postmortem.
  • Run health-checks for adjacent systems: Garmin / WHOOP / Withings OAuth health, Vertex AI quota burndown, MongoDB index pressure, Redis memory headroom. Catch problems before users do.
  • Improve the gauntlet: when an incident reveals a coverage gap, you write the integration test that would have caught it.

You probably have

  • 2+ years operating a real production service. Cloud Run, GKE, EC2, or Heroku are all fine; what matters is that you've debugged problems live.
  • Comfort with logs, traces, metrics — you can navigate a Grafana dashboard or write a Loki LogQL query without a tutorial
  • A debugging instinct: you reach for git log, gcloud logging, and Mongo aggregations before you reach for guesses
  • Patience with users who are frustrated and clarity in writing back to them

Bonus points

  • Python or TypeScript reading-level fluency (you don't have to write features, but you should be able to follow a stack trace into the source)
  • Experience with OpenTelemetry, Grafana, Loki, or similar
  • A knack for writing clear incident postmortems

How to apply

Send a resume + a paragraph describing a production incident you led to info@citigrove.com with the subject "Support Engineer". Bonus: share a postmortem you wrote — public or scrubbed — that you're proud of.
