About Bobsled

Bobsled is building AI-powered analytics experiences that turn natural language into accurate, production-grade insights. Our mission is to enable enterprise customers to leverage the full power of AI and data agents, transforming how they access and act on their data. As we scale our AI product, we're seeking hands-on specialists to ensure our customers' deployments are robust, contextually tuned, and delivering measurable value.

What You'll Do

Own the text-to-SQL accuracy problem end-to-end: design evals, iterate prompts, and improve retrieval/routing
Build and operate the experimentation and evaluation loop (automatic evals, regression suites, dataset curation)
Design pragmatic LLM application architectures (RAG, agent routing, tool-use orchestration) optimized for accuracy and latency
Ship production-grade code and support deployments; instrument, monitor, and troubleshoot model behavior in real customer environments
Partner closely with engineering and customers to improve semantic models, SQL generation, and data alignment
Create feedback loops from users to systematically capture issues and convert them into measurable improvements
Contribute to automation of environment provisioning and dev workflows to enable fast iteration

What We're Looking For

2+ years in ML/AI engineering, data-focused engineering, or data science roles building production data or AI systems
Demonstrated experience tuning LLM applications: prompt engineering, evals, retrieval, agent design, or similar
Strong hands-on coding in Python (TypeScript familiarity a plus; willingness to work across the stack required)
ML engineering mindset beyond notebooks: testing, CI, observability, performance, and deployment in production
Comfort with SQL and complex data modeling; familiarity with data warehouses and pipelines
Pragmatic, product-oriented approach: optimize for impact over novelty; complement existing systems rather than rebuild from scratch
Ability to design experiments, quantify improvements, and communicate trade-offs clearly

Nice to Have

Experience with text-to-SQL systems, semantic layers, or BI/analytics workflows
Exposure to RAG frameworks, knowledge graphs, vector stores, and evaluation tooling
Prior work in analytics engineering or data engineering environments

Success Looks Like

Measurable improvements in text-to-SQL accuracy across target datasets and partners
Reliable eval pipeline and regression suite running in CI to catch degradations
Clear architecture and documentation for context/agent systems that others can contribute to
Short feedback cycles with partners leading to fast, meaningful product wins

Compensation

Competitive salary and meaningful equity
Comprehensive benefits
Remote

AI Engineer