A single engineer. Sixty minutes. A fully functional, production-ready SaaS tool. The numbers alone sound like a myth—but Treasure Data has made them real with Treasure Code, an AI-native command-line interface that lets users interact with its customer data platform (CDP) through natural language commands. The real breakthrough wasn’t the speed, though. It was the governance architecture that made it safe to trust AI with production systems at all.
Most AI coding tools today operate in controlled environments, where outputs are reviewed, tested, and manually approved before they ever reach users. Treasure Code flips that model: AI generates the code, but the platform's built-in guardrails ensure it never violates security, compliance, or access policies. The result is a workflow in which AI writes 100% of the codebase, and the governance layer alone decides whether it ships.
The lesson for engineering leaders is clear: speed without structure is chaos. Without the right controls in place, even the fastest AI-driven development can unravel into compliance gaps, unplanned adoption, and wasted effort. Here’s how Treasure Data built the system—and why the governance layer was just as critical as the code itself.
Where the Magic Happened
Treasure Code wasn’t built in isolation. It emerged from Treasure Data’s broader push to make its CDP accessible through natural language, but with one critical difference: the AI agent generating the code had to operate within the same security and permission boundaries as any human user. That meant no API keys could be exposed, no personally identifiable information (PII) could leak, and every command had to respect the user’s existing access levels—even if the user was an AI.
The foundation for this was laid long before the 60-minute coding sprint. Chief Product Officer Rafa Flores and the engineering leadership team spent weeks designing a governance framework that would enforce these rules at the platform level—not as a post-hoc check, but as an inherent part of the system. The result? A three-tiered validation pipeline that ensures AI-generated code meets production standards before it ever reaches a user.
The Three Layers of Trust
The first layer is an AI-powered code reviewer, itself built using Claude Code. Unlike traditional static analysis tools, this reviewer doesn't just scan for syntax errors; it enforces architectural alignment, security compliance, and even documentation quality. If the code fails any check, it is flagged for human review. If it passes, the system can merge automatically, reducing manual bottlenecks without sacrificing safety.
The second layer is a standard CI/CD pipeline, running unit tests, integration checks, and security scans on every change. The third? Human oversight, required only when automated systems detect risk or when enterprise policies demand explicit approval. The internal mantra at Treasure Data is simple: AI writes the code, but AI does not ship it.
This structure isn’t just about catching bugs—it’s about ensuring the AI itself can’t bypass the rules. If a command attempts to access restricted data or expose sensitive information, the system rejects it before it ever reaches execution. The governance layer isn’t an add-on; it’s the operating system for AI-driven development.
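The routing logic behind such a pipeline can be sketched in a few lines. This is a hypothetical simplification, not Treasure Data's implementation: the `ai_review` rules, the `Change` type, and the banned-token list are all illustrative stand-ins for the Claude-based reviewer, CI/CD gates, and escalation policy described above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Change:
    """An AI-generated code change moving through the pipeline."""
    diff: str
    flags: List[str] = field(default_factory=list)

def ai_review(change: Change) -> bool:
    """Layer 1: hypothetical reviewer rules standing in for the AI reviewer."""
    banned = ["api_key", "DROP TABLE"]  # assumed policy tokens, for illustration
    for token in banned:
        if token in change.diff:
            change.flags.append(f"ai_review: found '{token}'")
    return not change.flags

def ci_passes(change: Change, checks: List[Callable[[Change], bool]]) -> bool:
    """Layer 2: CI/CD gates (tests, scans) modeled as predicate callables."""
    return all(check(change) for check in checks)

def route(change: Change, checks: List[Callable[[Change], bool]]) -> str:
    """Decide a change's fate: auto-merge only if every automated gate passes."""
    if not ai_review(change):
        return "human_review"   # Layer 3: any flag escalates to a person
    if not ci_passes(change, checks):
        return "human_review"
    return "auto_merge"         # AI wrote it; the gates shipped it
```

The key design property is that "human review" is the fallback branch, not the default path: people are pulled in only when an automated layer raises a flag.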
Why It’s More Than Just a Smart CLI
At first glance, Treasure Code might look like any other AI-powered interface—point a tool like Cursor at a database, add natural language, and call it done. But the real innovation lies in how it inherits Treasure Data’s existing access controls and permission structures. Most AI query tools run with whatever privileges the API key provides, turning every command into a potential security risk. Treasure Code, by contrast, enforces the same restrictions as the platform itself: a user can’t query data they don’t already have access to, and PII remains protected regardless of how the command is phrased.
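The principle of inheriting the user's own entitlements, rather than the API key's, can be illustrated with a minimal authorization gate. This is a sketch under assumed names (`User`, `authorize`, `PII_COLUMNS` are all hypothetical), not Treasure Data's actual access-control API:

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

PII_COLUMNS = {"email", "phone", "ssn"}  # assumed PII markers, for illustration

@dataclass(frozen=True)
class User:
    """The requesting user's own entitlements, which every AI query inherits."""
    name: str
    tables: FrozenSet[str]   # tables this user may read
    can_see_pii: bool

def authorize(user: User, table: str, columns: Set[str]) -> Set[str]:
    """Reject out-of-scope tables; strip PII columns the user can't see.

    Runs before execution, so it doesn't matter how the natural-language
    command was phrased -- the generated query is scoped to the user.
    """
    if table not in user.tables:
        raise PermissionError(f"{user.name} has no access to {table}")
    if user.can_see_pii:
        return columns
    return columns - PII_COLUMNS
```

The point is the ordering: authorization happens between generation and execution, so a cleverly phrased prompt can widen the query but never the permissions.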
The second key difference is orchestration. While generic AI tools might execute a single query or analysis, Treasure Code connects to Treasure Data’s AI Agent Foundry, allowing it to coordinate complex workflows across segmentation, reporting, and activation—all in one command. It’s the difference between asking an AI to run an analysis and having it automatically trigger follow-up actions based on the results.
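That chaining behavior, where one command fans out into dependent steps, each consuming the previous step's output, can be sketched as a simple workflow runner. The step names and toy logic below are invented for illustration and are not the AI Agent Foundry's API:

```python
from typing import Any, Callable, Dict, List, Tuple

# A named step: takes the previous step's output, returns its own.
Step = Tuple[str, Callable[[Any], Any]]

def run_workflow(seed: Any, steps: List[Step]) -> Dict[str, Any]:
    """Run steps in order, threading each result into the next step."""
    results: Dict[str, Any] = {}
    current = seed
    for name, fn in steps:
        current = fn(current)
        results[name] = current
    return results

# Toy stand-ins for segmentation -> reporting -> activation.
steps: List[Step] = [
    ("segment",  lambda users: [u for u in users if u["spend"] > 100]),
    ("report",   lambda seg: {"size": len(seg)}),
    ("activate", lambda rep: f"campaign sent to {rep['size']} users"),
]
```

A single invocation of `run_workflow` mirrors the "one command" experience: the user asks once, and the follow-up actions fire automatically off the intermediate results.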
The Unplanned Lessons
Not everything went as planned. Treasure Data initially assumed the tool would remain internal while the team refined its approach. Instead, customers discovered it organically—more than 100 companies and nearly 1,000 users adopted it within two weeks, with no marketing push. The result? A compliance gap. The product wasn’t yet certified under Treasure Data’s Trust AI program, leaving the company scrambling to retroactively validate something already in use.
Another challenge emerged when non-engineering teams—like customer success managers and account directors—began submitting custom skills. Without clear guidelines on what would get approved, the backlog grew as submissions hit access policy roadblocks. The lesson? Governance isn’t just about preventing bad outcomes—it’s about defining what good outcomes look like before the system is exposed to broader use.
Where It Stands Today
Early adopters like Thomson Reuters have found value in Treasure Code’s speed and flexibility, particularly for tasks like audience segmentation that would otherwise require custom development. The feedback has centered on extensibility—how easily the tool can be adapted to new use cases—but also on a gap Treasure Data is still addressing: guidance on AI maturity. The product doesn’t yet tell users who should use it, what they should build first, or how to structure access across different skill levels. Flores sees this as the next frontier: AI that not only enables rapid development but also guides organizations on how to use it effectively.
For engineering leaders evaluating AI-driven tools, the Treasure Data experience offers three critical takeaways:
- Governance must precede the code. Without platform-level controls, AI-generated outputs become a compliance liability. The faster the development, the more critical the guardrails.
- Automated quality gates are non-negotiable at scale. Human review is essential, but it should be the final check—not the primary mechanism. AI can enforce consistency and policy compliance across every pull request without fatigue.
- Organic adoption will happen faster than you think. If the tool works, users will find it before you’re ready. The compliance and operational gaps Treasure Data faced are a direct result of underestimating this reality.
The takeaway isn’t that AI can replace human engineers—it’s that humans must design the systems AI can trust. Speed is meaningless without structure. And in the world of production-grade SaaS, structure is everything.