Press "Enter" to skip to content

Stacklok Is Making Security ‘Boring’ for Open-Source AI-Assisted Devs

Stacklok’s new CodeGate software removes many of the pain points and security traps for developers using AI tools.

Stacklok CEO Craig McLuckie delivering a keynote address at All Things Open 2024. | Source: ATO

It looks like Craig McLuckie is bringing “boring” to security, especially for developers who use AI tools as part of their development process.

“Boring” was the term that just about everybody in the Kubernetes arena — especially McLuckie and his team at the early Kubernetes startup Heptio — picked up and ran with six or seven years ago to describe how easy to use and stable the container orchestration standard Kubernetes had become.

Sysadmins and DevOps teams like boring. It makes their jobs easier.

McLuckie isn’t pushing Kubernetes anymore. These days he’s teamed up with Luke Hinds, who among other things is the founder of sigstore and a former Red Hatter. They’re co-founders of a security-focused startup, Stacklok, that currently maintains two open-source security platforms, Minder and Trusty.

And it may not have occurred to them yet, but they’re still working to make their customers’ lives boring.

They’ve recently unveiled a new project, CodeGate, that developers can use to make sure that any AI helpers they’re using to help write code, such as Copilot or ChatGPT, don’t start doing things like adding secrets, malware, or vulnerabilities to the code they’re producing. AI tools can and will do that if left to their own devices, which makes cleaning up code after it’s been written a time-consuming task that also isn’t guaranteed to catch everything that needs to be removed.

“We’ve basically built a set of controls that a developer deploys into the local machine that gives them a little bit more ability to make sure that their privacy is being protected and which enables them to make sure that security practices are being applied as they use these generative AI tools,” McLuckie told FOSS Force in a recent Zoom interview.

For the time being CodeGate is a totally free work-in-progress. Eventually, there will be a paid edition for enterprise customers, but McLuckie was adamant that doesn’t mean the free version will eventually become crippleware or that the project will move to a “source available” license that will place restrictions on who can use the software and for what purposes.

“We’re not looking to sell technologies to developers,” he said. “We want to support developers and work with developers. We aim to create a vibrant community around this technology and we aim to support self-sufficiency for folks that want to use this tech. Where we aim to make money over time is with enterprise organizations that want to not only meet a certain set of controls, but want to shape their work in a way that meets their specific standards. So it’ll be more about individual developers free to use forever, and then the sort of enterprise version that meets specific enterprise standards around governance or compliance.”

At this stage of the game, CodeGate protects developers on three basic levels. It keeps malware and potential vulnerabilities out of the code being produced, it helps keep the AI’s foundational model fresh, and it helps developers keep their secrets.

Keeping AI from Blabbing Secrets

Before AI started being used for coding, it was relatively easy for developers to keep secrets out of their code. These days it’s not so easy. Managing generative AI that’s being used to create code can be like managing a four-year-old that might use the “F” word while company is present, just because she heard her daddy say it days earlier when he thought she was out of earshot. Any data an AI tool can access is liable to end up in a developer’s code without rhyme or reason, and AI tools often have access to more things on a computer than users suspect.

“A good example would be there are secrets in your project because you’re just hacking around and you haven’t yet set up a system to manage the deployment of those secrets into a production environment,” McLuckie said. “You really don’t want that content being egressed to the cloud. There might be personally identifiable information in some of the folders that you’re using — your tax returns might be in an adjacent folder — and the only thing that’s restricting a lot of these models from unfettered access to everything on your machine is the prompt context, and we know that’s not an infallible system control.”

A demo video with Hinds that we’re including at the bottom of this article shows CodeGate automatically catching some secrets and redacting them during a simple code review.
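CodeGate’s own detection is more involved, but the basic idea of scrubbing a prompt before it leaves the machine can be sketched in a few lines of Python. The patterns and function below are purely illustrative and are not part of CodeGate’s API:

```python
import re

# Illustrative patterns only; a real scanner uses far more detectors plus
# entropy checks. Nothing here is CodeGate's actual implementation.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),       # AWS access key ID
    re.compile(r"(?i)(api[_-]?key|password)\s*=\s*['\"][^'\"]+['\"]"),  # hard-coded credentials
]

def redact_secrets(prompt: str) -> str:
    """Replace anything that looks like a secret before the prompt is sent upstream."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub("<REDACTED>", prompt)
    return prompt

if __name__ == "__main__":
    risky = 'password = "hunter2"  # TODO: move this to a secrets manager'
    print(redact_secrets(risky))  # -> <REDACTED>  # TODO: move this to a secrets manager
```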

Turning Hallucinations Into Malware

One of the biggest problems that AI has these days is its habit of “hallucinating,” or perhaps more accurately, of making things up, which is obviously a problem when it happens in code. From a security standpoint, this is a big issue because AI often makes up libraries that don’t exist, and gives them names that are not necessarily unique, which opens doors for bad actors.

“If you look at the sort of popular open source models that a lot of people are using in the local context, about one in five package names will be hallucinated,” McLuckie said. “So if I’m a Python developer and I need a package that supports the ability to invoke an HTTP REST-based endpoint and recover results or whatever, often it will come up with a name like ‘invokehttp.’ That package doesn’t exist, it’s just imagining this as a probabilistic way of generating content. There’s a probability field that it’s exploring and it might render an actual package that exists or might just make something up that sounds plausible.”

This tendency would merely be irritating if names were hallucinated so randomly that they were rarely repeated, but unfortunately AI platforms often make up the same names all over the place, with “invokehttp” being a case in point.
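A developer can approximate part of this check by hand: before trusting a model-suggested dependency, confirm that the name actually exists in the package index. Here is a rough sketch against PyPI’s public JSON API (the endpoint is real; the surrounding logic is just an illustration, not how CodeGate works internally):

```python
import json
import urllib.error
import urllib.request

def package_exists_on_pypi(name: str) -> bool:
    """Return True if `name` is a published PyPI package, False if the index has never heard of it."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            json.load(resp)  # parse the metadata just to confirm a real package record came back
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:  # unknown name: possibly hallucinated, or typosquat bait
            return False
        raise

if __name__ == "__main__":
    for suggestion in ("requests", "some-package-the-model-dreamed-up"):
        verdict = "exists" if package_exists_on_pypi(suggestion) else "not found; treat with suspicion"
        print(f"{suggestion}: {verdict}")
```

As the next quote makes clear, existence alone is a weak signal, because attackers deliberately register the hallucinated names; that is why CodeGate also tracks which packages are known to be malicious, deprecated, or vulnerable.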


“The problem is that a lot of nation-state actors know that these things hallucinate, so they use them,” he said. “Every time it hallucinates a name they write it down, and then they go and create a package using that name. Invokehttp was a package that was a fork of the Selenium ChromeDriver that they renamed ‘invokehttp’ and then included a Base64-encoded remote code execution capability. The idea was that you deploy that and it’s game over. At this point they have remote command execution to your environments, they go grab the GitHub tokens, and it’s off to the races.”

Not only does CodeGate catch issues such as this, it’ll even attempt to recommend safe replacements. In addition, it’ll have your back if open-source code you found on GitHub and included in your project turns out to be malware or to have known vulnerabilities.

Keeping Your AI Fresh

“The amount of resource it takes to build a foundational model is absolutely astonishing,” McLuckie said. “It’s hundreds of millions of dollars of work to produce one of these large foundational models, and once it’s been created it’s rotting on the vine. It’s going to be in sort of a state of slow decay, where what was true when it was trained is no longer true.”

CodeGate, he said, solves that issue.

“What we’ve built is a way to do what’s known as retrieval-augmented generation,” he said. “We basically take this database that we’ve built that’s kind of up-to-the-moment current on all the open-source packages in the various package ecosystems, so it has a lot of context around the packages — what has malicious content, what has CVEs, what has been deprecated, what is unsustainable, what has poor bus factors — and we compress that into this purpose-built vector database that then ships in that package, so you’ve got this little local RAG-in-a-box kind of experience.

“As the prompt comes in, it automatically gets marked up with some really fresh, relevant context in a relatively lightweight way, and all of that content goes off to the generative AI model, and it updates it on the latest of what’s been happening in the community. It’s just a way of bringing things up to current specifications and adding a little bit of augmentation to your experience.”
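Stripped to its essentials, the pattern McLuckie describes can be sketched as: look up what is known about any packages mentioned in the prompt, then prepend that context before the prompt goes to the model. The dictionary below stands in for CodeGate’s local vector database, and the names are illustrative rather than its actual internals:

```python
# Toy illustration of retrieval-augmented generation for package metadata.
# CodeGate ships a purpose-built local vector database; a plain dict stands in for it here.
PACKAGE_ADVISORIES = {
    "invokehttp": "Known malicious: typosquat carrying a remote code execution payload.",
    "requests": "Actively maintained; no known malicious releases.",
}

def augment_prompt(prompt: str) -> str:
    """Prepend any fresh package intelligence that is relevant to the prompt."""
    relevant = [
        f"- {name}: {note}"
        for name, note in PACKAGE_ADVISORIES.items()
        if name in prompt.lower()
    ]
    if not relevant:
        return prompt
    return "Current package advisories:\n" + "\n".join(relevant) + "\n\n" + prompt

if __name__ == "__main__":
    print(augment_prompt("Write a Python script that uses invokehttp to call a REST endpoint."))
```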

Using CodeGate

Again, CodeGate is open source, licensed under the Apache 2 license. It resides within a Docker container on the user’s machine, acting as a proxy between the user and whatever AI tool is being used. CodeGate does need to support the AI tool, however, which shouldn’t be much of a problem for most developers since many of the more popular tools — Anthropic, ChatGPT, etc. — are already supported. For those that aren’t, instructions for DIY integration are provided online.

“It’s a very simple integration model,” he said. “A large part of what makes community projects so special is that we’ll show developers how, but then it should really be owned by the community. Communities can figure out what makes sense for them, like what IDE extensions or what environments they want to work with. We’re here to help the community get to a point of self-sufficiency and self-reliance so it’ll be relatively easy for a developer to establish an integration pattern and then publish it. Then we’ll work with the community to maintain those as they get delivered.”

Although a bit on the geeky side, installing CodeGate should be easy enough for coders. Prominently displayed on the project website’s homepage is the command that needs to be run once it’s downloaded.

“Basically it’s just one Docker run command and then you go to localhost port 8989,” McLuckie said. “On the top you’ll see a little dashboard and there’ll be a little help thing that’ll walk you through all of the other install instructions. It’s actually pretty powerful just the way it is right now, so I’d love to see people give it a go, kick the tires, and then join our Discord servers to tell us about it. If they don’t like it, we’re happy to change it and fix it.”
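For a rough sense of what sits behind that, here is a sketch of pointing an OpenAI-style Python client at the local proxy instead of straight at the provider. The base URL path and model name are assumptions for illustration only; the project’s own documentation covers the exact configuration CodeGate expects:

```python
from openai import OpenAI

# Illustrative only: the exact base-URL path and model identifiers CodeGate expects
# are documented on the project site and may differ from what is shown here.
client = OpenAI(
    base_url="http://localhost:8989/v1",  # the local CodeGate proxy, not the provider itself
    api_key="placeholder",                # stand-in value for this sketch
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Suggest a Python package for making HTTP requests."}],
)
print(response.choices[0].message.content)
```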

When you do open Discord to tell them about your CodeGate experience, be sure to mention how boring it’s making your work.
