During a keynote address at All Things Open AI, IBM announced that three key AI tools it has developed are being donated to the Linux Foundation.

Who says nothing newsworthy ever comes out of a new conference’s inaugural run? During Tuesday morning’s keynotes at the first ever All Things Open AI conference in Durham, North Carolina, IBM announced that Big Blue is contributing three open source AI projects to the Linux Foundation.
After the announcements we caught up with Sriram Raghavan, IBM’s VP of IBM Research AI who made the announcement, and asked whether the Linux Foundation would be creating a new baby foundation specifically for these three projects or if they’d be dropped into an already existing program.
“We’re under discussions with them,” he answered. “We think that the right home for all three is the Linux Foundation’s AI and Data Foundation, but that’s a discussion with them and we will work with them, but that’s likely where they’ll end up.”
All three projects — Data Prep Kit, Docling, and BeeAI — are developer tools, “each focused on an essential area of the AI development stack” according to IBM. All three have already been being developed in the open, and so are available for download on GitHub.
Data Prep Kit
According to Raghavan, Data Prep Kit is designed to provide a framework for processing unstructured data.
“Unstructured data is the lifeblood of a lot of AI use cases,” he explained, “so being able to take unstructured data, process it, clean it, duplicate it, chunk it, and if there’s bad content in there, filter it, becomes a very important step. Developers spend a lot of time getting data ready before they can do the fun part of AI.
“What Data Prep Kit does is give you a framework and a set of pre-canned operations to do that very effectively, and it saves you from worrying about all of the underlying infrastructure plumbing.”
The tool was initially developed, he said, to scratch an itch that IBM was having with its own AI projects.
“When we started training our Granite models, which we also released in the open, we needed to process lots of data, and so this framework was very useful for us.”
He added that by turning Data Prep Kit over to the Linux Foundation, the company hopes to gain from improvements that outside developers bring to the project.
“There are many document types today that we haven’t encountered. There’s many, many other languages which we want to support. There’s probably lots of interesting filters and transformation. So we’re very excited about the community contributing and extending data.”
Docling
The second AI app that IBM’s donating is basically a way of working with different document formats, using AI as an aid.
“Docling actually fits very nicely with the first one, but we decided to keep it separate because document extraction and processing is its own complicated task,” Raghavan said. “Interestingly, it requires a lot of AI to process because a lot of high value knowledge, especially in enterprises, sits in PDF documents, in Excel, and in PowerPoint files.
“These are not always easy. You may have two columns text or you may have a reading order. You may have a chart in the middle, you may have a figure in the middle, or you have a table in the middle. So one of the big challenges is, how do you take a document that was visually created for human consumption and turn it into a well formatted document in HTML or markdown that an LLM can make sense of?”
The latter point is important, if for no other reason than it’s one of the things that separates Docling from traditional software apps that convert one document format into another without the aid of AI. It’s also a big reason why it’s a big hit with users on GitHub, where it already has more than 24,000 stars and 1,400 forks.
BeeAI
The BeeAI project focuses on AI agents, which are AI tools that can automate complex tasks that would typically require human intervention.
“Bee is actually three things put together,” Raghavan said. “One is a framework for building agents, that lets you create agents rapidly.”
The second is that it’s a platform for orchestrating and discovering multiple agents, which can either be IBM frameworks or frameworks from other sources.
“There are a lot of agent frameworks out there,” he explained. “The reason we’re doing that is it’s too early now to presume that we know exactly the right APIs and interfaces for building agents. There’s a lot of innovation, so there are many frameworks for building agents.”
The final part of BeeAI is a protocol for agent to agent communication.
“You can think of Bee as these three things,” he said, “agent development framework, a platform to discover and orchestrate, and a protocol to do that discovery and orchestration, and all of that we now want to develop in the open source community.”
Christine Hall has been a journalist since 1971. In 2001, she began writing a weekly consumer computer column and started covering Linux and FOSS in 2002 after making the switch to GNU/Linux. Follow her on Twitter: @BrideOfLinux
Be First to Comment