
Serving generative AI just got a whole lot easier with OctoML's OctoAI



The OctoAI compute service offers templates for familiar generative AI programs such as Stable Diffusion 2.1, but customers can also upload their own fully customized models.

OctoML

Enthusiastic interest in running artificial intelligence programs, especially generative programs such as OpenAI's ChatGPT, is producing a cottage industry of tools that streamline the work of putting AI models into production.

Also: 92% of developers are using AI tools, according to GitHub developer survey

On Wednesday, OctoML, a startup that has worked on improving the performance of AI on various computer systems and chips, launched a service to smooth the work of making predictions, the "inference" part of AI, called the OctoAI compute service. Developers who want to run large language models and the like can upload their models to the service, and OctoAI takes care of the rest.

That could bring more parties into serving AI, not just traditional AI engineers, the company says.

"The audience of our product is general app developers who want to add AI functionality to their apps, such as adding large language models to answer questions, without having to actually put those models into production, which is still very, very difficult," OctoML co-founder and CEO Luis Ceze said in an interview with ZDNET.

Ceze says generative AI is even harder to serve than some other forms of AI.

"These models are large, they use a lot of computation, and they require a lot of infrastructure work that we think developers shouldn't need to worry about," he said.

Also: OctoML CEO: MLOps needs to give way to DevOps

OctoAI, Ceze said, is the equivalent of "a platform typically built by someone using a proprietary model," such as OpenAI's for ChatGPT. "We've built an equivalent platform that allows you to run open-source models."

“We want people to focus on building their apps, not building the infrastructure, so people get more and more creative.”

The service, which is now generally available, had been restricted to an initial group of customers for about four months, Ceze said, "and reception has exceeded everyone's expectations by a very wide margin."

Although OctoAI's focus is on inference, the program can do a limited amount of what is known as "training," the initial phase of AI in which a neural network is developed by adjusting its numeric parameters toward an optimization goal.

Also: How ChatGPT can rewrite and improve your existing code

"We can fine-tune, but we can't train from scratch," said Ceze. Fine-tuning refers to further training that occurs after initial training but before inference, such as tuning a neural network to the characteristics of a particular domain.
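The distinction can be illustrated with a toy example. The following sketch is not OctoAI's actual API, just a minimal pure-Python illustration of the idea: fine-tuning continues gradient descent from already-trained weights instead of starting from scratch, so far fewer steps are needed.

```python
# Toy illustration (not OctoAI's API): fine-tuning resumes gradient
# descent from pretrained weights rather than from a random start.

def gradient_step(w, x, y, lr):
    """One step of gradient descent on squared error for y ≈ w * x."""
    pred = w * x
    grad = 2 * (pred - y) * x
    return w - lr * grad

def fit(w_init, data, lr=0.01, steps=100):
    w = w_init
    for _ in range(steps):
        for x, y in data:
            w = gradient_step(w, x, y, lr)
    return w

# "Pretraining" on a general task (here, data following y = 2x)...
pretrained = fit(w_init=0.0, data=[(1, 2), (2, 4), (3, 6)])

# ...then brief fine-tuning on domain data (y = 2.5x): only a handful
# of steps, which is the economy Ceze is describing.
fine_tuned = fit(w_init=pretrained, data=[(1, 2.5), (2, 5.0)], steps=10)
print(round(fine_tuned, 2))
```

In ten steps the fine-tuned weight has moved most of the way from the pretrained value toward the new domain's target.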


"We want people to focus on building their apps, not building the infrastructure, to enable people to be more and more creative," said OctoML co-founder and CEO Luis Ceze.

OctoML

That emphasis, Ceze said, was an "intentional choice" by OctoML, since inference is the more frequent task in AI.

"Our work is focused on inference because I have always been a strong believer that for any successful model, you will do many more inference operations than training operations. This is really about getting models into production, and over the lifetime of a model, the majority of compute cycles will be inference."

OctoML's team is well versed in the intricacies of bringing neural networks to production thanks to their central role in developing the open-source project Apache TVM. Apache TVM is a software compiler that works differently from other compilers. Instead of turning a program into typical chip instructions for a CPU or GPU, it studies the "graph" of computational operations in a neural network and finds the best way to map those operations to hardware, based on the dependencies between the operations.
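A greatly simplified sketch of that graph analysis (TVM's real machinery is far more sophisticated): each node is an operation, edges record dependencies, and a topological sort yields a valid execution order while exposing independent operations that could be scheduled in parallel.

```python
# Toy sketch of the dependency analysis a compiler like Apache TVM
# performs on a neural network's operation graph (greatly simplified).
from graphlib import TopologicalSorter

# A tiny network fragment: two matmuls feed an add, then a relu.
# Each key maps an op to the ops whose results it depends on.
op_graph = {
    "matmul_a": [],
    "matmul_b": [],
    "add":      ["matmul_a", "matmul_b"],
    "relu":     ["add"],
}

# A valid execution order that respects the dependencies:
order = list(TopologicalSorter(op_graph).static_order())
print(order)

# Ops with no dependencies on each other could run in parallel or be
# placed on different hardware units:
independent = sorted(op for op, deps in op_graph.items() if not deps)
print(independent)
```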

Also: How to use ChatGPT to create apps

The OctoML technology that came out of Apache TVM, called the Octomizer, is "like a component of MLOps or DevOps flows," said Ceze.

The company has "taken everything we've learned" from Apache TVM and from building the Octomizer, Ceze said, "and produced an entirely new product, OctoAI, which essentially takes all of this optimization and then delivers a full, fine-tuned platform that allows users to put AI into production, and even runs it for them."

Also: AI and DevOps, combined, can help unleash developer creativity

The service "automatically accelerates the model," selects the hardware, and "constantly switches to the best way to run your model, in a very, very turnkey way."


Using cURL at the command line is a simple way to get started with the models.

OctoML
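For a sense of what such a call looks like, here is a hedged sketch of a JSON-over-HTTP inference request. The endpoint URL, header names, and payload fields below are illustrative assumptions, not OctoAI's documented API; only the general REST pattern is real. Consult the service's own documentation for the actual interface.

```python
# Hypothetical sketch of calling a hosted inference endpoint over HTTP.
# The URL, token, and payload field are placeholder assumptions, not
# OctoAI's documented API.
import json
import urllib.request

def build_inference_request(endpoint_url, token, prompt):
    """Construct (but do not send) a JSON-over-HTTP inference request."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_inference_request(
    "https://example.invalid/v1/infer",   # placeholder endpoint
    "YOUR_TOKEN",                          # placeholder credential
    "A watercolor painting of an octopus",
)
print(req.get_method(), req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return the model's output as JSON from a real endpoint.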

The result, said Ceze, is that the customer no longer needs to be an ML engineer; they can be a general software developer.

The service allows customers to bring their own models or rely on "a curated collection of high-value models," including Stable Diffusion 2.1, Dolly 2, LLaMA, and Whisper for audio transcription.

Also: Meet the post-AI developer: More creative, more business-focused

"As soon as you log in, as a new user you can play with the models right away; you can copy them and get started very easily." The company can help adapt such open-source programs to customer needs, "or you can come with your own fully customized model."

So far, the company has seen a "very consistent mix" of customers using open-source models and their own custom models.

The infrastructure launch partner for OctoAI is Amazon's AWS, which OctoML is reselling, but the company is also working with some customers who want to use compute capacity they have already purchased from public cloud providers.


Uploading a completely custom model is another option.

OctoML

For example, on the sensitive issue of uploading customer data to the cloud for fine-tuning, "We started with SOC 2 compliance from the very beginning, so we make sure to be very strict about not mixing one customer's data with other users' data," said Ceze, referring to the voluntary SOC 2 compliance standard developed by the American Institute of CPAs. "We have all that, but now we have customers who are demanding private deployments, where we actually create instances of the system and run it in their own VPCs [virtual private clouds]."

"That won't roll out immediately, but it's on the very, very short-term roadmap because customers are asking for it."

Also: A low-code platform means anyone can be a developer — and possibly a data scientist too

OctoML is up against competitors big and small who also see a huge opportunity in serving predictions from AI programs in the wake of ChatGPT.

MosaicML, another young startup with deep technical expertise, in May introduced MosaicML Inference for deploying large models. And Google Cloud last month announced G2, said to be purpose-built for large AI inference workloads such as generative AI.

Ceze says several things distinguish OctoAI from its competitors. "One of the unique things we offer is that we focus on fine-tuning and inference," rather than offering "a product that only has inference as a component." OctoAI also differentiates itself, he said, by being "purpose-built for developers to build apps very quickly" rather than requiring them to be AI engineers.

Also: AI design is about to change from open source Apache TVM and OctoML

"Building an inference service is really hard," commented Ceze. "You have very strict requirements," including "very strict uptime requirements." When running a training job, "if you hit an error, you can roll back, but with inference, a request you miss could upset the user."

"I can imagine people training models with MosaicML and then feeding them into OctoML for serving," added Ceze.

OctoML has yet to announce pricing, he said. The service will initially let prospective customers use it for free, via the registration page, with an initial allocation of free compute time. After that, "pricing will be very transparent in terms of what it costs you to use the compute, and the level of performance we give you, so it becomes very, very clear; it's not just compute time, you're getting more out of the compute that you're paying for." The service will ultimately be priced on a consumption basis, he said, whereby customers pay only for what they use.

Also: The 5 biggest risks of generative AI, according to an expert

There is a clear possibility that much of AI model serving could go from being a fancy, premium service today to being a commodity.

"I really think that AI functionality has the potential to become commoditized," says Ceze. "That will be very interesting for users; like spell check or grammar check, just imagine every text interface having generative AI."

At the same time, “the larger models can still do some things that the commodity models can’t,” he said.

Scale means that the complexity of inference will remain a difficult problem for some time, he said. Ceze notes that large language models with 70 billion or more parameters perform hundreds of "giga-ops," hundreds of billions of arithmetic operations, for every word they generate.
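That figure can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming the common rule of thumb that a transformer performs roughly two arithmetic operations per parameter for each generated token, and ignoring attention-cache and other overheads:

```python
# Rough sanity check of the "hundreds of giga-ops per word" figure.
# Rule of thumb (an assumption, not an exact count): ~2 arithmetic
# operations per parameter per generated token.

params = 70e9                # 70 billion parameters
ops_per_token = 2 * params   # ~140 billion operations per token

print(f"{ops_per_token / 1e9:.0f} giga-ops per generated word")
# → 140 giga-ops per generated word
```

That puts a 70-billion-parameter model at roughly 140 giga-ops per word, consistent with Ceze's "hundreds of giga-ops" characterization.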

Also: This new technology could blow away GPT-4 and everything like it

"This is many orders of magnitude beyond any workload that has ever run," he added, and it takes a lot of work to build a platform at global scale.

"And so I think it's going to bring business to a lot of people, even if it's commoditized, because we're going to have to squeeze performance and compute out of everything we have in the cloud: GPUs, CPUs, and other chips."
