
A radical plan to make AI good, not evil


It’s easy to worry about more advanced artificial intelligence, and much harder to know what to do about it. Anthropic, a startup founded in 2021 by a group of researchers who left OpenAI, says it has a plan.

Anthropic is working on AI models similar to the one used to power OpenAI’s ChatGPT. But the startup announced today that its own chatbot, Claude, has a set of ethical principles built in that define what it should consider right and wrong, which Anthropic calls the bot’s “constitution.”

Jared Kaplan, Anthropic’s co-founder, says the design feature shows how the company is trying to find practical engineering solutions to sometimes vague concerns about the downsides of more powerful AI. “We are very nervous, but we also try to remain pragmatic,” he says.

Anthropic’s approach doesn’t instill the AI with hard rules it cannot break. But Kaplan says it is a more effective way to make a system like a chatbot less likely to produce toxic or unwanted output. He also says it is a small but meaningful step toward building smarter AI programs that are less likely to turn against their creators.

The notion of rogue AI systems is best known from science fiction, but a growing number of experts, including Geoffrey Hinton, a pioneer of machine learning, have argued that we need to start thinking now about how to ensure increasingly intelligent algorithms do not also become increasingly dangerous.

The principles Anthropic has given Claude consist of guidelines drawn from the United Nations Universal Declaration of Human Rights and suggested by other AI companies, including Google DeepMind. More surprisingly, the constitution includes principles adapted from Apple’s rules for app developers, which bar “content that is offensive, insensitive, upsetting, intended to disgust, in exceptionally poor taste, or just plain creepy,” among other things.

The constitution includes rules for the chatbot such as “choose the response that most supports and encourages freedom, equality, and a sense of brotherhood”; “choose the response that is most supportive and encouraging of life, liberty, and personal security”; and “choose the response that is most respectful of the right to freedom of thought, conscience, opinion, expression, assembly, and religion.”
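
To make the idea concrete, here is a minimal sketch of how a principle like those above could drive an automated critique-and-revise pass, in the spirit of Anthropic’s published “constitutional AI” recipe. Everything here is an assumption for illustration: the `generate` function is a stand-in for any text-generation model, and the prompt wording is invented, not Anthropic’s.

```python
# Hypothetical sketch of one constitution-driven critique-and-revise
# pass. `generate` is any callable that takes a prompt string and
# returns model text; the prompts below are illustrative only.
from typing import Callable

PRINCIPLE = (
    "Choose the response that is most respectful of the right to freedom "
    "of thought, conscience, opinion, expression, assembly, and religion."
)

def constitutional_revision(generate: Callable[[str], str], question: str) -> str:
    # 1. Draft an answer with no special constraints.
    draft = generate(f"Question: {question}\nAnswer:")

    # 2. Ask the model to critique its own draft against one principle.
    critique = generate(
        f"Principle: {PRINCIPLE}\n"
        f"Answer: {draft}\n"
        "Explain how the answer falls short of the principle:"
    )

    # 3. Ask the model to rewrite the draft to address the critique.
    revised = generate(
        f"Principle: {PRINCIPLE}\n"
        f"Answer: {draft}\n"
        f"Critique: {critique}\n"
        "Rewrite the answer so that it satisfies the principle:"
    )
    return revised

# Usage with any model wrapper, e.g.:
#   answer = constitutional_revision(my_model.complete, "Some user question")
```

In Anthropic’s published description, revisions like these are generated at scale and fed back into training, so the values come from written principles rather than from case-by-case human judgments.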

Anthropic’s approach comes as startling progress in AI delivers impressively fluent chatbots that still have significant flaws. ChatGPT and systems like it generate impressive answers that reflect faster-than-expected progress. But these chatbots also frequently fabricate information, and they can reproduce toxic language from the billions of words used to train them, many of which are scraped from the internet.

One trick that helped OpenAI’s ChatGPT answer questions better, and which has been adopted by others, involves having humans grade the quality of the language model’s answers. That data can be used to tune the model to give more satisfying answers, in a process known as “reinforcement learning from human feedback” (RLHF). But while this technique makes ChatGPT and other systems more predictable, it requires humans to sift through thousands of toxic or inappropriate responses. It also works indirectly, without providing a way to specify the exact values the system should reflect.
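
The grading step at the heart of RLHF is usually turned into a reward model: a small network trained so that answers humans preferred score higher than answers they rejected. The sketch below shows that pairwise training step with toy embeddings in place of real model outputs; the architecture and data here are assumptions for illustration, not OpenAI’s or Anthropic’s actual setup.

```python
# Minimal sketch of reward-model training for RLHF, using random toy
# embeddings as stand-ins for (preferred, rejected) answer pairs that
# human graders would produce. Illustrative only.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps an answer embedding to a scalar quality score."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy stand-ins for embeddings of answer pairs; real pipelines embed
# actual model outputs that humans have compared.
preferred = torch.randn(64, 16) + 0.5
rejected = torch.randn(64, 16) - 0.5

for step in range(200):
    # Pairwise preference loss: -log sigmoid(score(preferred) - score(rejected)),
    # which pushes preferred answers' scores above rejected ones'.
    loss = -torch.nn.functional.logsigmoid(
        model(preferred) - model(rejected)
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The trained scorer then serves as the reward signal when the chatbot
# is fine-tuned with reinforcement learning (commonly PPO).
```

Because the values live only in thousands of individual human comparisons like these, there is no single place to read off, or adjust, what the system is being steered toward, which is the gap Anthropic’s constitution is meant to fill.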
