Anthropic has a new way to protect large language models against jailbreaks

Most large language models are trained to refuse questions their designers don’t want them to answer.…