Content provided by Sigurd Schacht and Carsten Lanquillon. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Sigurd Schacht and Carsten Lanquillon or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ro.player.fm/legal.

Episode 175 - Miniseries on Interpretability - Golden Gate Claude

29:45
 

In this fascinating episode, Sigurd Schacht and Carsten Lanquillon explore how Anthropic's research on AI interpretability makes it possible to manipulate language models at the concept level. They discuss the widely noted Golden Gate Claude experiment, in which a language model was made to mention the Golden Gate Bridge in every conversation, and consider the far-reaching implications of this technology for the future of AI control and safety.


208 episodes
