AI Control: Improving Safety Despite Intentional Subversion
AI Safety Fundamentals: Alignment podcast


Tech Society Philosophy Blue Dot Impact

Content provided by BlueDot Impact. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by BlueDot Impact or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ro.player.fm/legal.


Published 8 months ago · 20:51


We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post:

We summarize the paper;
We compare our methodology to the methodology of other safety papers.

Source:
https://www.alignmentforum.org/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion
Narrated for AI Safety Fundamentals by Perrin Walker

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.


Chapters

1. AI Control: Improving Safety Despite Intentional Subversion (00:00:00)

2. Paper summary (00:02:41)

3. Setup (00:02:43)

4. Evaluation methodology (00:04:59)

5. Results (00:06:25)

6. Relationship to other work (00:10:51)

7. Future work (00:17:50)

83 episodes

