
Content provided by BlueDot Impact. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by BlueDot Impact or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ro.player.fm/legal.

Illustrating Reinforcement Learning from Human Feedback (RLHF)

22:32

Episode 429711881, series 3498845

This more technical article explains the motivations for a system like RLHF and adds concrete details about how the RLHF approach is applied to neural networks.

While reading, consider which parts of the technical implementation correspond to the 'values coach' and 'coherence coach' from the previous video.

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.

Chapters

1. Illustrating Reinforcement Learning from Human Feedback (RLHF) (00:00:00)

2. RLHF: Let’s take it step by step (00:03:16)

3. Pretraining language models (00:03:51)

4. Reward model training (00:05:46)

5. Fine-tuning with RL (00:09:26)

6. Open-source tools for RLHF (00:16:10)

7. What’s next for RLHF? (00:18:20)

8. Further reading (00:21:17)
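
The chapter list above walks through the standard RLHF pipeline: pretraining, reward model training, then fine-tuning with RL. As a minimal illustration of two of those steps, the Bradley-Terry pairwise loss commonly used to train reward models, and the KL-penalized reward typically optimized during RL fine-tuning, can be sketched in plain Python (function names and the `beta` value are illustrative, not taken from the episode):

```python
import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss for reward model training:
    -log sigmoid(r_chosen - r_rejected).
    Minimized when the reward model scores the human-preferred
    completion higher than the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def rl_reward(r_rm: float, logp_policy: float, logp_ref: float,
              beta: float = 0.1) -> float:
    """Reward used during RL fine-tuning: the reward model's score
    minus a KL-style penalty (here a per-token log-prob difference)
    that keeps the tuned policy close to the reference model."""
    return r_rm - beta * (logp_policy - logp_ref)
```

With equal scores the pairwise loss is log 2 (about 0.693), and it shrinks as the margin in favour of the preferred completion grows; the KL term in `rl_reward` reduces the reward whenever the policy drifts from the reference model.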

83 episodes
