Ep 10 - Accelerated training to become an AI safety researcher w/ Ryan Kidd (Co-Director, MATS)

1:16:58
 
We speak with Ryan Kidd, Co-Director of the ML Alignment & Theory Scholars (MATS) program, previously known as "SERI MATS".
MATS (https://www.matsprogram.org/) provides research mentorship, technical seminars, and connections to help new AI researchers get established and start producing impactful research towards AI safety & alignment.
Prior to MATS, Ryan completed a PhD in Physics at the University of Queensland (UQ) in Australia.
We talk about:
* What the MATS program is
* Who should apply to MATS (next *deadline*: Nov 17 midnight PT)
* Research directions being explored by MATS mentors, now and in the past
* Promising alignment research directions & ecosystem gaps, in Ryan's view
Hosted by Soroush Pour. Follow me for more AGI content:
* Twitter: https://twitter.com/soroushjp
* LinkedIn: https://www.linkedin.com/in/soroushjp/
== Show links ==
-- About Ryan --
* Twitter: https://twitter.com/ryan_kidd44
* LinkedIn: https://www.linkedin.com/in/ryan-kidd-1b0574a3/
* MATS: https://www.matsprogram.org/
* LISA: https://www.safeai.org.uk/
* Manifold: https://manifold.markets/
-- Further resources --
* Book: “The Precipice” - https://theprecipice.com/
* Ikigai - https://en.wikipedia.org/wiki/Ikigai
* Fermi paradox - https://en.wikipedia.org/wiki/Fermi_p...
* Ajeya Cotra - Bio Anchors - https://www.cold-takes.com/forecastin...
* Chomsky hierarchy & LLM transformers paper + external memory - https://en.wikipedia.org/wiki/Chomsky...
* AutoGPT - https://en.wikipedia.org/wiki/Auto-GPT
* BabyAGI - https://github.com/yoheinakajima/babyagi
* Unilateralist's curse - https://forum.effectivealtruism.org/t...
* Jeffrey Ladish & team - fine-tuning to remove LLM safeguards - https://www.alignmentforum.org/posts/...
* Epoch AI trends - https://epochai.org/trends
* The demon "Moloch" - https://slatestarcodex.com/2014/07/30...
* AI safety fundamentals course - https://aisafetyfundamentals.com/
* Anthropic sycophancy paper - https://www.anthropic.com/index/towar...
* Promising technical alignment research directions
  * Scalable oversight
    * Recursive reward modelling - https://deepmindsafetyresearch.medium...
    * RLHF - could work for a while, but unlikely forever as we scale
  * Interpretability
    * Mechanistic interpretability
      * Paper: GPT-4 labelling GPT-2 - https://openai.com/research/language-...
    * Concept-based interpretability
      * ROME paper - https://rome.baulab.info/
    * Developmental interpretability
      * devinterp.com - http://devinterp.com
      * Timaeus - https://timaeus.co/
  * Internal consistency
    * Colin Burns's research - https://arxiv.org/abs/2212.03827
  * Threat modelling / capabilities evaluation & demos
    * Paper: Can large language models democratize access to dual-use biotechnology? - https://arxiv.org/abs/2306.03809
    * ARC Evals - https://evals.alignment.org/
    * Palisade Research - https://palisaderesearch.org/
    * Paper: Situational awareness with Owain Evans - https://arxiv.org/abs/2309.00667
    * Gradient hacking - https://www.lesswrong.com/posts/uXH4r6MmKPedk8rMA/gradient-hacking
* Past scholars' work
  * Apollo Research - https://www.apolloresearch.ai/
  * Leap Labs - https://www.leap-labs.com/
  * Timaeus - https://timaeus.co/
* Other orgs mentioned
  * Redwood Research - https://redwoodresearch.org/
Recorded Oct 25, 2023
