AF - AXRP Episode 31 - Singular Learning Theory with Daniel Murfet by DanielFilan

1:44:56
Content provided by The Nonlinear Fund. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://ro.player.fm/legal.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AXRP Episode 31 - Singular Learning Theory with Daniel Murfet, published by DanielFilan on May 7, 2024 on The AI Alignment Forum.

What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompass deep neural networks, and it may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us.

Topics we discuss:
What is singular learning theory?
Phase transitions
Estimating the local learning coefficient
Singular learning theory and generalization
Singular learning theory vs other deep learning theory
How singular learning theory hit AI alignment
Payoffs of singular learning theory for AI alignment
Does singular learning theory advance AI capabilities?
Open problems in singular learning theory for AI alignment
What is the singular fluctuation?
How geometry relates to information
Following Daniel Murfet's work

In this transcript, to improve readability, first names are omitted from speaker tags.

Filan: Hello, everybody. In this episode, I'll be speaking with Daniel Murfet, a researcher at the University of Melbourne studying singular learning theory. For links to what we're discussing, you can check the description of this episode, and you can read the transcripts at axrp.net. All right, well, welcome to AXRP.

Murfet: Yeah, thanks a lot.

What is singular learning theory?

Filan: Cool. So I guess we're going to be talking about singular learning theory a lot during this podcast. So, what is singular learning theory?

Murfet: Singular learning theory is a subject in mathematics. You could think of it as a mathematical theory of Bayesian statistics that's sufficiently general, with sufficiently weak hypotheses, to actually say non-trivial things about neural networks, which has been a problem for some approaches that you might call classical statistical learning theory. This is a subject that's been developed by a Japanese mathematician, Sumio Watanabe, and his students and collaborators over the last 20 years. And we have been looking at it for three or four years now, trying to see what it can say about deep learning in the first instance and, more recently, alignment.

Filan: Sure. So what's the difference between singular learning theory and classical statistical learning theory that makes it more relevant to deep learning?

Murfet: The "singular" in singular learning theory refers to a certain property of the class of models. In statistical learning theory, you typically have several mathematical objects involved. One would be a space of parameters, and then for each parameter you have a probability distribution (the model) over some other space, and you have a true distribution, which you're attempting to model with that pair of parameters and models. And in regular statistical learning theory, you have some important hypotheses. Those hypotheses are, firstly, that the map from parameters to models is injective, and secondly (quite similar, but technically a little distinct), that if you vary the parameter infinitesimally, the probability distribution it parameterizes also changes. This is technically the non-degeneracy of the Fisher information metric.
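To make those two hypotheses concrete, here is one way to write them down; the notation (a parameter space W and a model p(x | w) at each parameter w) is chosen for this sketch and is not taken from the episode. Murfet's explanation continues below the block.

```latex
% Sketch of the two regularity hypotheses of classical statistical learning
% theory, in notation chosen here (requires amsmath/amssymb).
% W is the parameter space and p(x \mid w) the model at parameter w \in W.

% 1. Identifiability: the map from parameters to models is injective.
\[
  w_1 \neq w_2 \;\Longrightarrow\; p(\cdot \mid w_1) \neq p(\cdot \mid w_2).
\]

% 2. Non-degenerate Fisher information: the matrix
\[
  I(w)_{jk} \;=\; \mathbb{E}_{x \sim p(x \mid w)}\!\left[
    \frac{\partial \log p(x \mid w)}{\partial w_j}\,
    \frac{\partial \log p(x \mid w)}{\partial w_k}
  \right]
\]
% is positive definite at every w, so an infinitesimal change of the
% parameter always changes the distribution. A model class is "singular"
% when one or both of these conditions fail somewhere in W, which is the
% case for neural networks.
```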
But together these two conditions basically say that changing the parameter changes the model, that is, changes the distribution. And those two conditions together appear as hypotheses in many of the major theorems that you'll see when you learn statistics: things like the Cramér-Rao bound and, among many other things, asymptotic normality, which describes the fact that as you take more samples, your model tends to concentrate in a way that looks like a Gaussian distribution around the most likely parameter. So these are sort of basic ingredients in understandi...
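For reference, the asymptotic normality Murfet alludes to has a standard textbook statement. The sketch below reuses the notation from the block above, with w_0 the true parameter and \hat{w}_n the maximum likelihood estimate from n samples; none of it is quoted from the episode.

```latex
% Asymptotic normality in the regular case (textbook sketch, requires amsmath).
% \hat{w}_n is the maximum-likelihood estimate from n i.i.d. samples, w_0 the
% true parameter, and I(w_0) the Fisher information matrix defined above.
\[
  \sqrt{n}\,\bigl(\hat{w}_n - w_0\bigr)
    \;\xrightarrow{\;d\;}\;
  \mathcal{N}\!\bigl(0,\; I(w_0)^{-1}\bigr),
\]
% and in the Bayesian picture the posterior concentrates like a Gaussian
% around the most likely parameter, with covariance shrinking as 1/n:
\[
  p(w \mid x_1, \dots, x_n)
    \;\approx\;
  \mathcal{N}\!\bigl(\hat{w}_n,\; \tfrac{1}{n} I(w_0)^{-1}\bigr).
\]
% The Cramér-Rao bound tells the same story for estimators: any unbiased
% estimator has covariance at least \tfrac{1}{n} I(w_0)^{-1}. All of this
% requires I(w_0) to be invertible, which is exactly what fails for
% singular models such as neural networks.
```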