The AdEMAMix Optimizer: Better, Faster, Older
This paper critiques the use of a single exponential moving average (EMA) of gradients in momentum-based optimizers and proposes AdEMAMix, which combines two EMAs so that both recent and older gradients stay relevant, giving faster convergence and reduced model forgetting during training.
https://arxiv.org/abs/2409.03137
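
To make the "two EMAs" idea in the summary concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's full optimizer: the function name `mixed_ema_momentum`, the parameter names `beta_fast`, `beta_slow`, and `alpha`, and their default values are assumptions chosen for this example. The sketch leaves out anything an Adam-style optimizer would add on top, such as second-moment scaling and bias correction.

```python
def mixed_ema_momentum(grads, beta_fast=0.9, beta_slow=0.999, alpha=5.0):
    """Mix a fast and a slow EMA of a scalar gradient stream.

    All names and default values here are hypothetical, chosen for
    illustration; they are not taken from the episode description.
    """
    m_fast = 0.0  # reacts quickly to recent gradients
    m_slow = 0.0  # decays slowly, so older gradients keep some weight
    directions = []
    for g in grads:
        m_fast = beta_fast * m_fast + (1.0 - beta_fast) * g
        m_slow = beta_slow * m_slow + (1.0 - beta_slow) * g
        # The update direction is the fast EMA plus a weighted slow EMA.
        directions.append(m_fast + alpha * m_slow)
    return directions


if __name__ == "__main__":
    # One gradient spike followed by zeros: the slow EMA keeps the spike
    # in the update direction long after the fast EMA has decayed.
    stream = [1.0] + [0.0] * 99
    out = mixed_ema_momentum(stream)
    print(out[0], out[50], out[99])
```

With a single EMA, a large decay rate keeps old gradients around but makes the optimizer slow to react to new ones. Mixing a fast and a slow average is one way to get both behaviours at once.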
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1645 episodes