Artwork

Content provided by HackerNoon. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by HackerNoon or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ro.player.fm/legal.
Player FM - Aplicație Podcast
Treceți offline cu aplicația Player FM !

AI Safety and Alignment: Could LLMs Be Penalized for Deepfakes and Misinformation?

8:10
 
Distribuie
 

Manage episode 430727965 series 3474148
Content provided by HackerNoon. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by HackerNoon or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ro.player.fm/legal.

This story was originally published on HackerNoon at: https://hackernoon.com/ai-safety-and-alignment-could-llms-be-penalized-for-deepfakes-and-misinformation-ecabdwv.
Penalty-tuning for LLMs: Where they can be penalized for misuses or negative outputs, within their awareness, as another channel for AI safety and alignment.
Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai-safety, #ai-alignment, #agi, #superintelligence, #llms, #deepfakes, #misinformation, #hackernoon-top-story, and more.
This story was written by: @davidstephen. Learn more about this writer by checking @davidstephen's about page, and for more stories, please visit hackernoon.com.
A research area for AI safety and alignment could be to seek out how some memory or compute access of large language models [LLMs] might be briefly truncated, as a form of penalty for certain outputs or misuses, including biological threats. AI should not just be able to refuse an output, acting within guardrail, but slow the next response or shut down for that user, so that it is not penalized itself. LLMs have—large—language awareness and usage awareness, these could be channels to make it know, after pre-training that it could lose something, if it outputs deepfakes, misinformation, biological threats, or if it continues to allow a misuser try different prompts without shutting down or slowing against openness to a malicious intent. This could make it safer, since it would lose something and will know it has.

  continue reading

316 episoade

Artwork
iconDistribuie
 
Manage episode 430727965 series 3474148
Content provided by HackerNoon. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by HackerNoon or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ro.player.fm/legal.

This story was originally published on HackerNoon at: https://hackernoon.com/ai-safety-and-alignment-could-llms-be-penalized-for-deepfakes-and-misinformation-ecabdwv.
Penalty-tuning for LLMs: Where they can be penalized for misuses or negative outputs, within their awareness, as another channel for AI safety and alignment.
Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai-safety, #ai-alignment, #agi, #superintelligence, #llms, #deepfakes, #misinformation, #hackernoon-top-story, and more.
This story was written by: @davidstephen. Learn more about this writer by checking @davidstephen's about page, and for more stories, please visit hackernoon.com.
A research area for AI safety and alignment could be to seek out how some memory or compute access of large language models [LLMs] might be briefly truncated, as a form of penalty for certain outputs or misuses, including biological threats. AI should not just be able to refuse an output, acting within guardrail, but slow the next response or shut down for that user, so that it is not penalized itself. LLMs have—large—language awareness and usage awareness, these could be channels to make it know, after pre-training that it could lose something, if it outputs deepfakes, misinformation, biological threats, or if it continues to allow a misuser try different prompts without shutting down or slowing against openness to a malicious intent. This could make it safer, since it would lose something and will know it has.

  continue reading

316 episoade

Toate episoadele

×
 
Loading …

Bun venit la Player FM!

Player FM scanează web-ul pentru podcast-uri de înaltă calitate pentru a vă putea bucura acum. Este cea mai bună aplicație pentru podcast și funcționează pe Android, iPhone și pe web. Înscrieți-vă pentru a sincroniza abonamentele pe toate dispozitivele.

 

Ghid rapid de referință