AI Explained Official Podcast (Philip - Host of AI Explained YT)

Explore every episode of AI Explained Official Podcast

Dive into the complete episode list for AI Explained Official Podcast. Each episode is cataloged with detailed descriptions, making it easy to find and explore specific topics. Keep track of all episodes from your favorite podcast and never miss a moment of insightful content.


Pub. Date | Title | Duration
28 Oct 2024 | The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think | 00:22:34

A new state-of-the-art LLM (at least for creative writing and basic reasoning), but what lies behind the numbers that were put out? Is it for real, and are AI agents about to grab your mouse and shake your cursor?

Plus, results on my own Simple Bench, and new tools from Runway (Act-One), HeyGen (Zoom Calls) and an updated NotebookLM. AI, without the hype.

Weights and Biases' Weave: https://wandb.me/ai_explained

01 Nov 2024 | ChatGPT with Search, Altman Answers Anything and Simple Bench Out | 00:15:20

The Google destroyer, the Perplexity crusher? Or just hype? ChatGPT with Search is here, and simultaneously Altman and co did an AMA on Reddit, covering GPT-5, Sora, SearchGPT and a lot more. Plus, the biggest news of them all: Simple Bench is out.

ChatGPT with Search: https://openai.com/index/introducing-chatgpt-search/

Altman AMA (ask me anything): https://www.reddit.com/r/ChatGPT/comments/1ggixzy/ama_with_openais_sam_altman_kevin_weil_srinivas/

https://x.com/sama/status/1852041075793522911

Perplexity Ads: https://www.cnbc.com/2024/08/22/perplexity-ai-plans-to-start-running-search-ads-in-fourth-quarter.html

Perplexity: https://www.perplexity.ai/

https://simple-bench.com/

10 Nov 2024 | Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’ | 00:15:44

The last few days have seen two narratives emerge. One, derived from yesterday’s OpenAI leak in TheInformation, that GPT-5/Orion is a disappointment, and less of a leap than GPT-3 to GPT-4. The second comes from a series of 4 clips (shown in this video) from Sam Altman, regarding the ‘clear path’ to AGI. Let’s go beyond the headlines (and through papers like Frontier Math) to get closer to the ground truth…
 
 Plus Universal-2, Sora comments, Claude 3.5 Haiku SimpleBench update, and a great new AI video.


Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained 

 

00:39 – Bear Case, TheInformation Leak

04:01 – Bull Case, Sam Altman

06:20 – FrontierMath

11:29 – o1 Paradigm

13:11 – Text to Video Greatness and Universal-2 

 

TheInformation Leak: https://www.theinformation.com/articles/openai-shifts-strategy-as-rate-of-gpt-ai-improvements-slows?rc=sy0ihq

Noam Brown Replies: https://x.com/polynoamial/status/1855453104394637444

Sam Altman Y-Combinator Interview: https://www.youtube.com/watch?v=xXCBz_8hM9w&t=1556s

Altman Reply: https://x.com/sama/status/1855100359511097828

https://simple-bench.com/

FrontierMath Paper: https://arxiv.org/pdf/2411.04872

Frontier Math Blog Post: https://epochai.org/frontiermath

Tao: https://x.com/EpochAIResearch/status/1854996368814936250

MMLU Are We Done (cites me!): https://arxiv.org/pdf/2406.04127

Universal-2: https://www.assemblyai.com/research/universal-2

Noam Brown ‘We don’t know’: https://www.youtube.com/watch?v=Gr_eYXdHFis

Anthropic Founder Response: https://x.com/jackclarkSF/status/1855485569998217231

Sora (Runway Comment): https://x.com/c_valenzuelab/status/1855026417354129455

Sora New Vid: https://www.youtube.com/watch?v=_iETa2KDRuw

Darri3D Video: https://www.reddit.com/r/ChatGPT/comments/1gn0n3z/can_you/

15 Nov 2024 | New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem | 00:15:19

A new and mysterious Gemini model appears at the top of the leaderboard, but is that the full story? I dig behind the headline to show you some anti-climactic results, give some context with leaks in the last 48 hours of diminishing returns to scaling, and add the response of Altman, OpenAI and co. The future is about to look a lot stranger...


80,000 hours Podcast and Channel: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib
https://www.youtube.com/@eightythousandhours/videos          

 

You can now gift memberships to AI Insiders (my Patreon w/ exclusive vids, network): https://www.patreon.com/AIExplained/gift


‘There is no wall’: https://x.com/sama/status/1856941766915641580

https://x.com/vedantmisra/status/1857148554105544708

Gemini Ranking: https://lmarena.ai/?leaderboard

API not yet up: https://x.com/OfficialLoganK/status/1857106844805681153

‘Just Die Chat’: https://x.com/koltregaskes/status/1856754648146653428

Google CEO tweet: https://x.com/sundarpichai/status/1857114106928718329

Sutskever Quote: https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/

Another OpenAI Staffer Leaves: https://x.com/RichardMCNgo/status/1856843040427839804

Bloomberg Report: https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai?s=09

Noam Brown on what OpenAI Researchers Believe: https://x.com/polynoamial/status/1855037689533178289

Clive Chan: https://x.com/itsclivetime/status/1855704120495329667

Chollet Responds to Altman: https://x.com/fchollet/status/1857060079586975852

https://x.com/sama/status/1856940152460869718

Altman Emails: https://x.com/TechEmails/status/1857285960997712356

Change of Heart: https://sd11.senate.ca.gov/news/senator-wiener-responds-openai-opposition-sb-1047

Amodei on ‘Empirical Regularities’: https://lexfridman.com/dario-amodei-transcript/

Verge Report: https://www.theverge.com/2024/10/25/24279600/google-next-gemini-ai-model-openai-december

OpenAI Agents in January: https://www.bloomberg.com/news/articles/2024-11-13/openai-nears-launch-of-ai-agents-to-automate-tasks-for-users?srnd=phx-ai

05 Dec 2024 | AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution | 00:15:29

Calm before the storm? Whatever analogy you want to use, things had gotten quiet toward the end of 2024. But then tonight we got Genie 2, and a series of scheduled announcements from OpenAI. Sora is soon here, and o1, but I dive deeper into what it all means and whether reliability is on a path to being solved, ft. two recent papers.

Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained 

Plus Kling Motion Brush, Simple Bench QwQ update and much more.


Genie 2: https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/

Jim Cramer: https://x.com/jimcramer/status/1864068878692675625

Give Us Full o1: https://x.com/tszzl/status/1863882905422106851

Verge Scoop: https://x.com/tomwarren/status/1864326361415925861

O1 Learning to Reason Benchmarks: https://openai.com/index/learning-to-reason-with-llms/

SIMA AI: https://arxiv.org/pdf/2404.10179

Genie Paper: https://arxiv.org/pdf/2402.15391

My Video on Genie: https://www.youtube.com/watch?v=gGKsfXkSXv8

Oasis Minecraft: https://x.com/risphereeditor/status/1852619965511204974

LLMs Procedural Knowledge Paper: https://arxiv.org/pdf/2411.12580

Bag of Heuristics Paper: https://arxiv.org/pdf/2410.21272

Jensen Huang Hallucinations: https://www.tomshardware.com/tech-industry/artificial-intelligence/jensen-says-we-are-several-years-away-from-solving-the-ai-hallucination-problem-in-the-meantime-we-have-to-keep-increasing-our-computation

DeepSeek Interview: https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas

Kling Motion Brush: https://klingai.com/image-to-video


Tim Rocktaschel Book: https://geni.us/ArtificialIntelligence


00:43 - OpenAI 12 Days, Sora Turbo, o1

03:06 - Genie 2

08:26 - Jensen Huang and Altman Hallucination Predictions

09:45 - Bag of Heuristics Paper

11:40 - Procedural Knowledge Paper
13:02 - AssemblyAI Universal 2

13:45 - SimpleBench QwQ and Chinese Models

14:42 - Kling Motion Brush



05 Dec 2024 | o1 Pro Mode – Full Analysis (plus o1 paper highlights) | 00:16:43

Oh boy. o1 pro mode out on the same night as o1 full. I read the 49 page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice to say the story is not as simple as it first appears. 

Weights and Biases’ Weave: wandb.me/ai_explained

Plus, GPT-4.5? MLE Bench, Simple Update, Image Analysis and much more 

 

o1 System Card: https://cdn.openai.com/o1-system-card-20241205.pdf

Apollo Research: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations

Altman Tweet: https://x.com/AnonCEOMakeItAi/status/1864763052622504344

ChatGPT Pro: https://openai.com/index/introducing-chatgpt-pro/

Tibor Blaho: https://x.com/btibor91/status/1864709670470066605

Simple-bench.com 

 

00:00 - Introduction

00:27 - ChatGPT Pro is $200

01:25 - OpenAI Benchmarks

03:20 - o1 System Card, o1 and o1 Pro Mode vs o1-preview

06:18 - Simple Bench surprising results on sample

08:31 - Weights & Biases

09:05 - Image Analysis Compared

12:51 - More Benchmarks and Safety

10 Dec 2024 | Sora is Out, But is it a Distraction? | 00:15:34

After a 10-month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system card. The user interface is quite beautiful, even if the videos themselves operate under entirely new rules of physics. But I can’t help wondering if OpenAI want us to focus on releases like this, rather than some quietly broken promises.



80,000 hours Website, Podcast + Channel: 

https://80000hours.org/

https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib https://www.youtube.com/@eightythousandhours/videos


https://openai.com/sora/


Sora Countries: https://help.openai.com/en/articles/10250692-sora-supported-countries

Sora Credits: https://help.openai.com/en/articles/10245774-sora-billing-credits-faq

https://runwayml.com/ and https://pika.art/home 


DeepMind Veo: https://deepmind.google/technologies/veo/


Sam Altman Ads as Last Resort: https://www.windowscentral.com/software-apps/openai-could-chase-intrusive-ads-as-last-resort


But OpenAI Considering Ads: https://www.inc.com/ben-sherry/is-openai-getting-into-the-advertising-business-the-company-is-sending-mixed-messages/91033533


OpenAI Backtracks on Microsoft AGI Clause: https://www.ft.com/content/2c14b89c-f363-4c2a-9dfc-13023b6bce65


As Microsoft Boast of Labor Savings: https://www.theinformation.com/articles/microsofts-new-sales-pitch-for-ai-spend-less-money-on-humans?rc=sy0ihq


OpenAI Military Pivot: https://www.technologyreview.com/2024/12/04/1107897/openais-new-defense-contract-completes-its-military-pivot/


Employees Have Doubts: https://www.washingtonpost.com/technology/2024/12/06/openai-anduril-employee-military-ai/?nid=top_pb_signin&arcId=KZIV7PLRHBCVNPAIAAAVUNRHIM&account_location=ONSITE_HEADER_ARTICLE



12 Dec 2024 | Never Browse Alone? - Gemini 2 Live and ChatGPT Vision | 00:13:40

The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for curiosity satisfying rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision. 

Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what might be, for some of you, Google’s deflating admission.


00:00 - Introduction

00:38 - Live Interaction 

03:43 - Gemini 2.0 Flash Benchmarks 

05:10 - Audio and Image Output

06:38 - Project Mariner (+ WebVoyager Bench)

08:49 - But Progress Slowing Down?

10:43 - OpenAI Announcements + Games



https://aistudio.google.com/live

Gemini 2.0 Flash Benchmarks: https://deepmind.google/technologies/gemini/

Project mariner: https://deepmind.google/technologies/project-mariner/

WebVoyager: https://x.com/laurentsifre/status/1858918588683296875/photo/1

Gemini Game play: https://www.youtube.com/watch?v=IKuGNHJBGsc

Advanced Voice Mode OpenAI: https://www.youtube.com/watch?v=NIQDnWlwYyQ

https://simple-bench.com/

Claude Computer Use: https://docs.anthropic.com/en/docs/build-with-claude/computer-use

Oriol Vinyals Interview: https://www.youtube.com/watch?v=78mEYaztGaw&t=687s



21 Dec 2024 | o3 - wow | 00:22:20

o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more. 


FrontierMath: https://epoch.ai/frontiermath

https://arxiv.org/pdf/2411.04872

Chollet Statement: https://arcprize.org/blog/oai-o3-pub-breakthrough

MLC Paper: 

https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social

AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf

Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1

Wei Tweet ‘3 months’: https://x.com/_jasonwei/status/1870184982007644614

Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/

Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893

Swe-Bench Verified: https://openai.com/index/introducing-swe-bench-verified/

Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518

David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638

OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/

https://simple-bench.com/

John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725


00:00 - Introduction

01:19 - What is o3?

03:18 - FrontierMath

05:15 - o4, o5

06:03 - GPQA

06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2

08:13 - 1st Caveat

09:03 - Compositionality?

10:16 - SimpleBench?

13:11 - ARC-AGI, Chollet



08 Jan 2025 | OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward | 00:23:41

Sam Altman unexpectedly brings his timelines to AGI forward, while OpenAI backtrack on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announce a new Simple Bench competition, and showcase Kling 1.6 vs Veo 2 vs Sora, and much more.

wandb.me/simple-bench

(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing


TheAgentCompany Paper: https://arxiv.org/pdf/2412.14161v1

Sam Altman Major Interview: https://www.bloomberg.com/features/2025-sam-altman-interview/?srnd=phx-ai

OpenAI Agent Coming Jan 2025: https://www.theinformation.com/articles/why-openai-is-taking-so-long-to-launch-agents?rc=sy0ihq

Altman Singularity: https://x.com/sama/status/1875603249472139576

Altman Original Timeline: https://www.youtube.com/watch?v=7dCPytNTnjk&t=621s

https://www.ft.com/content/34a7a082-e685-4e02-bca7-61ff89d99ed2

OpenAI Original Emails: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman-and-openai-blog

DeepMind Sky News 2014 Article: https://news.sky.com/story/google-buys-uk-intelligence-firm-deepmind-10419783

Altman Blog Reflections: https://blog.samaltman.com/reflections

OpenAI Changes Who Gets AGI: https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission/?s=09

OpenAI 5 Levels: https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai

Altman 2015: https://blog.samaltman.com/machine-intelligence-part-1

OpenAI React to Anthropic: https://www.theinformation.com/articles/how-anthropic-got-inside-openais-head?rc=sy0ihq

Microsoft $100B Definition: https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership?rc=sy0ihq
Epoch Scramble for Task Benchmark: https://x.com/tamaybes/status/1876692639363612919

GPQA Progress: https://epoch.ai/data/ai-benchmarking-dashboard

Task Length Crucial for ARC-AGI: https://anokas.substack.com/p/llms-struggle-with-perception-not-reasoning-arcagi

RL Environment Tweet: https://x.com/vedantmisra/status/1876327518157807990

Jason Wei Talk: https://www.youtube.com/watch?v=yhpjpNXJDco

Miles Brunda

20 Jan 2025 | Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out | 00:13:11

OpenAI looks set to debut their Operator system, and some leaks are out. At the same time DeepSeek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much more. Who said anything else is happening today...

80,000 Hours Channel: https://www.youtube.com/channel/UCafjal1QYJ3rb0Y9xZk1Ezg
Spotify: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:13 - Pro Cost and OpenAI Operator
04:00 - Agent Benchmarks Being Targeted
07:48 - Fast Take-off, Altman
08:48 - Altman flip-flops
10:02 - Deepseek R1 First Reaction

Altman ‘100x expectations out of control’: https://x.com/sama/status/1881258443669172470
OpenAI Operator Table: https://x.com/btibor91/status/1881285255266750564
WebVoyager: https://arxiv.org/pdf/2401.13919
OSWorld: https://arxiv.org/pdf/2404.07972
Axios Exclusive 1 (SuperAgent): https://www.axios.com/2025/01/19/ai-superagent-openai-meta?s=09
Axios Exclusive 2: https://www.axios.com/2025/01/18/biden-sullivan-ai-race-trump-china
Deepseek R1 Numbers: https://x.com/deepseek_ai/status/1881318130334814301
Does 1.5B outperform 3.5 Sonnet on Math?: https://x.com/reach_vb/status/1881319500089634954
Deepseek R1 (deepseek-reasoner) Pricing: https://api-docs.deepseek.com/quick_start/pricing/
Altman Fast Takeoff: https://x.com/tsarnick/status/1879100390840697191
OpenAI Economic Blueprint: https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf
Target is Long-horizon Tasks: https://x.com/karinanguyen_/status/1879576037249667520
Support Regulations: https://www.techemails.com/p/elon-musk-and-openai
https://www.nytimes.com/2023/05/16/technology/openai-altman-artificial-intelligence-regulation.html
Donation: https://qz.com/sam-altman-donate-million-zuckerberg-bezos-donald-trump-1851721035
Amodei on Regulations by 2025: https://www.youtube.com/watch?v=ugvHCXCOmm4
‘Feel the AGI’: https://x.com/polynoamial?lang=en
GPT-5 and o-series merger: https://x.com/sama/status/1880358749187240274
o1 Thinks in Chinese: https://techcrunch.com/2025/01/14/openais-ai-reasoning-model-thinks-in-chinese-sometimes-and-no-one-really-knows-why/



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

24 Jan 2025 | Nothing Much Happens in AI, Then Everything Does All At Once | 00:23:09

When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate, is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of DeepSeek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s Last Exam, and Hassabis accelerates his AGI timeline.

00:00 - Introduction

00:54 - OpenAI Operator

04:53 - Perplexity Assistant 

05:15 - StarGate

07:51 - Better than o3?

08:25 - DeepSeek R1 Analysis

12:12 - Training Secrets

15:19 - No More Process Rewarding?

19:01 - Hassabis Timeline Accelerates

21:22 - Humanity’s Last Exam


https://app.grayswan.ai/arena/chat/harmful-ai-assistant

https://app.grayswan.ai/arena

https://openai.com/index/computer-using-agent/

System Prompt: https://github.com/wunderwuzzi23/scratch/blob/master/system_prompts/operator_system_prompt-2025-01-23.txt


OpenAI Operator: https://operator.chatgpt.com/

System Card: https://cdn.openai.com/operator_system_card.pdf


There is No Plan: https://x.com/jeffclune/status/1882120726339318007


Perplexity Assistant: https://x.com/perplexity_ai/status/1882466239123255686


Stargate: https://openai.com/index/announcing-the-stargate-project/

Labour goes to 0: https://moores.samaltman.com/

Larry Ellison AI Surveillance: https://x.com/TheChiefNerd/status/1882042989184430332

Amodei 1984: https://www.bloomberg.com/news/articles/2025-01-22/anthropic-ceo-says-openai-s-stargate-venture-seems-chaotic

Microsoft Hesitate: https://www.theinformation.com/articles/why-sam-altman-joined-forces-with-larry-ellison-and-took-a-step-back-from-microsoft?rc=sy0ihq


Dylan Patel o3+ for Anthropic: https://www.youtube.com/watch?v=7EH0VjM3dTk


Deepseek R1: https://arxiv.org/pdf/2501.12948

https://arxiv.org/pdf/2412.19437

Diagram: https://pbs.twimg.com/media/GhyQsM6WQAE7W52?format=jpg&name=large

https://simple-bench.com/

Process: https://x.com/sama/status/1664018190840614912

https://x.com/karpathy/status/1835561952258723930

https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/?s=09

Demis Interview: https://www.youtube.com/watch?v=yr0GiSgUvPU

Humanity’s Last Exam: 

https://agi.safe.ai/

https://x.com/DanHendrycks/status/1882481730671857815

https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?s=09



31 Jan 2025 | o3-mini and the “AI War” | 00:15:21

o3-mini is here, and yes, I’ve read the paper in full, 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost-comparison with DeepSeek R1. But does it perform on basic reasoning? Let’s find out. Plus, arguably the bigger story: the increasingly frenetic rhetoric coming out of the West, and from Dario Amodei and Alexandr Wang (CEOs of Anthropic and Scale AI respectively) in particular. The last thing we need is an “AI War”.


https://wandb.me/simple-bench


(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing


Chapters: 

00:00 - Introduction

00:45 - o3 mini

05:11 - First impressions vs Deepseek R1

07:21 - 10x Scale, o3-mini System Card, Amodei Essay, bitcoin wallets…

12:40 - Simple Competition Finale

13:03 - Clips and Final Thoughts on the “AI War”



O3-mini: https://openai.com/index/openai-o3-mini/

Paper: https://cdn.openai.com/o3-mini-system-card.pdf

Amodei Essay: https://darioamodei.com/on-deepseek-and-export-controls?s=09

FrontierMath wild stat: https://arxiv.org/pdf/2411.04872

Sam Altman Channels Napoleon: https://x.com/sama/status/1883185690508488934

Altman ‘pulls up releases’: https://x.com/sama/status/1884066337103962416

“AI War” by Wang: https://scale.com/blog/win-the-ai-war

Anthropic Original Views on Capabilities: https://www.anthropic.com/news/core-views-on-ai-safety

AI Insider Cost Comparison: https://x.com/arankomatsuzaki/status/1884676245922934788

Deepseek R1 Paper: https://arxiv.org/pdf/2501.12948

R1, o3-mini Price Comparison: https://techcrunch.com/2025/01/31/openai-launches-o3-mini-its-latest-reasoning-model/

SemiAnalysis on $1.3M DeepSeek salaries, and them falling behind as ‘the time gap to match US capabilities increases’: https://semianalysis.com/2025/01/31/deepseek-debates/

OpenAI Valuation: https://www.bloomberg.com/news/articles/2025-01-30/openai-in-talks-to-raise-funding-at-340-billion-value-wsj-says?srnd=phx-ai

Wang Clip: https://x.com/tsarnick/status/1867700453494206883

Amodei Clip: https://x.com/ai_ctrl/status/1884951111771001188

https://simple-bench.com/



03 Feb 2025 | Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research | 00:18:32

12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs Deepseek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more.

Deep Research:
https://openai.com/index/introducing-deep-research/

https://www.youtube.com/watch?v=YkCDVn3_wiw


GAIA Bench: https://openreview.net/forum?id=fibxvahvs3

https://openreview.net/pdf?id=fibxvahvs3

CodeELO: https://arxiv.org/pdf/2501.01257

CamelCamel: https://uk.camelcamelcamel.com/

Deepseek R1 with search: https://chat.deepseek.com/

https://arxiv.org/pdf/2501.12948

HaluBench: https://arxiv.org/pdf/2407.08488


Chapters:

00:00 - Introduction

01:06 - Powered by o3, Humanity’s Last Exam, GAIA

03:55 - Simple Tests 

06:00 - Good News vs Deepseek R1 and Gemini Deep Research

09:32 - Bad News on Hallucinations 

14:14 - What Can’t it Browse?

14:42 - For Shopping?

16:40 - Final thoughts



11 Feb 2025 | AGI: (gets close), Humans: ‘Who Gets the Money?’ | 00:22:17

A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, RAND and a warning from Amodei. Here are 7-8 developments you may have missed but which I would argue help us understand how the next few years will play out. From labour vs capital to automating rival companies and countries, and from non-profit shenanigans to new mini-docs, there was just too much for me not to make a vid.

GiveWell: https://www.givewell.org/charities/top-charities

AI Insiders ($9!): https://www.patreon.com/AIExplained

s1 Paper: https://arxiv.org/pdf/2501.19393
Musk Bid: https://www.wsj.com/tech/ai/musks-97-4-billion-openai-bid-piles-pressure-on-altman-f6749e6c?mod=hp_lead_pos1
Altman Reply: https://x.com/sama/status/1889059531625464090?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
Google vs OpenAI: https://x.com/sama/status/1888703820596977684
RAND Study: https://www.rand.org/pubs/perspectives/PEA3691-4.html
Dev Meetup: https://x.com/btibor91/status/1888976302621040852
Altman $100 Trillion: https://www.nytimes.com/2023/03/31/technology/sam-altman-open-ai-chatgpt.html
Karpathy Vid: https://www.youtube.com/watch?v=7xTGNNLPyMI
Amodei Warning: https://www.anthropic.com/news/paris-ai-summit
Bengio Source: https://www.youtube.com/watch?v=6HDjVncL5Go

Chapters:
00:00 - Intro
01:37 - AGI Inches Closer
04:26 - ‘Super-Exponential’
05:58 - Musk Bid
07:34 - Luxury Goods and Land
09:05 - ‘Benefits All Humanity’
12:52 - ‘National Security’
14:21 - s1
20:33 - Final thoughts


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

25 Feb 2025 | Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon) | 00:27:39

Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple. Things aren’t slowing down. Plus the latest in humanoid robots, led by Helix and freaked out by Protoclone. And reports of GPT 4.5 and DeepSeek R2.


GraySwan Competition! https://app.grayswan.ai/arena/challenge/agent-red-teaming

https://x.com/GraySwanAI/status/1894084923260043282


Chapters:

00:00 - Introduction

01:25 - Claude 3.7 New Stats/Demos 

05:22 - 128k Output

06:13 - Pokemon

06:58 - Just a tool? 

09:54 - DeepSeek R2

10:20 - Claude 3.7 System Card/Paper Highlights 

17:18 - Simple Record Score/Competition

20:37 - Grok 3 + Redteaming prizes

22:26 - Google Co-scientist

24:02 - Humanoid Robot Developments


3.7 Release Notes: https://www.anthropic.com/news/claude-3-7-sonnet

vs o3 and Grok 3: https://x.com/12exyz/status/1891723056931827959

Extended Thinking: https://www.anthropic.com/research/visible-extended-thinking?s=09

System Prompt: https://docs.anthropic.com/en/release-notes/system-prompts#feb-24th-2025

System Card: https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf

Unfaithful CoT: https://arxiv.org/pdf/2305.04388

Original Constitution: https://www.anthropic.com/news/claudes-constitution

Responsible Scaling Policy: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf

Amodei and Hassabis: https://www.youtube.com/watch?v=4poqjZlM8Lo

https://simple-bench.com/

400M Weekly Users: https://x.com/bradlightcap/status/1892579908179882057

Grok 3 Jailbroken: https://x.com/LinusEkenstam/status/1893832876581380280

Google Co-Scientist: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

But Hassabis Says Years Away: https://www.youtube.com/watch?v=yr0GiSgUvPU&t=156s

DeepSeek R2 Reuters: https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/

Protoclone: https://www.reddit.com/r/interestingasfuck/comments/1it9rpp/protoclone_the_worlds_first_bipedal/

Helix: https://www.figure.ai/news/helix

TechTrance: https://www.youtube.com/@TheTechTrance/videos

GPT 4.5 Soon:

28 Feb 2025 | GPT 4.5 - not so much wow | 00:25:05

GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well, let’s find out what would have happened if the future of AI rested on models like GPT 4.5. You’ll see all the benchmarks, highlights of the paper, emotional intelligence and humor tests, Simple Bench results (Reddit was an unreliable source), and why it’s not all bad news for OpenAI.

https://www.emergentmind.com/

AI Insiders (now $9!): https://www.patreon.com/AIExplained

Chapters
00:00 - Introduction
01:04 - Details and Benchmarks
03:04 - Emotional intelligence? 
08:37 - Creative writing?
11:40 - Visual reasoning and Pricing
12:41 - Simple Performance
16:01 - End of Pretraining Scaling?
17:03 - CEO Hype
18:11 - System Card Highlights
23:32 - Karpathy Reaction

GPT 4.5 System card: https://cdn.openai.com/gpt-4-5-system-card-2272025.pdf
Release Notes: https://openai.com/index/gpt-4-5-system-card/
Altman Hype: https://x.com/sama/status/1891533802779910471
Details: https://openai.com/index/introducing-gpt-4-5/ https://x.com/OpenAI/status/1895219596317335792
End of an Era: https://x.com/wgussml/status/1895187231666774377
Anthropic Original Claim: https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-to-take-on-openai/
Smell: https://x.com/rapha_gl/status/1895213014699385082
Bob McGrew: https://x.com/bobmcgrewai/status/1895228291981943265
Deep Research System Card: https://cdn.openai.com/deep-research-system-card.pdf
Reddit: https://www.reddit.com/r/singularity/comments/1izu1t7/gpt45_crushes_simple_bench/
API Pricing: https://openai.com/api/pricing/
LiveStream: https://www.youtube.com/watch?v=cfRYp0nItZ8&t=1s
https://simple-bench.com/


Karpathy Comparison: https://x.com/karpathy/status/1895213020982472863
https://x.com/karpathy/status/1895337579589079434


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

13 Mar 2025 | Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3) | 00:12:58

Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs of Manus AI, and where it is strong. Other news (yes, Gemini image editing and research hacking, I mean you), will have to wait for a few more hours, as millions enquire about Manus AI.

https://app.grayswan.ai/arena

AI Insiders ($9!): https://www.patreon.com/AIExplained
Patreon Vid: https://www.patreon.com/posts/4-ai-trends-in-123857767

Chapters:
00:00 - Introduction
00:46 - Hype Campaign
02:40 - Single, Public Benchmark 
03:12 - What is Manus AI?
04:22 - Test 1
05:12 - Cost and Rate Limits
06:15 - Test 2 vs Deep Research + Grok 3 DeepSearch
08:24 - Test 3 (not AGI)
11:10 - 4 Trends in AI in 2025
11:37 - Hype Works

Manus AI: https://manus.im/app

Xiao Hong Interview: https://www.chinatalk.media/p/manus-chinas-latest-ai-sensation

Gaia Benchmark: https://openreview.net/pdf?id=fibxvahvs3
MIT Report: https://www.technologyreview.com/2025/03/11/1113133/manus-ai-review/

Information Report: https://www.theinformation.com/articles/anthropics-claude-drives-strong-revenue-growth-while-powering-manus-sensation?rc=sy0ihq

Hype Examples: https://x.com/Saboo_Shubham_/status/1898425707401031940
https://x.com/EHuanglu/status/1899110687902978373
https://x.com/AJs_AI/status/1898756132384178291

Mistakes: https://x.com/TheXeophon/status/1898737178273829220

Tools and Code: https://x.com/peakji/status/1898994802194346408

https://operator.chatgpt.com/




Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

25 Mar 2025 | Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI | 00:13:47

Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power DeepSeek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from The Information, Simple indications, Vista Bench, LM Arena and more…

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters: 
00:00 - Introduction
01:15 - Gemini 2.5 Benchmarks
05:46 - Long Context, Simple indication
07:08 - New DeepSeek V3-0324
09:11 - Microsoft MAI
11:48 - 90% of code but new Claude jobs

‘World’s most powerful model’: https://x.com/OfficialLoganK/status/1904580368432586975

Gemini 2.5 Release Notes: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking

‘Commoditized’: https://the-decoder.com/microsoft-ceo-satya-nadella-says-ai-models-are-getting-commoditized/

Microsoft Information report: https://www.theinformation.com/articles/microsofts-ai-guru-wants-independence-from-openai-thats-easier-said-than-done?rc=sy0ihq

LMarena: https://x.com/lmarena_ai/status/1904581128746656099/photo/1

Free for now: https://x.com/btibor91/status/1904578053537476628

Vista Bench: https://scale.com/leaderboard/visual_language_understanding

DeepSeek V3: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324

Claude Plays Pokemon: https://www.twitch.tv/claudeplayspokemon
Amodei: 100% Coding: https://www.youtube.com/watch?v=esCSpbDPJik&t=3017s

Anthropic Jobs: https://job-boards.greenhouse.io/anthropic/jobs/4020717008

Microsoft Money from Onslaught: https://www.972mag.com/microsoft-azure-openai-israeli-army-cloud/

https://simple-bench.com/

Release Date Comments: https://x.com/zacharynado/status/1904647277861318979


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

28 Mar 2025 | Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score) | 00:21:21

Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ …

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

… and more. Plus practical tips, a note on security, and a Kling vs Veo 2 guest appearance.


AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube urls + Security - cut-off date
03:42 - Coding 
06:22 - WeirdML Bench
07:01 - Simple Bench Record High 
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats

Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/

Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87

https://simple-bench.com/

WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542

Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot

https://aistudio.google.com/prompts/new_chat

Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php

Live bench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314

LiveCode Bench: https://livecodebench.github.io/

SWE-Verified: https://arxiv.org/pdf/2310.06770


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

07 Apr 2025 | AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax + ‘Superintelligence in 2027’... | 00:23:51

The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and more, as well.

Weights & Biases: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained


DeepSeek Doc: https://www.patreon.com/posts/openai-is-not-r1-125869969

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:47 - Stock Crash 
02:28 - Llama 4
10:55 - o3 News
11:59 - OpenAI non-profit?
13:13 - AI 2027

Llama 4 Release: https://ai.meta.com/blog/llama-4-multimodal-intelligence/

Dario Amodei Comments: https://www.youtube.com/watch?v=esCSpbDPJik

Knowledge Cut-off: https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/

Aider Polyglot: https://aider.chat/docs/leaderboards/

Gemini 1.5: https://arxiv.org/pdf/2403.05530

Fiction-LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87

OpenAI Valuation: https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html?login=smartlock&auth=login-smartlock

OpenAI Cybersecurity: https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans

Deep research System Card: https://cdn.openai.com/deep-research-system-card.pdf

https://openai.com/index/paperbench/

AI 2027: https://ai-2027.com/

METR Paper: https://arxiv.org/pdf/2503.14499

OpenAI non-profit: https://openai.com/index/nonprofit-commission-guidance/

NYT Piece: https://www.nytimes.com/2025/04/03/technology/ai-futures-project-ai-2027.html?unlocked_article_code=1.804._yKi.QhwOp15Q3tcU&smid=url-share&s=09

Kokotajlo predictions 2021: https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like

https://simple-bench.com/


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

16 Apr 2025 | ‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed | 00:20:09

This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening.

https://www.emergentmind.com/


Chapters:
00:00 - Introduction
00:30 - Kling 2.0
01:35 - GPT 4.1
05:25 - o3 Build-up
07:37 - ‘Product Company’
09:31 - Safe Superintelligence
10:54 - DolphinGemma
13:16 - Data Dominance?


Kling 2.0: https://app.klingai.com/global/release-notes


Dolphin Gemma: https://blog.google/technology/ai/dolphingemma/?s=09


https://openai.com/index/gpt-4-1/


OpenAI o3 Build-up The Information: https://www.theinformation.com/articles/openais-latest-breakthrough-ai-comes-new-ideas?rc=sy0ihq


Physical reasoning: https://x.com/a_karvonen/status/1911839968990814503


Fiction Live.bench: https://x.com/ficlive/status/1911853409847906626


Altman Ted: https://www.youtube.com/watch?v=5MWT_doo68k


https://simple-bench.com/try-yourself


https://aider.chat/docs/leaderboards/


4.5: https://www.youtube.com/watch?v=6nJZopACRuQ


Geospatial reasoning: https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models/


Pioneers: https://x.com/OpenAIDevs/status/1910017976256119151

Evals: https://www.youtube.com/watch?v=scsW6_2SPC4

Anthropic Updates: https://www.bloomberg.com/news/articles/2025-04-15/anthropic-is-readying-a-voice-assistant-feature-to-rival-openai?srnd=phx-ai

https://x.com/sethsaler/status/1912188383457059301


https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/

https://ai.meta.com/blog/llama-4-multimodal-intelligence/

https://deepmind.google/technologies/gemini/pro/

https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/

OpenAI Documentary: https://www.patreon.com/posts/one-machine-to-121940490

16 Apr 2025 | o3 and o4-mini - they’re great, but easy to over-hype | 00:14:24

Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning…

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

AI Insiders ($9!): https://www.patreon.com/AIExplained


Chapters:
00:00 - o3 and o4-mini


https://simple-bench.com/

Plus, Teams and Pro, plus token count: https://x.com/btibor91/status/1912568994512662679

System Card: https://openai.com/index/o3-o4-mini-system-card/

Release Notes: https://openai.com/index/introducing-o3-and-o4-mini/

https://deepmind.google/technologies/gemini/pro/

https://x.com/DeryaTR_/status/1912558350794961168

https://x.com/polynoamial/status/1912564068168450396

API Pricing: https://openai.com/api/pricing/

https://aider.chat/docs/leaderboards/


Non-hype Newsletter: https://signaltonoise.beehiiv.com/
