
Machine Learning Street Talk (MLST)
Explore all episodes of Machine Learning Street Talk (MLST)
Date | Title | Duration | |
---|---|---|---|
22 Feb 2022 | #063 - Prof. YOSHUA BENGIO - GFlowNets, Consciousness & Causality | 01:33:07 | |
We are now sponsored by Weights and Biases! Please visit our sponsor link: http://wandb.me/MLST Patreon: https://www.patreon.com/mlst For Yoshua Bengio, GFlowNets are the most exciting thing on the horizon of Machine Learning today. He believes they can solve previously intractable problems and hold the key to unlocking machine abstract reasoning itself. This discussion explores the promise of GFlowNets and the personal journey Prof. Bengio traveled to reach them. Panel: Dr. Tim Scarfe Dr. Keith Duggar Dr. Yannic Kilcher Our special thanks to: - Alexander Mattick (Zickzack) References: Yoshua Bengio @ MILA (https://mila.quebec/en/person/bengio-yoshua/) GFlowNet Foundations (https://arxiv.org/pdf/2111.09266.pdf) Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation (https://arxiv.org/pdf/2106.04399.pdf) Interpolation Consistency Training for Semi-Supervised Learning (https://arxiv.org/pdf/1903.03825.pdf) Towards Causal Representation Learning (https://arxiv.org/pdf/2102.11107.pdf) Causal inference using invariant prediction: identification and confidence intervals (https://arxiv.org/pdf/1501.01332.pdf) | |||
08 Jul 2020 | Robert Lange on NN Pruning and Collective Intelligence | 01:45:41 | |
We speak with Robert Lange! Robert is a PhD student at the Technical University Berlin. His research combines Deep Multi-Agent Reinforcement Learning and Cognitive Science to study the learning dynamics of large collectives. He has a brilliant blog where he distils and explains cutting edge ML research. We spoke about his story, economics, multi-agent RL, intelligence and AGI, and his recent article summarising the state of the art in neural network pruning. Robert's article on pruning in NNs https://roberttlange.github.io/posts/2020/06/lottery-ticket-hypothesis/ 00:00:00 Intro 00:04:17 Show start and intro to Robert 00:11:39 Economics background 00:27:20 Intrinsic motivation 00:33:22 Intelligence/consciousness 00:48:16 Lottery ticket/pruning article discussion 01:43:21 Robert's advice for younger self and state of deep learning Robert's LinkedIn: https://www.linkedin.com/in/robert-tjarko-lange-19539a12a/ @RobertTLange #machinelearning #deeplearning | |||
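The pruning survey Robert discusses centres on magnitude pruning and the lottery-ticket hypothesis. As a rough, self-contained illustration of the core operation only (not code from Robert's article), the sketch below zeroes the smallest-magnitude weights of a matrix; the layer shape and sparsity level are arbitrary choices for the example.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude entries of a weight matrix.

    Toy unstructured magnitude pruning; real pipelines prune iteratively
    during or after training, and lottery-ticket experiments additionally
    rewind the surviving weights to their early-training values.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))          # stand-in for a trained layer's weights
W_pruned, mask = magnitude_prune(W)
print(f"kept {mask.mean():.1%} of the weights")
```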
05 Nov 2023 | MULTI AGENT LEARNING - LANCELOT DA COSTA | 00:49:56 | |
Please support us https://www.patreon.com/mlst https://discord.gg/aNPkGUQtc5 https://twitter.com/MLStreetTalk Lance Da Costa aims to advance our understanding of intelligent systems by modelling cognitive systems and improving artificial systems. He's a PhD candidate with Greg Pavliotis and Karl Friston jointly at Imperial College London and UCL, and a student in the Mathematics of Random Systems CDT run by Imperial College London and the University of Oxford. He completed an MRes in Brain Sciences at UCL with Karl Friston and Biswa Sengupta, an MASt in Pure Mathematics at the University of Cambridge with Oscar Randal-Williams, and a BSc in Mathematics at EPFL and the University of Toronto. Summary: Lance did pure math originally but became interested in the brain and AI. He started working with Karl Friston on the free energy principle, which claims all intelligent agents minimize free energy for perception, action, and decision-making. Lance has worked to provide mathematical foundations and proofs for why the free energy principle is true, starting from basic assumptions about agents interacting with their environment. This aims to justify the principle from physical first principles. Dr. Scarfe and Da Costa discuss different approaches to AI - the free energy/active inference approach focused on mimicking human intelligence vs approaches focused on maximizing capability like deep reinforcement learning. Lance argues active inference provides advantages for explainability and safety compared to black box AI systems. It provides a simple, sparse description of intelligence based on a generative model and free energy minimization. They discuss the need for structured learning and acquiring core knowledge to achieve more human-like intelligence. Lance highlights work from Josh Tenenbaum's lab that shows similar learning trajectories to humans in a simple Atari-like environment. Incorporating core knowledge constrains the space of possible generative models the agent can use to represent the world, making learning more sample efficient. Lance argues active inference agents with core knowledge can match human learning capabilities. They discuss how to make generative models interpretable, such as through factor graphs. The goal is to be able to understand the representations and message passing in the model that leads to decisions. In summary, Lance argues active inference provides a principled approach to AI with advantages for explainability, safety, and human-like learning. Combining it with core knowledge and structural learning aims to achieve more human-like artificial intelligence. https://www.lancelotdacosta.com/ https://twitter.com/lancelotdacosta Interviewer: Dr. Tim Scarfe TOC 00:00:00 - Start 00:09:27 - Intelligence 00:12:37 - Priors / structure learning 00:17:21 - Core knowledge 00:29:05 - Intelligence is specialised 00:33:21 - The magic of agents 00:39:30 - Intelligibility of structure learning #artificialintelligence #activeinference | |||
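For readers new to the free energy principle mentioned above, a minimal numerical sketch of the quantity being minimised may help: variational free energy F = E_q[log q(s) - log p(o, s)] for a two-state world and one observation. The prior and likelihood values below are invented purely for illustration.

```python
import numpy as np

# Hypothetical generative model: a hidden state s with two values and a
# binary observation o. All numbers here are made up for illustration.
prior = np.array([0.7, 0.3])              # p(s)
likelihood = np.array([[0.9, 0.2],        # p(o | s); rows index o, columns index s
                       [0.1, 0.8]])
o = 1                                      # the observation actually received

def free_energy(q, o):
    """Variational free energy F = E_q[log q(s) - log p(o, s)]."""
    joint = likelihood[o] * prior          # p(o, s) as a function of s
    return float(np.sum(q * (np.log(q) - np.log(joint))))

# The exact posterior minimises F, at which point F equals the surprise -log p(o).
posterior = likelihood[o] * prior
posterior /= posterior.sum()

print("F at a flat belief :", free_energy(np.array([0.5, 0.5]), o))
print("F at the posterior :", free_energy(posterior, o))
print("surprise -log p(o) :", float(-np.log((likelihood[o] * prior).sum())))
```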
11 Feb 2021 | #042 - Pedro Domingos - Ethics and Cancel Culture | 01:33:59 | |
Today we have Professor Pedro Domingos and we are going to talk about activism in machine learning, cancel culture, AI ethics and kernels. In Pedro's book The Master Algorithm, he segmented the AI community into 5 distinct tribes with 5 unique identities (and before you ask, no, the irony of an anti-identitarian doing so was not lost on us!). Pedro recently published an article in Quillette called "Beating Back Cancel Culture: A Case Study from the Field of Artificial Intelligence". Domingos has railed against political activism in the machine learning community and cancel culture. Recently Pedro was involved in a controversy where he asserted the NeurIPS broader impact statements are an ideological filter mechanism. Important Disclaimer: All views expressed are personal opinions. 00:00:00 Caveating 00:04:08 Main intro 00:07:44 Cancel culture is a cultural and intellectual weakness 00:12:26 Is cancel culture a post-modern religion? 00:24:46 Should we have gateways and gatekeepers? 00:29:30 Does everything require broader impact statements? 00:33:55 We are stifling diversity (of thought) not promoting it. 00:39:09 What is fair and how to do fair? 00:45:11 Models can introduce biases by compressing away minority data 00:48:36 Accurate but unequal soap dispensers 00:53:55 Agendas are not even self-consistent 00:56:42 Is vs Ought: all variables should be used for Is 01:00:38 Fighting back cancellation with cancellation? 01:10:01 Intent and degree matter in right vs wrong. 01:11:08 Limiting principles matter 01:15:10 Gradient descent and kernels 01:20:16 Training Journey matters more than Destination 01:24:36 Can training paths teach us about symmetry? 01:28:37 What is the most promising path to AGI? 01:31:29 Intelligence will lose its mystery | |||
08 Feb 2025 | Want to Understand Neural Networks? Think Elastic Origami! - Prof. Randall Balestriero | 01:18:10 | |
Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF. SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Goto https://tufalabs.ai/ *** Randall Balestriero https://x.com/randall_balestr https://randallbalestriero.github.io/ Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0 TOC: - Introduction - 00:00:00: Introduction - Neural Network Geometry and Spline Theory - 00:01:41: Neural Network Geometry and Spline Theory - 00:07:41: Deep Networks Always Grok - 00:11:39: Grokking and Adversarial Robustness - 00:16:09: Double Descent and Catastrophic Forgetting - Reconstruction Learning - 00:18:49: Reconstruction Learning - 00:24:15: Frequency Bias in Neural Networks - Geometric Analysis of Neural Networks - 00:29:02: Geometric Analysis of Neural Networks - 00:34:41: Adversarial Examples and Region Concentration - LLM Safety and Geometric Analysis - 00:40:05: LLM Safety and Geometric Analysis - 00:46:11: Toxicity Detection in LLMs - 00:52:24: Intrinsic Dimensionality and Model Control - 00:58:07: RLHF and High-Dimensional Spaces - Conclusion - 01:02:13: Neural Tangent Kernel - 01:08:07: Conclusion REFS: [00:01:35] Humayun – Deep network geometry & input space partitioning https://arxiv.org/html/2408.04809v1 [00:03:55] Balestriero & Paris – Linking deep networks to adaptive spline operators https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf [00:13:55] Song et al. – Gradient-based white-box adversarial attacks https://arxiv.org/abs/2012.14965 [00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness https://arxiv.org/abs/2402.15555 [00:18:25] Humayun – Training dynamics & double descent via linear region evolution https://arxiv.org/abs/2310.12977 [00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries https://arxiv.org/abs/1905.08443 [00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning https://arxiv.org/abs/1803.03635 [00:24:00] Belkin et al. – Double descent phenomenon in modern ML https://arxiv.org/abs/1812.11118 [00:25:55] Balestriero et al. 
– Batch normalization’s regularization effects https://arxiv.org/pdf/2209.14778 [00:29:35] EU – EU AI Act 2024 with compute restrictions https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf [00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf [00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy https://arxiv.org/pdf/2407.20099 [00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods https://openreview.net/forum?id=ez7w0Ss4g9 (truncated, see shownotes PDF) | |||
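One way to make the spline picture concrete: a ReLU network computes a single affine map inside each input-space region where the pattern of active units is fixed, so counting distinct activation patterns over a grid gives a crude view of the partition. The sketch below is only an illustration of that idea (not Prof. Balestriero's code), assuming NumPy and a small randomly initialised network.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)    # layer 1: R^2 -> R^16
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)   # layer 2: R^16 -> R^16

def activation_pattern(x):
    """Which ReLUs fire for input x. Inputs sharing a pattern lie in the same
    convex region, on which the network is one affine map (the spline view)."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

grid = np.linspace(-3, 3, 200)
patterns = {activation_pattern(np.array([x, y])) for x in grid for y in grid}
print("distinct linear regions hit by the grid:", len(patterns))
```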
22 Oct 2024 | Dr. Sanjeev Namjoshi - Active Inference | 02:45:32 | |
Dr. Sanjeev Namjoshi, a machine learning engineer who recently submitted a book on Active Inference to MIT Press, discusses the theoretical foundations and practical applications of Active Inference, the Free Energy Principle (FEP), and Bayesian mechanics. He explains how these frameworks describe how biological and artificial systems maintain stability by minimizing uncertainty about their environment. DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)? MLST is sponsored by Tufa Labs: Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more. Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2. Interested? Apply for an ML research position: benjamin@tufa.ai Namjoshi traces the evolution of these fields from early 2000s neuroscience research to current developments, highlighting how Active Inference provides a unified framework for perception and action through variational free energy minimization. He contrasts this with traditional machine learning approaches, emphasizing Active Inference's natural capacity for exploration and curiosity through epistemic value. He sees Active Inference as being at a similar stage to deep learning in the early 2000s - poised for significant breakthroughs but requiring better tools and wider adoption. While acknowledging current computational challenges, he emphasizes Active Inference's potential advantages over reinforcement learning, particularly its principled approach to exploration and planning. Dr. Sanjeev Namjoshi https://snamjoshi.github.io/ TOC: 1. Theoretical Foundations: AI Agency and Sentience [00:00:00] 1.1 Intro [00:02:45] 1.2 Free Energy Principle and Active Inference Theory [00:11:16] 1.3 Emergence and Self-Organization in Complex Systems [00:19:11] 1.4 Agency and Representation in AI Systems [00:29:59] 1.5 Bayesian Mechanics and Systems Modeling 2. Technical Framework: Active Inference and Free Energy [00:38:37] 2.1 Generative Processes and Agent-Environment Modeling [00:42:27] 2.2 Markov Blankets and System Boundaries [00:44:30] 2.3 Bayesian Inference and Prior Distributions [00:52:41] 2.4 Variational Free Energy Minimization Framework [00:55:07] 2.5 VFE Optimization Techniques: Generalized Filtering vs DEM 3. Implementation and Optimization Methods [00:58:25] 3.1 Information Theory and Free Energy Concepts [01:05:25] 3.2 Surprise Minimization and Action in Active Inference [01:15:58] 3.3 Evolution of Active Inference Models: Continuous to Discrete Approaches [01:26:00] 3.4 Uncertainty Reduction and Control Systems in Active Inference 4. Safety and Regulatory Frameworks [01:32:40] 4.1 Historical Evolution of Risk Management and Predictive Systems [01:36:12] 4.2 Agency and Reality: Philosophical Perspectives on Models [01:39:20] 4.3 Limitations of Symbolic AI and Current System Design [01:46:40] 4.4 AI Safety Regulation and Corporate Governance 5. Socioeconomic Integration and Modeling [01:52:55] 5.1 Economic Policy and Public Sentiment Modeling [01:55:21] 5.2 Free Energy Principle: Libertarian vs Collectivist Perspectives [01:58:53] 5.3 Regulation of Complex Socio-Technical Systems [02:03:04] 5.4 Evolution and Current State of Active Inference Research 6.
Future Directions and Applications [02:14:26] 6.1 Active Inference Applications and Future Development [02:22:58] 6.2 Cultural Learning and Active Inference [02:29:19] 6.3 Hierarchical Relationship Between FEP, Active Inference, and Bayesian Mechanics [02:33:22] 6.4 Historical Evolution of Free Energy Principle [02:38:52] 6.5 Active Inference vs Traditional Machine Learning Approaches Transcript and shownotes with refs and URLs: https://www.dropbox.com/scl/fi/qj22a660cob1795ej0gbw/SanjeevShow.pdf?rlkey=w323r3e8zfsnve22caayzb17k&st=el1fdgfr&dl=0 | |||
20 Jan 2021 | #038 - Professor Kenneth Stanley - Why Greatness Cannot Be Planned | 02:46:26 | |
Professor Kenneth Stanley is currently a research science manager at OpenAI in San Francisco. We've been dreaming about getting Kenneth on the show since the very beginning of Machine Learning Street Talk. Some of you might recall that our first ever show was on the enhanced POET paper; of course, Kenneth had his hands all over it. He's been cited over 16,000 times; his most popular paper, with over 3K citations, introduced the NEAT algorithm. His interests are neuroevolution, open-endedness, NNs, artificial life, and AI. He invented the concept of novelty search with no clearly defined objective. His key idea is that there is a tyranny of objectives prevailing in every aspect of our lives, society and indeed our algorithms. Crucially, these objectives produce convergent behaviour and thinking and distract us from discovering stepping stones which will lead to greatness. He thinks that this monotonic objective obsession, this idea that we need to continue to improve benchmarks every year, is dangerous. He wrote about this in detail in his recent book "Why Greatness Cannot Be Planned", which will be the main topic of discussion in the show. We also cover his ideas on open-endedness in machine learning. 00:00:00 Intro to Kenneth 00:01:16 Show structure disclaimer 00:04:16 Passionate discussion 00:06:26 Why greatness can't be planned and the tyranny of objectives 00:14:40 Chinese Finger Trap 00:16:28 Perverse Incentives and feedback loops 00:18:17 Deception 00:23:29 Maze example 00:24:44 How can we define curiosity or interestingness 00:26:59 Open endedness 00:33:01 ICML 2019 and Yannic, POET, first MLST 00:36:17 evolutionary algorithms++ 00:43:18 POET, the first MLST 00:45:39 A lesson to GOFAI people 00:48:46 Machine Learning -- the great stagnation 00:54:34 Actual scientific successes are usually luck, and against the odds -- BioNTech 00:56:21 Picbreeder and NEAT 01:10:47 How Tim applies these ideas to his life and why he runs MLST 01:14:58 Keith Skit about UCF 01:15:13 Main show kick off 01:18:02 Why does Kenneth value serendipitous exploration so much 01:24:10 Scientific support for Kenneth's ideas in normal life 01:27:12 We should drop objectives to achieve them. An oxymoron? 01:33:13 Isn't this just resource allocation between exploration and exploitation? 01:39:06 Are objectives merely a matter of degree? 01:42:38 How do we allocate funds for treasure hunting in society 01:47:34 A keen nose for what is interesting, and voting can be dangerous 01:53:00 Committees are the antithesis of innovation 01:56:21 Does Kenneth apply these ideas to his real life? 01:59:48 Divergence vs interestingness vs novelty vs complexity 02:08:13 Picbreeder 02:12:39 Isn't everything novel in some sense? 02:16:35 Imagine if there was no selection pressure? 02:18:31 Is innovation == environment exploitation? 02:20:37 Is it possible to take shortcuts if you already knew what the innovations were? 02:21:11 Go Explore -- does the algorithm encode the stepping stones? 02:24:41 What does it mean for things to be interestingly different? 02:26:11 behavioral characterization / diversity measure to your broad interests 02:30:54 Shaping objectives 02:32:49 Why do all ambitious objectives have deception? Picbreeder analogy 02:35:59 Exploration vs Exploitation, Science vs Engineering 02:43:18 Schools of thought in ML and could search lead to AGI 02:45:49 Official ending | |||
08 Nov 2020 | #029 GPT-3, Prompt Engineering, Trading, AI Alignment, Intelligence | 01:50:32 | |
This week Dr. Tim Scarfe, Dr. Keith Duggar, Yannic Kilcher and Connor Leahy cover a broad range of topics, ranging from academia, GPT-3 and whether prompt engineering could be the next in-demand skill, markets and economics including trading and whether you can predict the stock market, AI alignment, utilitarian philosophy, randomness and intelligence and even whether the universe is infinite! 00:00:00 Show Introduction 00:12:49 Academia and doing a Ph.D. 00:15:49 From academia to Wall Street 00:17:08 Quants -- smoke and mirrors? Tail Risk 00:19:46 Previous results don't indicate future success in markets 00:23:23 Making money from social media signals? 00:24:41 Predicting the stock market 00:27:20 Things which are and are not predictable 00:31:40 Tim postscript comment on predicting markets 00:32:37 Connor's take on markets 00:35:16 As markets become more efficient... 00:36:38 Snake oil in ML 00:39:20 GPT-3, we have changed our minds 00:52:34 Prompt engineering - a new form of software development? 01:06:07 GPT-3 and prompt engineering 01:12:33 Emergent intelligence with increasingly weird abstractions 01:27:29 Wireheading and the economy 01:28:54 Free markets, dragon story and price vs value 01:33:59 Utilitarian philosophy and what does good look like? 01:41:39 Randomness and intelligence 01:44:55 Different schools of thought in ML 01:46:09 Is the universe infinite? Thanks a lot to Connor Leahy for being a guest on today's show. https://twitter.com/NPCollapse -- you can join his EleutherAI community discord here: https://discord.com/invite/vtRgjbM | |||
17 Nov 2024 | Nora Belrose - AI Development, Safety, and Meaning | 02:29:50 | |
Nora Belrose, Head of Interpretability Research at EleutherAI, discusses critical challenges in AI safety and development. The conversation begins with her technical work on concept erasure in neural networks through LEACE (LEAst-squares Concept Erasure), while highlighting how neural networks' progression from simple to complex learning patterns could have important implications for AI safety. Many fear that advanced AI will pose an existential threat -- pursuing its own dangerous goals once it's powerful enough. But Belrose challenges this popular doomsday scenario with a fascinating breakdown of why it doesn't add up. Belrose also provides a detailed critique of current AI alignment approaches, particularly examining "counting arguments" and their limitations when applied to AI safety. She argues that the Principle of Indifference may be insufficient for addressing existential risks from advanced AI systems. The discussion explores how emergent properties in complex AI systems could lead to unpredictable and potentially dangerous behaviors that simple reductionist approaches fail to capture. The conversation concludes by exploring broader philosophical territory, where Belrose discusses her growing interest in Buddhism's potential relevance to a post-automation future. She connects concepts of moral anti-realism with Buddhist ideas about emptiness and non-attachment, suggesting these frameworks might help humans find meaning in a world where AI handles most practical tasks. Rather than viewing this automated future with alarm, she proposes that Zen Buddhism's emphasis on spontaneity and presence might complement a society freed from traditional labor. SPONSOR MESSAGES: CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI, they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Goto https://tufalabs.ai/ Nora Belrose: https://norabelrose.com/ https://scholar.google.com/citations?user=p_oBc64AAAAJ&hl=en https://x.com/norabelrose SHOWNOTES: https://www.dropbox.com/scl/fi/38fhsv2zh8gnubtjaoq4a/NORA_FINAL.pdf?rlkey=0e5r8rd261821g1em4dgv0k70&st=t5c9ckfb&dl=0 TOC: 1. Neural Network Foundations [00:00:00] 1.1 Philosophical Foundations and Neural Network Simplicity Bias [00:02:20] 1.2 LEACE and Concept Erasure Fundamentals [00:13:16] 1.3 LISA Technical Implementation and Applications [00:18:50] 1.4 Practical Implementation Challenges and Data Requirements [00:22:13] 1.5 Performance Impact and Limitations of Concept Erasure 2. Machine Learning Theory [00:32:23] 2.1 Neural Network Learning Progression and Simplicity Bias [00:37:10] 2.2 Optimal Transport Theory and Image Statistics Manipulation [00:43:05] 2.3 Grokking Phenomena and Training Dynamics [00:44:50] 2.4 Texture vs Shape Bias in Computer Vision Models [00:45:15] 2.5 CNN Architecture and Shape Recognition Limitations 3. AI Systems and Value Learning [00:47:10] 3.1 Meaning, Value, and Consciousness in AI Systems [00:53:06] 3.2 Global Connectivity vs Local Culture Preservation [00:58:18] 3.3 AI Capabilities and Future Development Trajectory 4. 
Consciousness Theory [01:03:03] 4.1 4E Cognition and Extended Mind Theory [01:09:40] 4.2 Thompson's Views on Consciousness and Simulation [01:12:46] 4.3 Phenomenology and Consciousness Theory [01:15:43] 4.4 Critique of Illusionism and Embodied Experience [01:23:16] 4.5 AI Alignment and Counting Arguments Debate (TRUNCATED, TOC embedded in MP3 file with more information) | |||
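LEACE edits representations so that no linear probe can recover a chosen concept. The sketch below is not the LEACE procedure itself (which computes the optimal least-squares, whitening-aware edit), only the simpler intuition behind linear erasure: project out the direction along which the two class means differ, and a least-squares probe falls to chance. The synthetic data and dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 32
z = rng.integers(0, 2, size=n)                                   # binary concept label
X = rng.normal(size=(n, d)) + np.outer(z, rng.normal(size=d))    # representations that leak z

# Direction along which the two class means differ.
w = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
w /= np.linalg.norm(w)

# Project that direction out so the class-conditional means coincide.
X_erased = X - np.outer(X @ w, w)

def linear_probe_accuracy(A, z):
    """Fit a least-squares probe for z on A and report its accuracy."""
    Ac, zc = A - A.mean(axis=0), z - z.mean()
    v, *_ = np.linalg.lstsq(Ac, zc, rcond=None)
    return float(((Ac @ v > 0).astype(int) == z).mean())

print("probe accuracy before erasure:", linear_probe_accuracy(X, z))
print("probe accuracy after erasure :", linear_probe_accuracy(X_erased, z))
```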
18 Sep 2024 | Prof. Mark Solms - The Hidden Spring | 01:26:45 | |
Prof. Mark Solms, a neuroscientist and psychoanalyst, discusses his groundbreaking work on consciousness, challenging conventional cortex-centric views and emphasizing the role of brainstem structures in generating consciousness and affect. MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api. Key points discussed: The limitations of vision-centric approaches to consciousness studies. Evidence from decorticated animals and hydranencephalic children supporting the brainstem's role in consciousness. The relationship between homeostasis, the free energy principle, and consciousness. Critiques of behaviorism and modern theories of consciousness. The importance of subjective experience in understanding brain function. The discussion also explored broader topics: The potential impact of affect-based theories on AI development. The role of the SEEKING system in exploration and learning. Connections between neuroscience, psychoanalysis, and philosophy of mind. Challenges in studying consciousness and the limitations of current theories. Mark Solms: https://neuroscience.uct.ac.za/contacts/mark-solms Show notes and transcript: https://www.dropbox.com/scl/fo/roipwmnlfmwk2e7kivzms/ACjZF-VIGC2-Suo30KcwVV0?rlkey=53y8v2cajfcgrf17p1h7v3suz&st=z8vu81hn&dl=0 TOC (*) are best bits 00:00:00 1. Intro: Challenging vision-centric approaches to consciousness * 00:02:20 2. Evidence from decorticated animals and hydranencephalic children * 00:07:40 3. Emotional responses in hydranencephalic children 00:10:40 4. Brainstem stimulation and affective states 00:15:00 5. Brainstem's role in generating affective consciousness * 00:21:50 6. Dual-aspect monism and the mind-brain relationship 00:29:37 7. Information, affect, and the hard problem of consciousness * 00:37:25 8. Wheeler's participatory universe and Chalmers' theories 00:48:51 9. Homeostasis, free energy principle, and consciousness * 00:59:25 10. Affect, voluntary behavior, and decision-making 01:05:45 11. Psychoactive substances, REM sleep, and consciousness research 01:12:14 12. Critiquing behaviorism and modern consciousness theories * 01:24:25 13. The SEEKING system and exploration in neuroscience Refs: 1. Mark Solms' book "The Hidden Spring" [00:20:34] (MUST READ!) https://amzn.to/3XyETb3 2. Karl Friston's free energy principle [00:03:50] https://www.nature.com/articles/nrn2787 3. Hydranencephaly condition [00:07:10] https://en.wikipedia.org/wiki/Hydranencephaly 4. Periaqueductal gray (PAG) [00:08:57] https://en.wikipedia.org/wiki/Periaqueductal_gray 5. Positron Emission Tomography (PET) [00:13:52] https://en.wikipedia.org/wiki/Positron_emission_tomography 6. Paul MacLean's triune brain theory [00:03:30] https://en.wikipedia.org/wiki/Triune_brain 7. Baruch Spinoza's philosophy of mind [00:23:48] https://plato.stanford.edu/entries/spinoza-epistemology-mind 8. Claude Shannon's "A Mathematical Theory of Communication" [00:32:15] https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf 9. Francis Crick's "The Astonishing Hypothesis" [00:39:57] https://en.wikipedia.org/wiki/The_Astonishing_Hypothesis 10. Frank Jackson's Knowledge Argument [00:40:54] https://plato.stanford.edu/entries/qualia-knowledge/ 11.
Mesolimbic dopamine system [01:11:51] https://en.wikipedia.org/wiki/Mesolimbic_pathway 12. Jaak Panksepp's SEEKING system [01:25:23] https://en.wikipedia.org/wiki/Jaak_Panksepp#Affective_neuroscience | |||
07 Dec 2024 | Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders) | 03:42:36 | |
Neel Nanda, a senior research scientist at Google DeepMind, leads their mechanistic interpretability team. In this extensive interview, he discusses his work trying to understand how neural networks function internally. At just 25 years old, Nanda has quickly become a prominent voice in AI research after completing his pure mathematics degree at Cambridge in 2020. Nanda reckons that machine learning is unique because we create neural networks that can perform impressive tasks (like complex reasoning and software engineering) without understanding how they work internally. He compares this to having computer programs that can do things no human programmer knows how to write. His work focuses on "mechanistic interpretability" - attempting to uncover and understand the internal structures and algorithms that emerge within these networks. SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI, they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Goto https://tufalabs.ai/ *** SHOWNOTES, TRANSCRIPT, ALL REFERENCES (DONT MISS!): https://www.dropbox.com/scl/fi/36dvtfl3v3p56hbi30im7/NeelShow.pdf?rlkey=pq8t7lyv2z60knlifyy17jdtx&st=kiutudhc&dl=0 We riff on: * How neural networks develop meaningful internal representations beyond simple pattern matching * The effectiveness of chain-of-thought prompting and why it improves model performance * The importance of hands-on coding over extensive paper reading for new researchers * His journey from Cambridge to working with Chris Olah at Anthropic and eventually Google DeepMind * The role of mechanistic interpretability in AI safety NEEL NANDA: https://www.neelnanda.io/ https://scholar.google.com/citations?user=GLnX3MkAAAAJ&hl=en https://x.com/NeelNanda5 Interviewer - Tim Scarfe TOC: 1. Part 1: Introduction [00:00:00] 1.1 Introduction and Core Concepts Overview 2. Part 2: Outside Interview [00:06:45] 2.1 Mechanistic Interpretability Foundations 3. Part 3: Main Interview [00:32:52] 3.1 Mechanistic Interpretability 4. Neural Architecture and Circuits [01:00:31] 4.1 Biological Evolution Parallels [01:04:03] 4.2 Universal Circuit Patterns and Induction Heads [01:11:07] 4.3 Entity Detection and Knowledge Boundaries [01:14:26] 4.4 Mechanistic Interpretability and Activation Patching 5. Model Behavior Analysis [01:30:00] 5.1 Golden Gate Claude Experiment and Feature Amplification [01:33:27] 5.2 Model Personas and RLHF Behavior Modification [01:36:28] 5.3 Steering Vectors and Linear Representations [01:40:00] 5.4 Hallucinations and Model Uncertainty 6. Sparse Autoencoder Architecture [01:44:54] 6.1 Architecture and Mathematical Foundations [02:22:03] 6.2 Core Challenges and Solutions [02:32:04] 6.3 Advanced Activation Functions and Top-k Implementations [02:34:41] 6.4 Research Applications in Transformer Circuit Analysis 7. Feature Learning and Scaling [02:48:02] 7.1 Autoencoder Feature Learning and Width Parameters [03:02:46] 7.2 Scaling Laws and Training Stability [03:11:00] 7.3 Feature Identification and Bias Correction [03:19:52] 7.4 Training Dynamics Analysis Methods 8. 
Engineering Implementation [03:23:48] 8.1 Scale and Infrastructure Requirements [03:25:20] 8.2 Computational Requirements and Storage [03:35:22] 8.3 Chain-of-Thought Reasoning Implementation [03:37:15] 8.4 Latent Structure Inference in Language Models | |||
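The sparse autoencoders Nanda describes decompose a model's activations into a much wider dictionary of features, most of which are inactive on any given input. A minimal sketch of the core objective (reconstruction error plus an L1 sparsity penalty on the hidden code), assuming PyTorch and random vectors standing in for real residual-stream activations; dimensions and coefficients are illustrative.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=128, d_hidden=1024):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))     # feature activations (the sparse code)
        return self.decoder(f), f

torch.manual_seed(0)
acts = torch.randn(4096, 128)               # stand-in for residual-stream activations
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                              # trades reconstruction quality for sparsity

for step in range(200):
    recon, f = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("mean active features per example:", (f > 0).float().sum(dim=1).mean().item())
```

Interpretability work then inspects which inputs activate each learned feature; the top-k activation functions mentioned in the table of contents are one way to replace the plain L1 penalty.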
11 Aug 2021 | #58 Dr. Ben Goertzel - Artificial General Intelligence | 02:28:14 | |
The field of Artificial Intelligence was founded in the mid-1950s with the aim of constructing "thinking machines" - that is to say, computer systems with human-like general intelligence. Think of humanoid robots that not only look but act and think with intelligence equal to and ultimately greater than that of human beings. But in the intervening years, the field has drifted far from its ambitious old-fashioned roots. Dr. Ben Goertzel is an artificial intelligence researcher, CEO and founder of SingularityNET, a project combining artificial intelligence and blockchain to democratize access to artificial intelligence. Ben seeks to fulfil the original ambitions of the field. Ben graduated with a PhD in Mathematics from Temple University in 1990. Ben's approach to AGI over many decades now has been inspired by many disciplines, but in particular by human cognitive psychology and computer science. To date, Ben's work has been mostly theoretically-driven. Ben thinks that most of the deep learning approaches to AGI today try to model the brain. They may have a loose analogy to human neuroscience but they have not tried to derive the details of an AGI architecture from an overall conception of what a mind is. Ben thinks that what matters for creating human-level (or greater) intelligence is having the right information processing architecture, not the underlying mechanics via which the architecture is implemented. Ben thinks that there is a certain set of key cognitive processes and interactions that AGI systems must implement explicitly, such as working and long-term memory, deliberative and reactive processing, and perception. In his view, biological systems tend to be messy, complex and integrative; searching for a single "algorithm of general intelligence" is an inappropriate attempt to project the aesthetics of physics or theoretical computer science into a qualitatively different domain. TOC is on the YT show description https://www.youtube.com/watch?v=sw8IE3MX1SY Panel: Dr. Tim Scarfe, Dr. Yannic Kilcher, Dr. Keith Duggar Artificial General Intelligence: Concept, State of the Art, and Future Prospects https://sciendo.com/abstract/journals... The General Theory of General Intelligence: A Pragmatic Patternist Perspective https://arxiv.org/abs/2103.15100 | |||
18 Jul 2024 | Sara Hooker - Why US AI Act Compute Thresholds Are Misguided | 01:05:41 | |
Sara Hooker is VP of Research at Cohere and leader of Cohere for AI. We discuss her recent paper critiquing the use of compute thresholds, measured in FLOPs (floating point operations), as an AI governance strategy. We explore why this approach, recently adopted in both US and EU AI policies, may be problematic and oversimplified. Sara explains the limitations of using raw computational power as a measure of AI capability or risk, and discusses the complex relationship between compute, data, and model architecture. Equally important, we go into Sara's work on "The AI Language Gap." This research highlights the challenges and inequalities in developing AI systems that work across multiple languages. Sara discusses how current AI models, predominantly trained on English and a handful of high-resource languages, fail to serve the linguistic diversity of our global population. We explore the technical, ethical, and societal implications of this gap, and discuss potential solutions for creating more inclusive and representative AI systems. We broadly discuss the relationship between language, culture, and AI capabilities, as well as the ethical considerations in AI development and deployment. YT Version: https://youtu.be/dBZp47999Ko TOC: [00:00:00] Intro [00:02:12] FLOPS paper [00:26:42] Hardware lottery [00:30:22] The Language gap [00:33:25] Safety [00:38:31] Emergent [00:41:23] Creativity [00:43:40] Long tail [00:44:26] LLMs and society [00:45:36] Model bias [00:48:51] Language and capabilities [00:52:27] Ethical frameworks and RLHF Sara Hooker https://www.sarahooker.me/ https://www.linkedin.com/in/sararosehooker/ https://scholar.google.com/citations?user=2xy6h3sAAAAJ&hl=en https://x.com/sarahookr Interviewer: Tim Scarfe Refs The AI Language gap https://cohere.com/research/papers/the-AI-language-gap.pdf On the Limitations of Compute Thresholds as a Governance Strategy. https://arxiv.org/pdf/2407.05694v1 The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm https://arxiv.org/pdf/2406.18682 Cohere Aya https://cohere.com/research/aya RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs https://arxiv.org/pdf/2407.02552 Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs https://arxiv.org/pdf/2402.14740 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/ EU AI Act https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf The bitter lesson http://www.incompleteideas.net/IncIdeas/BitterLesson.html Neel Nanda interview https://www.youtube.com/watch?v=_Ygf0GnlwmY Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet https://transformer-circuits.pub/2024/scaling-monosemanticity/ Chollet's ARC challenge https://github.com/fchollet/ARC-AGI Ryan Greenblatt on ARC https://www.youtube.com/watch?v=z9j3wB1RRGA Disclaimer: This is the third video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. | |||
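For context on what a FLOP threshold means in practice: the 2023 US executive order referenced above sets a reporting threshold of 10^26 operations, and training compute is commonly approximated as roughly 6 x parameters x training tokens. The sketch below uses that rule of thumb with made-up model scales; none of the numbers describe any specific system.

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

US_EO_THRESHOLD = 1e26  # reporting threshold in the 2023 US executive order

# Illustrative, invented model scales (parameters, training tokens).
for n_params, n_tokens in [(7e9, 2e12), (70e9, 2e12), (2e12, 10e12)]:
    flops = training_flops(n_params, n_tokens)
    side = "above" if flops > US_EO_THRESHOLD else "below"
    print(f"{n_params:.0e} params x {n_tokens:.0e} tokens -> {flops:.1e} FLOPs ({side} 1e26)")
```

Part of Hooker's critique is that this single number ignores data quality, architecture, and post-training, which is why identical FLOP counts can correspond to very different capabilities and risks.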
28 Feb 2021 | #045 Microsoft's Platform for Reinforcement Learning (Bonsai) | 02:30:17 | |
Microsoft has an interesting strategy with their new “autonomous systems” technology also known as Project Bonsai. They want to create an interface to abstract away the complexity and esoterica of deep reinforcement learning. They want to fuse together expert knowledge and artificial intelligence all on one platform, so that complex problems can be decomposed into simpler ones. They want to take machine learning Ph.Ds out of the equation and make autonomous systems engineering look more like a traditional software engineering process. It is an ambitious undertaking, but interesting. Reinforcement learning is extremely difficult (as I cover in the video), and if you don’t have a team of RL Ph.Ds with tech industry experience, you shouldn’t even consider doing it yourself. This is our take on it! There are 3 chapters in this video; Chapter 1: Tim's intro and take on RL being hard, intro to Bonsai and machine teaching Chapter 2: Interview with Scott Stanfield [recorded Jan 2020] 00:56:41 Chapter 3: Traditional street talk episode [recorded Dec 2020] 01:38:13 This is *not* an official communication from Microsoft, all personal opinions. There is no MS-confidential information in this video. With: Scott Stanfield https://twitter.com/seesharp Megan Bloemsma https://twitter.com/BloemsmaMegan Gurdeep Pall (he has not validated anything we have said in this video or been involved in the creation of it) https://www.linkedin.com/in/gurdeep-pall-0aa639bb/ Panel: Dr. Keith Duggar Dr. Tim Scarfe Yannic Kilcher | |||
04 Jun 2023 | Prof. Daniel Dennett - Could AI Counterfeit People Destroy Civilization? (SPECIAL EDITION) | 01:14:41 | |
Please check out Numerai - our sponsor using our link @ http://numer.ai/mlst Numerai is a groundbreaking platform which is taking the data science world by storm. Tim has been using Numerai to build state-of-the-art models which predict the stock market, all while being a part of an inspiring community of data scientists from around the globe. They host the Numerai Data Science Tournament, where data scientists like us use their financial dataset to predict future stock market performance. Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 Twitter: https://twitter.com/MLStreetTalk YT version: https://youtu.be/axJtywd9Tbo In this fascinating interview, Dr. Tim Scarfe speaks with renowned philosopher Daniel Dennett about the potential dangers of AI and the concept of "Counterfeit People." Dennett raises concerns about AI being used to create artificial colleagues, and argues that preventing counterfeit AI individuals is crucial for societal trust and security. They delve into Dennett's "Two Black Boxes" thought experiment, the Chinese Room Argument by John Searle, and discuss the implications of AI in terms of reversibility, reontologisation, and realism. Dr. Scarfe and Dennett also examine adversarial LLMs, mental trajectories, and the emergence of consciousness and semanticity in AI systems. Throughout the conversation, they touch upon various philosophical perspectives, including Gilbert Ryle's Ghost in the Machine, Chomsky's work, and the importance of competition in academia. Dennett concludes by highlighting the need for legal and technological barriers to protect against the dangers of counterfeit AI creations. Join Dr. Tim Scarfe and Daniel Dennett in this thought-provoking discussion about the future of AI and the potential challenges we face in preserving our civilization. Don't miss this insightful conversation! TOC: 00:00:00 Intro 00:09:56 Main show kick off 00:12:04 Counterfeit People 00:16:03 Reversibility 00:20:55 Reontologisation 00:24:43 Realism 00:27:48 Adversarial LLMs are out to get us 00:32:34 Exploring mental trajectories and Chomsky 00:38:53 Gilbert Ryle and Ghost in machine and competition in academia 00:44:32 2 Black boxes thought experiment / intentional stance 01:00:11 Chinese room 01:04:49 Singularitarianism 01:07:22 Emergence of consciousness and semanticity References: Tree of Thoughts: Deliberate Problem Solving with Large Language Models https://arxiv.org/abs/2305.10601 The Problem With Counterfeit People (Daniel Dennett) https://www.theatlantic.com/technology/archive/2023/05/problem-counterfeit-people/674075/ The knowledge argument https://en.wikipedia.org/wiki/Knowledge_argument The Intentional Stance https://www.researchgate.net/publication/271180035_The_Intentional_Stance Two Black Boxes: a Fable (Daniel Dennett) https://www.researchgate.net/publication/28762339_Two_Black_Boxes_a_Fable The Chinese Room Argument (John Searle) https://plato.stanford.edu/entries/chinese-room/ https://web-archive.southampton.ac.uk/cogprints.org/7150/1/10.1.1.83.5248.pdf From Bacteria to Bach and Back: The Evolution of Minds (Daniel Dennett) https://www.amazon.co.uk/Bacteria-Bach-Back-Evolution-Minds/dp/014197804X Consciousness Explained (Daniel Dennett) https://www.amazon.co.uk/Consciousness-Explained-Penguin-Science-Dennett/dp/0140128670/ The Mind's I: Fantasies and Reflections on Self and Soul (Hofstadter, Douglas R; Dennett, Daniel C.) 
https://www.abebooks.co.uk/servlet/BookDetailsPL?bi=31494476184 #DanielDennett #ArtificialIntelligence #CounterfeitPeople | |||
08 Jul 2021 | #56 - Dr. Walid Saba, Gadi Singer, Prof. J. Mark Bishop (Panel discussion) | 01:11:17 | |
It has been over three decades since the statistical revolution took AI by storm and over two decades since deep learning (DL) helped usher in the latest resurgence of artificial intelligence (AI). However, the disappointing progress in conversational agents, NLU, and self-driving cars has made it clear that these empirical, data-driven methods have not lived up to their promise. DARPA has suggested that it is time for a third wave in AI, one that would be characterized by hybrid models – models that combine knowledge-based approaches with data-driven machine learning techniques. Joining us on this panel discussion is polymath and linguist Walid Saba - Co-founder of ONTOLOGIK.AI, Gadi Singer - VP & Director, Cognitive Computing Research, Intel Labs and J. Mark Bishop - Professor of Cognitive Computing (Emeritus), Goldsmiths, University of London and Scientific Adviser to FACT360. Moderated by Dr. Keith Duggar and Dr. Tim Scarfe https://www.linkedin.com/in/gadi-singer/ https://www.linkedin.com/in/walidsaba/ https://www.linkedin.com/in/profjmarkbishop/ #machinelearning #artificialintelligence | |||
25 Aug 2024 | "AI should NOT be regulated at all!" - Prof. Pedro Domingos | 02:12:15 | |
Professor Pedro Domingos is an AI researcher and professor of computer science. He expresses skepticism about current AI regulation efforts and argues for faster AI development rather than slowing it down. He also discusses the need for new innovations to fulfil the promises of current AI techniques. MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api. Show notes: * Domingos' views on AI regulation and why he believes it's misguided * His thoughts on the current state of AI technology and its limitations * Discussion of his novel "2040", a satirical take on AI and tech culture * Explanation of his work on "tensor logic", which aims to unify neural networks and symbolic AI * Critiques of other approaches in AI, including those of OpenAI and Gary Marcus * Thoughts on the AI "bubble" and potential future developments in the field Prof. Pedro Domingos: https://x.com/pmddomingos 2040: A Silicon Valley Satire [Pedro's new book] https://amzn.to/3T51ISd TOC: 00:00:00 Intro 00:06:31 Bio 00:08:40 Filmmaking skit 00:10:35 AI and the wisdom of crowds 00:19:49 Social Media 00:27:48 Master algorithm 00:30:48 Neurosymbolic AI / abstraction 00:39:01 Language 00:45:38 Chomsky 01:00:49 2040 Book 01:18:03 Satire as a shield for criticism? 01:29:12 AI Regulation 01:35:15 Gary Marcus 01:52:37 Copyright 01:56:11 Stochastic parrots come home to roost 02:00:03 Privacy 02:01:55 LLM ecosystem 02:05:06 Tensor logic Refs: The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World [Pedro Domingos] https://amzn.to/3MiWs9B Rebooting AI: Building Artificial Intelligence We Can Trust [Gary Marcus] https://amzn.to/3AAywvL Flash Boys [Michael Lewis] https://amzn.to/4dUGm1M | |||
07 Sep 2020 | UK Algoshambles, Neuralink, GPT-3 and Intelligence | 01:34:33 | |
This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic "Lightspeed" Kilcher respond to the "Algoshambles" exam fiasco in the UK where the government were forced to step in to standardise the grades which were grossly inflated by the schools. The schools and teachers are all paid on metrics related to the grades received by students, what could possibly go wrong?! The result is that we end up with grades which have lost all their value and students are coached for the exams and don't actually learn the subject. We also cover the second Francois Chollet interview on the Lex Fridman podcast. We cover GPT-3, Neuralink, and discussion of intelligence. 00:00:00 Algoshambles 00:45:40 Lex Fridman/Chollet: Intro 00:55:21 Lex Fridman/Chollet: Neuralink 01:06:28 Lex Fridman/Chollet: GPT-3 01:23:43 Lex Fridman/Chollet: Intelligence discussion | |||
19 May 2020 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 01:40:02 | |
In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten chat about Large-scale Transfer Learning in Natural Language Processing. The Text-to-Text Transfer Transformer (T5) model from Google AI does an exhaustive survey of what’s important for Transfer Learning in NLP and what’s not. In this conversation, we go through the key takeaways of the paper, text-to-text input/output format, architecture choice, dataset size and composition, fine-tuning strategy, and how to best use more computation. Beginning with these topics, we diverge into exciting ideas such as embodied cognition, meta-learning, and the measure of intelligence. We are still beginning our podcast journey and really appreciate any feedback from our listeners. Is the chat too technical? Do you prefer group discussions, interviewing experts, or chats between the three of us? Thanks for watching and if you haven’t already, Please Subscribe! Paper Links discussed in the chat: Text-to-Text Transfer Transformer: https://arxiv.org/abs/1910.10683 Experience Grounds Language (relevant to divergent discussion about embodied cognition): https://arxiv.org/pdf/2004.10151.pdf On the Measure of Intelligence: https://arxiv.org/abs/1911.01547 Train Large, Then Compress: https://arxiv.org/pdf/2002.11794.pdf Scaling Laws for Neural Language Models: https://arxiv.org/pdf/2001.08361.pdf The Illustrated Transformer: http://jalammar.github.io/illustrated... ELECTRA: https://arxiv.org/pdf/2003.10555.pdf Transformer-XL: https://arxiv.org/pdf/1901.02860.pdf Reformer: The Efficient Transformer: https://openreview.net/pdf?id=rkgNKkHtvB The Evolved Transformer: https://arxiv.org/pdf/1901.11117.pdf DistilBERT: https://arxiv.org/pdf/1910.01108.pdf How to generate text (HIGHLY RECOMMEND): https://huggingface.co/blog/how-to-ge... Tokenizers: https://blog.floydhub.com/tokenization-nlp/ | |||
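The central design choice discussed is T5's text-to-text format: every task is cast as mapping an input string to an output string, with a short prefix telling the model which task to perform. The pairs below follow the style of the examples in the T5 paper and are shown only to illustrate the format.

```python
# Every task becomes "input text -> target text"; the prefix selects the task.
examples = [
    {"input": "translate English to German: That is good.",
     "target": "Das ist gut."},
    {"input": "cola sentence: The course is jumping well.",
     "target": "not acceptable"},
    {"input": "summarize: state authorities dispatched emergency crews tuesday to "
              "survey the damage after an onslaught of severe weather in mississippi...",
     "target": "six people hospitalized after a storm in attala county."},
]

for ex in examples:
    print(f"{ex['input']!r}  ->  {ex['target']!r}")
```

Because inputs and targets are both plain text, a single encoder-decoder model and a single maximum-likelihood objective cover translation, classification, regression and summarisation alike.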
04 Jan 2022 | 061: Interpolation, Extrapolation and Linearisation (Prof. Yann LeCun, Dr. Randall Balestriero) | 03:19:43 | |
We are now sponsored by Weights and Biases! Please visit our sponsor link: http://wandb.me/MLST Patreon: https://www.patreon.com/mlst Yann LeCun thinks that it's specious to say neural network models are interpolating because in high dimensions, everything is extrapolation. Recently Dr. Randall Balestriero, Dr. Jerome Pesenti and Prof. Yann LeCun released their paper "Learning in High Dimension Always Amounts to Extrapolation". This discussion has completely changed how we think about neural networks and their behaviour. [00:00:00] Pre-intro [00:11:58] Intro Part 1: On linearisation in NNs [00:28:17] Intro Part 2: On interpolation in NNs [00:47:45] Intro Part 3: On the curse [00:48:19] LeCun [01:40:51] Randall B YouTube version: https://youtu.be/86ib0sfdFtw | |||
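The paper's headline claim is geometric: as dimensionality grows, new samples essentially never land inside the convex hull of the training set, so "interpolation" in the strict sense almost never happens. A small sketch of that effect, assuming NumPy and SciPy, using a linear program to test hull membership; the sample sizes and dimensions are arbitrary.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """Is x a convex combination of the rows of `points`? Feasibility of an LP."""
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones(n)])     # sum_i lam_i * p_i = x, sum_i lam_i = 1
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.success

rng = np.random.default_rng(0)
for d in [2, 5, 10, 20]:
    train = rng.normal(size=(500, d))
    test = rng.normal(size=(100, d))
    inside = np.mean([in_convex_hull(x, train) for x in test])
    print(f"dim={d:>2}: fraction of new points inside the training hull = {inside:.2f}")
```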
02 Apr 2023 | #112 AVOIDING AGI APOCALYPSE - CONNOR LEAHY | 02:40:13 | |
Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 In this podcast with the legendary Connor Leahy (CEO of Conjecture) recorded in Dec 2022, we discuss various topics related to artificial intelligence (AI), including AI alignment, the success of ChatGPT, the potential threats of artificial general intelligence (AGI), and the challenges of balancing research and product development at his company, Conjecture. He emphasizes the importance of empathy, dehumanizing our thinking to avoid anthropomorphic biases, and the value of real-world experiences in learning and personal growth. The conversation also covers the Orthogonality Thesis, AI preferences, the mystery of mode collapse, and the paradox of AI alignment. Connor Leahy expresses concern about the rapid development of AI and the potential dangers it poses, especially as AI systems become more powerful and integrated into society. He argues that we need a better understanding of AI systems to ensure their safe and beneficial development. The discussion also touches on the concept of "futuristic whack-a-mole," where futurists predict potential AGI threats, and others try to come up with solutions for those specific scenarios. However, the problem lies in the fact that there could be many more scenarios that neither party can think of, especially when dealing with a system that's smarter than humans. https://www.linkedin.com/in/connor-j-leahy/ https://twitter.com/NPCollapse Interviewer: Dr. Tim Scarfe (Innovation CTO @ XRAI Glass https://xrai.glass/) TOC: The success of ChatGPT and its impact on the AI field [00:00:00] Subjective experience [00:15:12] AI Architectural discussion including RLHF [00:18:04] The paradox of AI alignment and the future of AI in society [00:31:44] The impact of AI on society and politics [00:36:11] Future shock levels and the challenges of predicting the future [00:45:58] Long termism and existential risk [00:48:23] Consequentialism vs. deontology in rationalism [00:53:39] The Rationalist Community and its Challenges [01:07:37] AI Alignment and Conjecture [01:14:15] Orthogonality Thesis and AI Preferences [01:17:01] Challenges in AI Alignment [01:20:28] Mechanistic Interpretability in Neural Networks [01:24:54] Building Cleaner Neural Networks [01:31:36] Cognitive horizons / The problem with rapid AI development [01:34:52] Founding Conjecture and raising funds [01:39:36] Inefficiencies in the market and seizing opportunities [01:45:38] Charisma, authenticity, and leadership in startups [01:52:13] Autistic culture and empathy [01:55:26] Learning from real-world experiences [02:01:57] Technical empathy and transhumanism [02:07:18] Moral status and the limits of empathy [02:15:33] Anthropomorphic Thinking and Consequentialism [02:17:42] Conjecture: Balancing Research and Product Development [02:20:37] Epistemology Team at Conjecture [02:31:07] Interpretability and Deception in AGI [02:36:23] Futuristic whack-a-mole and predicting AGI threats [02:38:27] Refs: 1. OpenAI's ChatGPT: https://chat.openai.com/ 2. The Mystery of Mode Collapse (Article): https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse 3. The Rationalist Guide to the Galaxy https://www.amazon.co.uk/Does-Not-Hate-You-Superintelligence/dp/1474608795 5. Alfred Korzybski: https://en.wikipedia.org/wiki/Alfred_Korzybski 6. Instrumental Convergence: https://en.wikipedia.org/wiki/Instrumental_convergence 7. Orthogonality Thesis: https://en.wikipedia.org/wiki/Orthogonality_thesis 8.
Brian Tomasik's Essays on Reducing Suffering: https://reducing-suffering.org/ 9. Epistemological Framing for AI Alignment Research: https://www.lesswrong.com/posts/Y4YHTBziAscS5WPN7/epistemological-framing-for-ai-alignment-research 10. How to Defeat Mind readers: https://www.alignmentforum.org/posts/EhAbh2pQoAXkm9yor/circumventing-interpretability-how-to-defeat-mind-readers 11. Society of mind: https://www.amazon.co.uk/Society-Mind-Marvin-Minsky/dp/0671607405 | |||
19 Dec 2022 | (Music Removed) #90 - Prof. DAVID CHALMERS - Consciousness in LLMs [Special Edition] | 00:53:47 | |
Support us! https://www.patreon.com/mlst (On the main version we released, the music was a tiny bit too loud in places, and some pieces had percussion which was a bit distracting -- here is a version with all music removed so you have the option!) David Chalmers is a professor of philosophy and neural science at New York University, and an honorary professor of philosophy at the Australian National University. He is the co-director of the Center for Mind, Brain, and Consciousness, as well as the PhilPapers Foundation. His research focuses on the philosophy of mind, especially consciousness, and its connection to fields such as cognitive science, physics, and technology. He also investigates areas such as the philosophy of language, metaphysics, and epistemology. With his impressive breadth of knowledge and experience, David Chalmers is a leader in the philosophical community. The central challenge for consciousness studies is to explain how something immaterial, subjective, and personal can arise out of something material, objective, and impersonal. This is illustrated by the example of a bat, whose sensory experience is much different from ours, making it difficult to imagine what it's like to be one. Thomas Nagel's "inconceivability argument" has its advantages and disadvantages, but ultimately it is impossible to solve the mind-body problem due to the subjective nature of experience. This is further explored by examining the concept of philosophical zombies, which are physically and behaviorally indistinguishable from conscious humans yet lack conscious experience. This has implications for the Hard Problem of Consciousness, which is the attempt to explain how mental states are linked to neurophysiological activity. The Chinese Room Argument is used as a thought experiment to explain why physicality may be insufficient to be the source of the subjective, coherent experience we call consciousness. Despite much debate, the Hard Problem of Consciousness remains unsolved. Chalmers has been working on a functional approach to decide whether large language models are, or could be, conscious. Filmed at #neurips22 Discord: https://discord.gg/aNPkGUQtc5 Pod: https://anchor.fm/machinelearningstreettalk/episodes/90---Prof--DAVID-CHALMERS---Slightly-Conscious-LLMs-e1sej50 TOC: [00:00:00] Introduction [00:00:40] LLMs consciousness pitch [00:06:33] Philosophical Zombies [00:09:26] The hard problem of consciousness [00:11:40] Nagel's bat and intelligibility [00:21:04] LLM intro clip from NeurIPS [00:22:55] Connor Leahy on self-awareness in LLMs [00:23:30] Sneak peek from unreleased show - could consciousness be a submodule? [00:33:44] SeppH [00:36:15] Tim interviews David at NeurIPS (functionalism / panpsychism / Searle) [00:45:20] Peter Hase interviews Chalmers (focus on interpretability/safety) Panel: Dr. Tim Scarfe Dr. Keith Duggar Contact David: https://mobile.twitter.com/davidchalmers42 https://consc.net/ References: Could a Large Language Model Be Conscious? [Chalmers NeurIPS22 talk] https://nips.cc/media/neurips-2022/Slides/55867.pdf What Is It Like to Be a Bat? [Nagel] https://warwick.ac.uk/fac/cross_fac/iatl/study/ugmodules/humananimalstudies/lectures/32/nagel_bat.pdf Zombies https://plato.stanford.edu/entries/zombies/ zombies on the web [Chalmers] https://consc.net/zombies-on-the-web/ The hard problem of consciousness [Chalmers] https://psycnet.apa.org/record/2007-00485-017 David Chalmers, "Are Large Language Models Sentient?"
[NYU talk, same as at NeurIPS] https://www.youtube.com/watch?v=-BcuCmf00_Y | |||
20 Dec 2020 | #034 Eray Özkural - AGI, Simulations & Safety | 02:39:09 | |
Dr. Eray Ozkural is an AGI researcher from Turkey and the founder of Celestial Intellect Cybernetics. Eray is extremely critical of Max Tegmark, Nick Bostrom and MIRI founder Eliezer Yudkowsky and their views on AI safety. Eray thinks that these views represent a form of neo-Luddism, that they capture valuable research budgets with doomsday fear-mongering, and that their proponents effectively want to prevent AI from being developed by those they don't agree with. Eray is also sceptical of the intelligence explosion hypothesis and the argument from simulation. Panel -- Dr. Keith Duggar, Dr. Tim Scarfe, Yannic Kilcher 00:00:00 Show teaser intro with added nuggets and commentary 00:48:39 Main Show Introduction 00:53:14 Doomsaying to Control 00:56:39 Fear the Basilisk! 01:08:00 Intelligence Explosion Ethics 01:09:45 Fear the Autonomous Drone! ... or spam 01:11:25 Infinity Point Hypothesis 01:15:26 Meat Level Intelligence 01:21:25 Defining Intelligence ... Yet Again 01:27:34 We'll make brains and then shoot them 01:31:00 The Universe likes deep learning 01:33:16 NNs are glorified hash tables 01:38:44 Radical behaviorists 01:41:29 Omega Architecture, possible AGI? 01:53:33 Simulation hypothesis 02:09:44 No one cometh unto Simulation, but by Jesus Christ 02:16:47 Agendas, Motivations, and Mind Projections 02:23:38 A computable Universe of Bulk Automata 02:30:31 Self-Organized Post-Show Coda 02:31:29 Investigating Intelligent Agency is Science 02:36:56 Goodbye and cheers! https://www.youtube.com/watch?v=pZsHZDA9TJU | |||
28 Feb 2022 | #66 ALEXANDER MATTICK - [Unplugged / Community Edition] | 00:50:31 | |
We have a chat with Alexander Mattick aka ZickZack from Yannic's Discord community. Alex is one of the leading voices in that community and has impressive technical depth. Don't forget MLST has now started its own Discord server too, come and join us! We are going to run regular events; our first big one is on Wednesday 9th, 1700-1900 UK time. Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/HNnAwSduud YT version: https://youtu.be/rGOOLC8cIO4 [00:00:00] Introduction to Alex [00:02:16] Spline theory of NNs [00:05:19] Do NNs abstract? [00:08:27] Tim's exposition of spline theory of NNs [00:11:11] Semantics in NNs [00:13:37] Continuous vs discrete [00:19:00] Open-ended Search [00:22:54] Inductive logic programming [00:25:00] Control to gain knowledge and knowledge to gain control [00:30:22] Being a generalist with a breadth of knowledge and knowledge transfer [00:36:29] Causality [00:43:14] Discrete program synthesis + theorem solvers | |||
28 Oct 2020 | Kaggle, ML Community / Engineering (Sanyam Bhutani) | 01:26:59 | |
Join Dr Tim Scarfe, Sayak Paul, Yannic Kilcher, and Alex Stenlake as they have a conversation with Mr. Chai Time Data Science himself, Sanyam Bhutani! 00:00:00 Introduction 00:03:42 Show kick off 00:06:34 How did Sanyam get started in ML 00:07:46 Being a content creator 00:09:01 Can you be self taught without a formal education in ML? 00:22:54 Kaggle 00:33:41 H2O product / job 00:40:58 Interpretability / bias / engineering skills 00:43:22 Get that first job in DS 00:46:29 AWS ML Ops architecture / ml engineering 01:14:19 Patterns 01:18:09 Testability 01:20:54 Adversarial examples Sanyam's blog -- https://sanyambhutani.com/tag/chaitimedatascience/ Chai Time Data Science -- https://www.youtube.com/c/ChaiTimeDataScience | |||
04 Nov 2024 | The Elegant Math Behind Machine Learning - Anil Ananthaswamy | 01:53:11 | |
Anil Ananthaswamy is an award-winning science writer and former staff writer and deputy news editor for the London-based New Scientist magazine. Machine learning systems are making life-altering decisions for us: approving mortgage loans, determining whether a tumor is cancerous, or deciding if someone gets bail. They now influence developments and discoveries in chemistry, biology, and physics—the study of genomes, extrasolar planets, even the intricacies of quantum systems. And all this before large language models such as ChatGPT came on the scene. We are living through a revolution in machine learning-powered AI that shows no signs of slowing down. This technology is based on relatively simple mathematical ideas, some of which go back centuries, including linear algebra and calculus, the stuff of seventeenth- and eighteenth-century mathematics. It took the birth and advancement of computer science and the kindling of 1990s computer chips designed for video games to ignite the explosion of AI that we see today. In this enlightening book, Anil Ananthaswamy explains the fundamental math behind machine learning, while suggesting intriguing links between artificial and natural intelligence. Might the same math underpin them both? As Ananthaswamy resonantly concludes, to make safe and effective use of artificial intelligence, we need to understand its profound capabilities and limitations, the clues to which lie in the math that makes machine learning possible. Why Machines Learn: The Elegant Math Behind Modern AI: https://amzn.to/3UAWX3D https://anilananthaswamy.com/ Sponsor message: DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)? Interested? Apply for an ML research position: benjamin@tufa.ai Shownotes: Chapters: 1. ML Fundamentals and Prerequisites [00:00:00] 1.1 Differences Between Human and Machine Learning [00:00:35] 1.2 Mathematical Prerequisites and Societal Impact of ML [00:02:20] 1.3 Author's Journey and Book Background [00:11:30] 1.4 Mathematical Foundations and Core ML Concepts [00:21:45] 1.5 Bias-Variance Tradeoff and Modern Deep Learning 2. Deep Learning Architecture [00:29:05] 2.1 Double Descent and Overparameterization in Deep Learning [00:32:40] 2.2 Mathematical Foundations and Self-Supervised Learning [00:40:05] 2.3 High-Dimensional Spaces and Model Architecture [00:52:55] 2.4 Historical Development of Backpropagation 3. AI Understanding and Limitations [00:59:13] 3.1 Pattern Matching vs Human Reasoning in ML Models [01:00:20] 3.2 Mathematical Foundations and Pattern Recognition in AI [01:04:08] 3.3 LLM Reliability and Machine Understanding Debate [01:12:50] 3.4 Historical Development of Deep Learning Technologies [01:15:21] 3.5 Alternative AI Approaches and Bio-inspired Methods 4. Ethical and Neurological Perspectives [01:24:32] 4.1 Neural Network Scaling and Mathematical Limitations [01:31:12] 4.2 AI Ethics and Societal Impact [01:38:30] 4.3 Consciousness and Neurological Conditions [01:46:17] 4.4 Body Ownership and Agency in Neuroscience | |||
02 Jul 2023 | MUNK DEBATE ON AI (COMMENTARY) [DAVID FOSTER] | 02:08:14 | |
Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB The discussion between Tim Scarfe and David Foster provided an in-depth critique of the arguments made by panelists at the Munk AI Debate on whether artificial intelligence poses an existential threat to humanity. While the panelists made thought-provoking points, Scarfe and Foster found their arguments largely speculative, lacking crucial details and evidence to support claims of an impending existential threat. Scarfe and Foster strongly disagreed with Max Tegmark’s position that AI has an unparalleled “blast radius” that could lead to human extinction. Tegmark failed to provide a credible mechanism for how this scenario would unfold in reality. His arguments relied more on speculation about advanced future technologies than on present capabilities and trends. As Foster argued, we cannot conclude AI poses a threat based on speculation alone. Evidence is needed to ground discussions of existential risks in science rather than science fiction fantasies or doomsday scenarios. They found Yann LeCun’s statements too broad and high-level, critiquing him for not providing sufficiently strong arguments or specifics to back his position. While LeCun aptly noted AI remains narrow in scope and far from achieving human-level intelligence, his arguments lacked crucial details on current limitations and why we should not fear superintelligence emerging in the near future. As Scarfe argued, without these details the discussion descended into “philosophy” rather than focusing on evidence and data. Scarfe and Foster also took issue with Yoshua Bengio’s unsubstantiated speculation that machines would necessarily develop a desire for self-preservation that threatens humanity. There is no evidence today’s AI systems are developing human-like general intelligence or desires, let alone that these attributes would manifest in ways dangerous to humans. The question is not whether machines will eventually surpass human intelligence, but how and when this might realistically unfold based on present technological capabilities. Bengio’s arguments relied more on speculation about advanced future technologies than on evidence from current systems and research. In contrast, they strongly agreed with Melanie Mitchell’s view that scenarios of malevolent or misguided superintelligence are speculation, not backed by evidence from AI as it exists today. Claims of an impending “existential threat” from AI are overblown, harmful to progress, and inspire undue fear of technology rather than consideration of its benefits. Mitchell sensibly argued discussions of risks from emerging technologies must be grounded in science and data, not speculation, if we are to make balanced policy and development decisions. Overall, while the debate raised thought-provoking questions about advanced technologies that could eventually transform our world, none of the speakers made a credible evidence-based case that today’s AI poses an existential threat. Scarfe and Foster argued the debate failed to discuss concrete details about current capabilities and limitations of technologies like language models, which remain narrow in scope. General human-level AI is still missing many components, including physical embodiment, emotions, and the "common sense" reasoning that underlies human thinking. Claims of existential threats require extraordinary evidence to justify policy or research restrictions, not speculation. 
By discussing possibilities rather than probabilities grounded in evidence, the debate failed to substantively advance our thinking on risks from AI and its plausible development in the coming decades. David's new podcast: https://podcasts.apple.com/us/podcast/the-ai-canvas/id1692538973 Generative AI book: https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ | |||
05 Jun 2024 | What’s the Magic Word? A Control Theory of LLM Prompting. | 01:17:07 | |
These two scientists have mapped out the insides, or “reachable space”, of a language model using control theory, and what they discovered was extremely surprising. Please support us on Patreon to get access to the private Discord server, bi-weekly calls, early access and ad-free listening. https://patreon.com/mlst YT version: https://youtu.be/Bpgloy1dDn0 We are joined by Aman Bhargava from Caltech and Cameron Witkowski from the University of Toronto to discuss their groundbreaking paper, “What’s the Magic Word? A Control Theory of LLM Prompting.” (The main theorem on self-attention controllability was developed in collaboration with Dr. Shi-Zhuo Looi from Caltech.) They frame LLM systems as discrete stochastic dynamical systems. This means they look at LLMs in a structured way, similar to how we analyze control systems in engineering. They explore the “reachable set” of outputs for an LLM. Essentially, this is the range of possible outputs the model can generate from a given starting point when influenced by different prompts. The research highlights that prompt engineering, or optimizing the input tokens, can significantly influence LLM outputs. They show that even short prompts can drastically alter the likelihood of specific outputs. Aman and Cameron’s work might be a boon for understanding and improving LLMs. They suggest that a deeper exploration of control theory concepts could lead to more reliable and capable language models. We dropped an additional, more technical video on the research on our Twitter account here: https://x.com/MLStreetTalk/status/1795093759471890606 Additional 20 minutes of unreleased footage on our Patreon here: https://www.patreon.com/posts/whats-magic-word-104922629 What's the Magic Word? A Control Theory of LLM Prompting (Aman Bhargava, Cameron Witkowski, Manav Shah, Matt Thomson) https://arxiv.org/abs/2310.04444 LLM Control Theory Seminar (April 2024) https://www.youtube.com/watch?v=9QtS9sVBFM0 Society for the pursuit of AGI (Cameron founded it) https://agisociety.mydurable.com/ Roger Federer demo http://conway.languagegame.io/inference Neural Cellular Automata, Active Inference, and the Mystery of Biological Computation (Aman) https://aman-bhargava.com/ai/neuro/neuromorphic/2024/03/25/nca-do-active-inference.html Aman and Cameron also want to thank Dr. Shi-Zhuo Looi and Prof. Matt Thomson from Caltech for help and advice on their research. (https://thomsonlab.caltech.edu/ and https://pma.caltech.edu/people/looi-shi-zhuo) https://x.com/ABhargava2000 https://x.com/witkowski_cam | |||
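To make the "reachable set" framing concrete, here is a minimal sketch, not the authors' code and far cruder than their theory: it brute-forces whether any 1-token control prompt from a tiny hand-picked pool steers a small model's greedy next-token prediction to a chosen target. It assumes the Hugging Face transformers and PyTorch libraries; the GPT-2 model, the prompt pool, the imposed state text and the target token are all illustrative choices.

```python
# Sketch: empirically probing the "reachable set" of a small LM under 1-token control prompts.
# Assumes: pip install torch transformers. GPT-2 and all strings below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def greedy_next_token(control_ids, state_ids):
    """Greedy next token when the control prompt u is prepended to the fixed state x0."""
    ids = torch.tensor([control_ids + state_ids])
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # logits for the token after the state
    return int(logits.argmax())

state_ids = tok.encode("The capital of France is")   # imposed state sequence x0
target_id = tok.encode(" Paris")[0]                  # desired next token y

# A tiny pool of candidate control prompts (empty prompt plus a few 1-token nudges).
pool = [[]] + [[tok.encode(s)[0]] for s in [" Question:", " Actually,", " Berlin", " say"]]
hits = [u for u in pool if greedy_next_token(u, state_ids) == target_id]
print(f"{len(hits)}/{len(pool)} control prompts steer the model to the target token")
```

The paper asks this question over the space of all prompts up to a given length and for whole output sequences, so a tiny pool like this can only under-estimate the true reachable set; it is just meant to show what "reachable" means operationally.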
29 Oct 2023 | THE HARD PROBLEM OF OBSERVERS - WOLFRAM & FRISTON [SPECIAL EDITION] | 01:59:29 | |
Please support us! https://www.patreon.com/mlst https://discord.gg/aNPkGUQtc5 https://twitter.com/MLStreetTalk YT version (with intro not found here) https://youtu.be/6iaT-0Dvhnc This is the epic special edition show you have been waiting for! With two of the most brilliant scientists alive today. Atoms, things, agents, ... observers. What even defines an "observer" and what properties must all observers share? How do objects persist in our universe given that their material composition changes over time? What does it mean for a thing to be a thing? And do things supervene on our lower-level physical reality? What does it mean for a thing to have agency? What's the difference between a complex dynamical system with and without agency? Could a rock or an AI catflap have agency? Can the universe be factorised into distinct agents, or is agency diffused? Have you ever pondered about these deep questions about reality? Prof. Friston and Dr. Wolfram have spent their entire careers, some 40+ years each thinking long and hard about these very questions and have developed significant frameworks of reference on their respective journeys (the Wolfram Physics project and the Free Energy principle). Panel: MIT Ph.D Keith Duggar Production: Dr. Tim Scarfe Refs: TED Talk with Stephen: https://www.ted.com/talks/stephen_wolfram_how_to_think_computationally_about_ai_the_universe_and_everything https://writings.stephenwolfram.com/2023/10/how-to-think-computationally-about-ai-the-universe-and-everything/ TOC 00:00:00 - Show kickoff 00:02:38 - Wolfram gets to grips with FEP 00:27:08 - How much control does an agent/observer have 00:34:52 - Observer persistence, what universe seems like to us 00:40:31 - Black holes 00:45:07 - Inside vs outside 00:52:20 - Moving away from the predictable path 00:55:26 - What can observers do 01:06:50 - Self modelling gives agency 01:11:26 - How do you know a thing has agency? 01:22:48 - Deep link between dynamics, ruliad and AI 01:25:52 - Does agency entail free will? Defining Agency 01:32:57 - Where do I probe for agency? 01:39:13 - Why is the universe the way we see it? 01:42:50 - Alien intelligence 01:43:40 - The hard problem of Observers 01:46:20 - Summary thoughts from Wolfram 01:49:35 - Factorisability of FEP 01:57:05 - Patreon interview teaser | |||
23 Jan 2025 | Subbarao Kambhampati - Do o1 models search? | 01:32:13 | |
Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems. * How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see * The evolution from traditional Large Language Models to more sophisticated reasoning systems * The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably * Why O1's improved performance comes with substantial computational costs * The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google) * The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Goto https://tufalabs.ai/ *** TOC: 1. **O1 Architecture and Reasoning Foundations** [00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations [00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning [00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach [00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities 2. **Monte Carlo Methods and Model Deep-Dive** [00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation [00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems [00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations [00:45:59] 2.4 Mechanistic Interpretability of Model Behavior [00:51:41] 2.5 O1 Response Patterns and Performance Analysis 3. **System Design and Real-World Applications** [00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models [01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1 [01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems [01:16:01] 3.4 Program Generation and Fine-Tuning Approaches [01:26:08] 3.5 Hybrid Architecture Implementation Strategies Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0 REFS: [00:02:00] Monty Python (1975) Witch trial scene: flawed logical reasoning. https://www.youtube.com/watch?v=zrzMhU_4m-g [00:04:00] Cade Metz (2024) Microsoft–OpenAI partnership evolution and control dynamics. https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html [00:07:25] Kojima et al. (2022) Zero-shot chain-of-thought prompting ('Let's think step by step'). https://arxiv.org/pdf/2205.11916 [00:12:50] DeepMind Research Team (2023) Multi-bot game solving with external and internal planning. https://deepmind.google/research/publications/139455/ [00:15:10] Silver et al. (2016) AlphaGo's Monte Carlo Tree Search and Q-learning. https://www.nature.com/articles/nature16961 [00:16:30] Kambhampati, S. et al. (2023) Evaluates O1's planning in "Strawberry Fields" benchmarks. https://arxiv.org/pdf/2410.02162 [00:29:30] Alibaba AIDC-AI Team (2023) MARCO-O1: Chain-of-Thought + MCTS for improved reasoning. https://arxiv.org/html/2411.14405 [00:31:30] Kambhampati, S. (2024) Explores LLM "reasoning vs retrieval" debate. https://arxiv.org/html/2403.04121v2 [00:37:35] Wei, J. et al. (2022) Chain-of-thought prompting (introduces last-letter concatenation). https://arxiv.org/pdf/2201.11903 [00:42:35] Barbero, F. 
et al. (2024) Transformer attention and "information over-squashing." https://arxiv.org/html/2406.04267v2 [00:46:05] Ruis, L. et al. (2023) Influence functions to understand procedural knowledge in LLMs. https://arxiv.org/html/2411.12580v1 (truncated - continued in shownotes/transcript doc) | |||
22 Aug 2024 | Adversarial Examples and Data Modelling - Andrew Ilyas (MIT) | 01:28:00 | |
Andrew Ilyas is a PhD student at MIT who is about to start as a professor at CMU. We discuss data modeling and understanding how datasets influence model predictions, adversarial examples in machine learning and why they occur, robustness in machine learning models, black-box attacks on machine learning systems, biases in data collection and dataset creation (particularly in ImageNet), and self-selection bias in data and methods to address it. MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api Andrew's site: https://andrewilyas.com/ https://x.com/andrew_ilyas TOC: 00:00:00 - Introduction and Andrew's background 00:03:52 - Overview of the machine learning pipeline 00:06:31 - Data modeling paper discussion 00:26:28 - TRAK: Evolution of data modeling work 00:43:58 - Discussion on abstraction, reasoning, and neural networks 00:53:16 - "Adversarial Examples Are Not Bugs, They Are Features" paper 01:03:24 - Types of features learned by neural networks 01:10:51 - Black box attacks paper 01:15:39 - Work on data collection and bias 01:25:48 - Future research plans and closing thoughts References: Adversarial Examples Are Not Bugs, They Are Features https://arxiv.org/pdf/1905.02175 TRAK: Attributing Model Behavior at Scale https://arxiv.org/pdf/2303.14186 Datamodels: Predicting Predictions from Training Data https://arxiv.org/pdf/2202.00622 IMAGENET-TRAINED CNNS https://arxiv.org/pdf/1811.12231 ZOO: Zeroth Order Optimization Based Black-box https://arxiv.org/pdf/1708.03999 A Spline Theory of Deep Networks https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf Scaling Monosemanticity https://transformer-circuits.pub/2024/scaling-monosemanticity/ Adversarial Examples Are Not Bugs, They Are Features https://gradientscience.org/adv/ Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies https://proceedings.mlr.press/v235/bartoldson24a.html Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors https://arxiv.org/abs/1807.07978 Estimation of Standard Auction Models https://arxiv.org/abs/2205.02060 From ImageNet to Image Classification: Contextualizing Progress on Benchmarks https://arxiv.org/abs/2005.11295 What Makes A Good Fisherman? Linear Regression under Self-Selection Bias https://arxiv.org/abs/2205.03246 Towards Tracing Factual Knowledge in Language Models Back to the Training Data [Akyürek] https://arxiv.org/pdf/2205.11482 | |||
11 Aug 2024 | Jay Alammar on LLMs, RAG, and AI Engineering | 00:57:28 | |
Jay Alammar, renowned AI educator and researcher at Cohere, discusses the latest developments in large language models (LLMs) and their applications in industry. Jay shares his expertise on retrieval augmented generation (RAG), semantic search, and the future of AI architectures. MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api. Cohere Command R model series: https://cohere.com/command Jay Alammar: https://x.com/jayalammar Buy Jay's new book here! Hands-On Large Language Models: Language Understanding and Generation https://amzn.to/4fzOUgh TOC: 00:00:00 Introduction to Jay Alammar and AI Education 00:01:47 Cohere's Approach to RAG and AI Re-ranking 00:07:15 Implementing AI in Enterprise: Challenges and Solutions 00:09:26 Jay's Role at Cohere and the Importance of Learning in Public 00:15:16 The Evolution of AI in Industry: From Deep Learning to LLMs 00:26:12 Expert Advice for Newcomers in Machine Learning 00:32:39 The Power of Semantic Search and Embeddings in AI Systems 00:37:59 Jay Alammar's Journey as an AI Educator and Visualizer 00:43:36 Visual Learning in AI: Making Complex Concepts Accessible 00:47:38 Strategies for Keeping Up with Rapid AI Advancements 00:49:12 The Future of Transformer Models and AI Architectures 00:51:40 Evolution of the Transformer: From 2017 to Present 00:54:19 Preview of Jay's Upcoming Book on Large Language Models Disclaimer: This is the fourth video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out of the interview. Note also that this combines several previously unpublished interviews with Jay into one; the earlier one, at Tim's house, was shot in Aug 2023, and the more recent one in Toronto in May 2024. Refs: The Illustrated Transformer https://jalammar.github.io/illustrated-transformer/ Attention Is All You Need https://arxiv.org/abs/1706.03762 The Unreasonable Effectiveness of Recurrent Neural Networks http://karpathy.github.io/2015/05/21/rnn-effectiveness/ Neural Networks in 11 Lines of Code https://iamtrask.github.io/2015/07/12/basic-python-network/ Understanding LSTM Networks (Chris Olah's blog post) http://colah.github.io/posts/2015-08-Understanding-LSTMs/ Luis Serrano's YouTube Channel https://www.youtube.com/channel/UCgBncpylJ1kiVaPyP-PZauQ Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks https://arxiv.org/abs/1908.10084 GPT (Generative Pre-trained Transformer) models https://jalammar.github.io/illustrated-gpt2/ https://openai.com/research/gpt-4 BERT (Bidirectional Encoder Representations from Transformers) https://jalammar.github.io/illustrated-bert/ https://arxiv.org/abs/1810.04805 RoPE (Rotary Positional Encoding) https://arxiv.org/abs/2104.09864 (Linked paper discussing rotary embeddings) Grouped Query Attention https://arxiv.org/pdf/2305.13245 RLHF (Reinforcement Learning from Human Feedback) https://openai.com/research/learning-from-human-preferences https://arxiv.org/abs/1706.03741 DPO (Direct Preference Optimization) https://arxiv.org/abs/2305.18290 | |||
08 Dec 2022 | #85 Dr. Petar Veličković (Deepmind) - Categories, Graphs, Reasoning [NEURIPS22 UNPLUGGED] | 00:36:55 | |
Dr. Petar Veličković is a Staff Research Scientist at DeepMind and has firmly established himself as one of the most significant up-and-coming researchers in the deep learning space. He invented Graph Attention Networks in 2017 and has been a leading light in the field ever since, pioneering research in Graph Neural Networks, Geometric Deep Learning and Neural Algorithmic Reasoning. If you haven’t already, you should check out our video on the Geometric Deep Learning blueprint, featuring Petar. I caught up with him last week at NeurIPS. In this show, recorded at NeurIPS 2022, we discussed his recent work on category theory and graph neural networks. https://petar-v.com/ https://twitter.com/PetarV_93/ TOC: Categories (Cats for AI) [00:00:00] Reasoning [00:14:44] Extrapolation [00:19:09] Ishan Misra Skit [00:27:50] Graphs (Expander Graph Propagation) [00:29:18] YT: https://youtu.be/1lkdWduuN14 MLST Discord: https://discord.gg/V25vQeFwhS Support us! https://www.patreon.com/mlst References on YT description, lots of them! Host: Dr. Tim Scarfe | |||
29 Sep 2020 | Capsule Networks and Education Targets | 01:24:08 | |
In today's episode, Dr. Keith Duggar, Alex Stenlake and Dr. Tim Scarfe chat about the education chapter in Kenneth Stanley's "Greatness cannot be planned" book, and we relate it to our Algoshambes conversation a few weeks ago. We debate whether objectives in education are a good thing and whether they cause perverse incentives and stifle creativity and innovation. Next up we dissect capsule networks from the top down! We finish off talking about fast algorithms and quantum computing. 00:00:00 Introduction 00:01:13 Greatness cannot be planned / education 00:12:03 Perverse incentives 00:19:25 Treasure hunting 00:30:28 Capsule Networks 00:46:08 Capsules As Compositional Networks 00:52:45 Capsule Routing 00:57:10 Loss and Warps 01:09:55 Fast Algorithms and Quantum Computing | |||
04 Apr 2021 | #50 Christian Szegedy - Formal Reasoning, Program Synthesis | 01:33:22 | |
Dr. Christian Szegedy from Google Research is a deep learning heavyweight. He invented adversarial examples, one of the first object detection algorithms, the InceptionNet architecture, and co-invented BatchNorm. He thinks that if you bet on computers and software in 1990 you would have been as right as if you bet on AI now. But he thinks that we have been programming computers the same way since the 1950s and there has been a huge stagnation ever since. Mathematics is the process of taking a fuzzy thought and formalising it. But could we automate that? Could we create a system which will act like a superhuman mathematician, but which you can talk to in natural language? This is what Christian calls autoformalisation. Christian thinks that automating many of the things we do in mathematics is the first step towards software synthesis and building human-level AGI. Mathematical ability is the litmus test for general reasoning ability. Christian has a fascinating take on transformers too. With Yannic Lightspeed Kilcher and Dr. Mathew Salvaris Whimsical Canvas with Tim's Notes: https://whimsical.com/mar-26th-christian-szegedy-CpgGhnEYDBrDMFoATU6XYC YouTube version (with detailed table of contents) https://youtu.be/ehNGGYFO6ms | |||
25 Mar 2022 | #71 - ZAK JOST (Graph Neural Networks + Geometric DL) [UNPLUGGED] | 01:02:35 | |
Special discount link for Zak's GNN course - https://bit.ly/3uqmYVq Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB YT version: https://youtu.be/jAGIuobLp60 (there are lots of helper graphics there, recommended if poss) Want to sponsor MLST!? Let us know on Linkedin / Twitter. [00:00:00] Preamble [00:03:12] Geometric deep learning [00:10:04] Message passing [00:20:42] Top down vs bottom up [00:24:59] All NN architectures are different forms of information diffusion processes (squashing and smoothing problem) [00:29:51] Graph rewiring [00:31:38] Back to information diffusion [00:42:43] Transformers vs GNNs [00:47:10] Equivariant subgraph aggregation networks + WL test [00:55:36] Do equivariant layers aggregate too? [00:57:49] Zak's GNN course Exhaustive list of references on the YT show URL (https://youtu.be/jAGIuobLp60) | |||
28 Nov 2020 | #031 WE GOT ACCESS TO GPT-3! (With Gary Marcus, Walid Saba and Connor Leahy) | 02:44:06 | |
In this special edition, Dr. Tim Scarfe, Yannic Kilcher and Keith Duggar speak with Gary Marcus and Connor Leahy about GPT-3. We have all had a significant amount of time to experiment with GPT-3, and we show you demos of it in use and discuss the practical considerations. Note that this podcast version is significantly truncated; watch the YouTube version for the TOC and experiments with GPT-3 https://www.youtube.com/watch?v=iccd86vOz3w | |||
06 Dec 2020 | #032 - Simon Kornblith / GoogleAI - SimCLR and Paper Haul! | 01:30:29 | |
This week Dr. Tim Scarfe, Sayak Paul and Yannic Kilcher speak with Dr. Simon Kornblith from Google Brain (Ph.D. from MIT). Simon is trying to understand how neural nets do what they do. Simon was the second author on the seminal Google AI SimCLR paper. We also cover "Do Wide and Deep Networks Learn the Same Things?", "What's in a Loss Function for Image Classification?", and "Big Self-Supervised Models are Strong Semi-Supervised Learners". Simon used to be a neuroscientist and also gives us the story of his unique journey into ML. 00:00:00 Show Teaser / or "short version" 00:18:34 Show intro 00:22:11 Relationship between neuroscience and machine learning 00:29:28 Similarity analysis and evolution of representations in Neural Networks 00:39:55 Expressivity of NNs 00:42:33 What's in a loss function for image classification 00:46:52 Loss function implications for transfer learning 00:50:44 SimCLR paper 01:00:19 Contrast SimCLR to BYOL 01:01:43 Data augmentation 01:06:35 Universality of image representations 01:09:25 Universality of augmentations 01:23:04 GPT-3 01:25:09 GANs for data augmentation?? 01:26:50 Julia language @skornblith https://www.linkedin.com/in/simon-kornblith-54b2033a/ https://arxiv.org/abs/2010.15327 Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth https://arxiv.org/abs/2010.16402 What's in a Loss Function for Image Classification? https://arxiv.org/abs/2002.05709 A Simple Framework for Contrastive Learning of Visual Representations https://arxiv.org/abs/2006.10029 Big Self-Supervised Models are Strong Semi-Supervised Learners | |||
10 Feb 2023 | #100 Dr. PATRICK LEWIS (co:here) - Retrieval Augmented Generation | 00:26:28 | |
Dr. Patrick Lewis is a London-based AI and Natural Language Processing Research Scientist, working at co:here. Prior to this, Patrick worked as a research scientist at the Fundamental AI Research Lab (FAIR) at Meta AI. During his PhD, Patrick split his time between FAIR and University College London, working with Sebastian Riedel and Pontus Stenetorp. Patrick’s research focuses on the intersection of information retrieval techniques (IR) and large language models (LLMs). He has done extensive work on Retrieval-Augmented Language Models. His current focus is on building more powerful, efficient, robust, and update-able models that can perform well on a wide range of NLP tasks, but also excel on knowledge-intensive NLP tasks such as Question Answering and Fact Checking. YT version: https://youtu.be/Dm5sfALoL1Y MLST Discord: https://discord.gg/aNPkGUQtc5 Support us! https://www.patreon.com/mlst References: Patrick Lewis (Natural Language Processing Research Scientist @ co:here) https://www.patricklewis.io/ Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Patrick Lewis et al) https://arxiv.org/abs/2005.11401 Atlas: Few-shot Learning with Retrieval Augmented Language Models (Gautier Izacard, Patrick Lewis, et al) https://arxiv.org/abs/2208.03299 Improving language models by retrieving from trillions of tokens (RETRO) (Sebastian Borgeaud et al) https://arxiv.org/abs/2112.04426 | |||
11 Feb 2023 | #103 - Prof. Edward Grefenstette - Language, Semantics, Philosophy | 01:01:46 | |
Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 YT: https://youtu.be/i9VPPmQn9HQ Edward Grefenstette is a Franco-American computer scientist who currently serves as Head of Machine Learning at Cohere and Honorary Professor at UCL. He has previously been a research scientist at Facebook AI Research and staff research scientist at DeepMind, and was also the CTO of Dark Blue Labs. Prior to his move to industry, Edward was a Fulford Junior Research Fellow at Somerville College, University of Oxford, and was lecturing at Hertford College. He obtained his BSc in Physics and Philosophy from the University of Sheffield and did graduate work in the philosophy departments at the University of St Andrews. His research draws on topics and methods from Machine Learning, Computational Linguistics and Quantum Information Theory, and he has done work implementing and evaluating compositional vector-based models of natural language semantics and empirical semantic knowledge discovery. https://www.egrefen.com/ https://cohere.ai/ TOC: [00:00:00] Introduction [00:02:52] Differential Semantics [00:06:56] Concepts [00:10:20] Ontology [00:14:02] Pragmatics [00:16:55] Code helps with language [00:19:02] Montague [00:22:13] RLHF [00:31:54] Swiss cheese problem / retrieval augmented [00:37:06] Intelligence / Agency [00:43:33] Creativity [00:46:41] Common sense [00:53:46] Thinking vs knowing References: Large language models are not zero-shot communicators (Laura Ruis) https://arxiv.org/abs/2210.14986 Some remarks on Large Language Models (Yoav Goldberg) https://gist.github.com/yoavg/59d174608e92e845c8994ac2e234c8a9 Quantum Natural Language Processing (Bob Coecke) https://www.cs.ox.ac.uk/people/bob.coecke/QNLP-ACT.pdf Constitutional AI: Harmlessness from AI Feedback https://www.anthropic.com/constitutional.pdf Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Patrick Lewis) https://www.patricklewis.io/publication/rag/ Natural General Intelligence (Prof. Christopher Summerfield) https://global.oup.com/academic/product/natural-general-intelligence-9780192843883 ChatGPT with Rob Miles - Computerphile https://www.youtube.com/watch?v=viJt_DXTfwA | |||
25 Nov 2024 | How AI Could Be A Mathematician's Co-Pilot by 2026 (Prof. Swarat Chaudhuri) | 01:44:42 | |
Professor Swarat Chaudhuri from the University of Texas at Austin and visiting researcher at Google DeepMind discusses breakthroughs in AI reasoning, theorem proving, and mathematical discovery. Chaudhuri explains his groundbreaking work on COPRA (a GPT-based prover agent), shares insights on neurosymbolic approaches to AI. Professor Swarat Chaudhuri: https://www.cs.utexas.edu/~swarat/ SPONSOR MESSAGES: CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI, they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Goto https://tufalabs.ai/ TOC: [00:00:00] 0. Introduction / CentML ad, Tufa ad 1. AI Reasoning: From Language Models to Neurosymbolic Approaches [00:02:27] 1.1 Defining Reasoning in AI [00:09:51] 1.2 Limitations of Current Language Models [00:17:22] 1.3 Neuro-symbolic Approaches and Program Synthesis [00:24:59] 1.4 COPRA and In-Context Learning for Theorem Proving [00:34:39] 1.5 Symbolic Regression and LLM-Guided Abstraction 2. AI in Mathematics: Theorem Proving and Concept Discovery [00:43:37] 2.1 AI-Assisted Theorem Proving and Proof Verification [01:01:37] 2.2 Symbolic Regression and Concept Discovery in Mathematics [01:11:57] 2.3 Scaling and Modularizing Mathematical Proofs [01:21:53] 2.4 COPRA: In-Context Learning for Formal Theorem-Proving [01:28:22] 2.5 AI-driven theorem proving and mathematical discovery 3. Formal Methods and Challenges in AI Mathematics [01:30:42] 3.1 Formal proofs, empirical predicates, and uncertainty in AI mathematics [01:34:01] 3.2 Characteristics of good theoretical computer science research [01:39:16] 3.3 LLMs in theorem generation and proving [01:42:21] 3.4 Addressing contamination and concept learning in AI systems REFS: 00:04:58 The Chinese Room Argument, https://plato.stanford.edu/entries/chinese-room/ 00:11:42 Software 2.0, https://medium.com/@karpathy/software-2-0-a64152b37c35 00:11:57 Solving Olympiad Geometry Without Human Demonstrations, https://www.nature.com/articles/s41586-023-06747-5 00:13:26 Lean, https://lean-lang.org/ 00:15:43 A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go Through Self-Play, https://www.science.org/doi/10.1126/science.aar6404 00:19:24 DreamCoder (Ellis et al., PLDI 2021), https://arxiv.org/abs/2006.08381 00:24:37 The Lambda Calculus, https://plato.stanford.edu/entries/lambda-calculus/ 00:26:43 Neural Sketch Learning for Conditional Program Generation, https://arxiv.org/pdf/1703.05698 00:28:08 Learning Differentiable Programs With Admissible Neural Heuristics, https://arxiv.org/abs/2007.12101 00:31:03 Symbolic Regression With a Learned Concept Library (Grayeli et al., NeurIPS 2024), https://arxiv.org/abs/2409.09359 00:41:30 Formal Verification of Parallel Programs, https://dl.acm.org/doi/10.1145/360248.360251 01:00:37 Training Compute-Optimal Large Language Models, https://arxiv.org/abs/2203.15556 01:18:19 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, https://arxiv.org/abs/2201.11903 01:18:42 Draft, Sketch, and Prove: Guiding Formal Theorem Provers With Informal Proofs, https://arxiv.org/abs/2210.12283 01:19:49 Learning Formal Mathematics From Intrinsic Motivation, https://arxiv.org/pdf/2407.00695 01:20:19 An In-Context Learning Agent for Formal 
Theorem-Proving (Thakur et al., CoLM 2024), https://arxiv.org/pdf/2310.04353 01:23:58 Learning to Prove Theorems via Interacting With Proof Assistants, https://arxiv.org/abs/1905.09381 01:39:58 An In-Context Learning Agent for Formal Theorem-Proving (Thakur et al., CoLM 2024), https://arxiv.org/pdf/2310.04353 01:42:24 Programmatically Interpretable Reinforcement Learning (Verma et al., ICML 2018), https://arxiv.org/abs/1804.02477 | |||
19 Oct 2024 | Decompiling Dreams: A New Approach to ARC? - Alessandro Palmarini | 00:51:34 | |
Alessandro Palmarini is a post-baccalaureate researcher at the Santa Fe Institute working under the supervision of Melanie Mitchell. He completed his undergraduate degree in Artificial Intelligence and Computer Science at the University of Edinburgh. Palmarini's current research focuses on developing AI systems that can efficiently acquire new skills from limited data, inspired by François Chollet's work on measuring intelligence. His work builds upon the DreamCoder program synthesis system, introducing a novel approach called "dream decompiling" to improve library learning in inductive program synthesis. Palmarini is particularly interested in addressing the Abstraction and Reasoning Corpus (ARC) challenge, aiming to create AI systems that can perform abstract reasoning tasks more efficiently than current approaches. His research explores the balance between computational efficiency and data efficiency in AI learning processes. DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)? MLST is sponsored by Tufa Labs: Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more. Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2. Interested? Apply for an ML research position: benjamin@tufa.ai TOC: 1. Intelligence Measurement in AI Systems [00:00:00] 1.1 Defining Intelligence in AI Systems [00:02:00] 1.2 Research at Santa Fe Institute [00:04:35] 1.3 Impact of Gaming on AI Development [00:05:10] 1.4 Comparing AI and Human Learning Efficiency 2. Efficient Skill Acquisition in AI [00:06:40] 2.1 Intelligence as Skill Acquisition Efficiency [00:08:25] 2.2 Limitations of Current AI Systems in Generalization [00:09:45] 2.3 Human vs. AI Cognitive Processes [00:10:40] 2.4 Measuring AI Intelligence: Chollet's ARC Challenge 3. Program Synthesis and ARC Challenge [00:12:55] 3.1 Philosophical Foundations of Program Synthesis [00:17:14] 3.2 Introduction to Program Induction and ARC Tasks [00:18:49] 3.3 DreamCoder: Principles and Techniques [00:27:55] 3.4 Trade-offs in Program Synthesis Search Strategies [00:31:52] 3.5 Neural Networks and Bayesian Program Learning 4. Advanced Program Synthesis Techniques [00:32:30] 4.1 DreamCoder and Dream Decompiling Approach [00:39:00] 4.2 Beta Distribution and Caching in Program Synthesis [00:45:10] 4.3 Performance and Limitations of Dream Decompiling [00:47:45] 4.4 Alessandro's Approach to ARC Challenge [00:51:12] 4.5 Conclusion and Future Discussions Refs: Full reflist on YT VD, Show Notes and MP3 metadata Show Notes: https://www.dropbox.com/scl/fi/x50201tgqucj5ba2q4typ/Ale.pdf?rlkey=0ubvk7p5gtyx1gpownpdadim8&st=5pniu3nq&dl=0 | |||
24 Mar 2025 | ARC Prize v2 Launch! (Francois Chollet and Mike Knoop) | 00:54:15 | |
We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC prize! In version 2, the challenges have been calibrated with humans such that at least 2 humans could solve each task in a reasonable amount of time, but also adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge. https://arcprize.org/ SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT: https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0 TOC: 1. ARC v2 Core Design & Objectives [00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture [00:03:16] 1.2 Test-Time Optimization and AGI Assessment [00:06:24] 1.3 Human-AI Capability Analysis [00:13:02] 1.4 OpenAI o3 Initial Performance Results 2. ARC Technical Evolution [00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements [00:21:12] 2.2 Human Validation Methodology [00:26:05] 2.3 Task Design and Gaming Prevention [00:29:11] 2.4 Intelligence Measurement Framework 3. O3 Performance & Future Challenges [00:38:50] 3.1 O3 Comprehensive Performance Analysis [00:43:40] 3.2 System Limitations and Failure Modes [00:49:30] 3.3 Program Synthesis Applications [00:53:00] 3.4 Future Development Roadmap REFS: [00:00:15] On the Measure of Intelligence, François Chollet https://arxiv.org/abs/1911.01547 [00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop https://arcprize.org/ [00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team https://arcprize.org/blog/oai-o3-pub-breakthrough [00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al. https://arxiv.org/abs/2201.11903 [00:21:45] ARC-v2 benchmark tasks, Mike Knoop https://arcprize.org/blog/introducing-arc-agi-public-leaderboard [00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al. https://arxiv.org/html/2412.04604v2 [00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt https://arxiv.org/abs/2412.04604 [00:48:55] The Bitter Lesson, Rich Sutton http://www.incompleteideas.net/IncIdeas/BitterLesson.html [00:53:30] Decoding strategies in neural text generation, Sina Zarrieß https://www.mdpi.com/2078-2489/12/9/355/pdf | |||
23 Apr 2025 | Prof. Randall Balestriero - LLMs without pretraining and SSL | 00:34:30 | |
Randall Balestriero joins the show to discuss some counterintuitive findings in AI. He shares research showing that huge language models, even when started from scratch (randomly initialized) without massive pre-training, can learn specific tasks like sentiment analysis surprisingly well, train stably, and avoid severe overfitting, sometimes matching the performance of costly pre-trained models. This raises questions about when giant pre-training efforts are truly worth it. He also talks about how self-supervised learning (where models learn from data structure itself) and traditional supervised learning (using labeled data) are fundamentally similar, allowing researchers to apply decades of supervised learning theory to improve newer self-supervised methods. Finally, Randall touches on fairness in AI models used for Earth data (like climate prediction), revealing that these models can be biased, performing poorly in specific locations like islands or coastlines even if they seem accurate overall, which has important implications for policy decisions based on this data. SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT + SHOWNOTES: https://www.dropbox.com/scl/fi/n7yev71nsjso71jyjz1fy/RANDALLNEURIPS.pdf?rlkey=0dn4injp1sc4ts8njwf3wfmxv&dl=0 TOC: 1. Model Training Efficiency and Scale [00:00:00] 1.1 Training Stability of Large Models on Small Datasets [00:04:09] 1.2 Pre-training vs Random Initialization Performance Comparison [00:07:58] 1.3 Task-Specific Models vs General LLMs Efficiency 2. Learning Paradigms and Data Distribution [00:10:35] 2.1 Fair Language Model Paradox and Token Frequency Issues [00:12:02] 2.2 Pre-training vs Single-task Learning Spectrum [00:16:04] 2.3 Theoretical Equivalence of Supervised and Self-supervised Learning [00:19:40] 2.4 Self-Supervised Learning and Supervised Learning Relationships [00:21:25] 2.5 SSL Objectives and Heavy-tailed Data Distribution Challenges 3. Geographic Representation in ML Systems [00:25:20] 3.1 Geographic Bias in Earth Data Models and Neural Representations [00:28:10] 3.2 Mathematical Limitations and Model Improvements [00:30:24] 3.3 Data Quality and Geographic Bias in ML Datasets REFS: [00:01:40] Research on training large language models from scratch on small datasets, Randall Balestriero et al. https://openreview.net/forum?id=wYGBWOjq1Q [00:10:35] The Fair Language Model Paradox (2024), Andrea Pinto, Tomer Galanti, Randall Balestriero https://arxiv.org/abs/2410.11985 [00:12:20] Muppet: Massive Multi-task Representations with Pre-Finetuning (2021), Armen Aghajanyan et al. https://arxiv.org/abs/2101.11038 [00:14:30] Dissociating language and thought in large language models (2023), Kyle Mahowald et al. https://arxiv.org/abs/2301.06627 [00:16:05] The Birth of Self-Supervised Learning: A Supervised Theory, Randall Balestriero et al. https://openreview.net/forum?id=NhYAjAAdQT [00:21:25] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, Adrien Bardes, Jean Ponce, Yann LeCun https://arxiv.org/abs/2105.04906 [00:25:20] No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data (2025), Daniel Cai, Randall Balestriero, et al. 
https://arxiv.org/abs/2502.06831 [00:33:45] Mark Ibrahim et al.'s work on geographic bias in computer vision datasets, Mark Ibrahim https://arxiv.org/pdf/2304.12210 | |||
03 Jan 2021 | #036 - Max Welling: Quantum, Manifolds & Symmetries in ML | 01:42:31 | |
Today we had a fantastic conversation with Professor Max Welling, VP of Technology, Qualcomm Technologies Netherlands B.V. Max is a strong believer in the power of data and computation and their relevance to artificial intelligence. There is a fundamental blank-slate paradigm in machine learning: experience and data alone currently rule the roost. Max wants to build a house of domain knowledge on top of that blank slate. Max thinks there are no predictions without assumptions, no generalization without inductive bias. The bias-variance tradeoff tells us that we need to use additional human knowledge when data is insufficient. Max Welling has pioneered many of the most sophisticated inductive priors in DL models developed in recent years, allowing us to use Deep Learning with non-Euclidean data, i.e. on graphs/topology (a field we now call "geometric deep learning"), or allowing network architectures to recognise new symmetries in the data, for example gauge or SE(3) equivariance. Max has also brought many other concepts from his physics playbook into ML, for example quantum and even Bayesian approaches. This is not an episode to miss; it might be our best yet! Panel: Dr. Tim Scarfe, Yannic Kilcher, Alex Stenlake 00:00:00 Show introduction 00:04:37 Protein Fold from DeepMind -- did it use SE(3) transformer? 00:09:58 How has machine learning progressed 00:19:57 Quantum Deformed Neural Networks paper 00:22:54 Probabilistic Numeric Convolutional Neural Networks paper 00:27:04 Ilia Karmanov from Qualcomm interview mini segment 00:32:04 Main Show Intro 00:35:21 How is Max known in the community? 00:36:35 How Max nurtures talent, freedom and relationship is key 00:40:30 Selecting research directions and guidance 00:43:42 Priors vs experience (bias/variance trade-off) 00:48:47 Generative models and GPT-3 00:51:57 Bias/variance trade off -- when do priors hurt us 00:54:48 Capsule networks 01:03:09 Which old ideas should we revive 01:04:36 Hardware lottery paper 01:07:50 Greatness can't be planned (Kenneth Stanley reference) 01:09:10 A new sort of peer review and originality 01:11:57 Quantum Computing 01:14:25 Quantum deformed neural networks paper 01:21:57 Probabilistic numeric convolutional neural networks 01:26:35 Matrix exponential 01:28:44 Other ideas from physics i.e. chaos, holography, renormalisation 01:34:25 Reddit 01:37:19 Open review system in ML 01:41:43 Outro | |||
11 Nov 2024 | Eliezer Yudkowsky and Stephen Wolfram on AI X-risk | 04:18:30 | |
Eliezer Yudkowsky and Stephen Wolfram discuss artificial intelligence and its potential existential risks. They traversed fundamental questions about AI safety, consciousness, computational irreducibility, and the nature of intelligence. The discourse centered on Yudkowsky’s argument that advanced AI systems pose an existential threat to humanity, primarily due to the challenge of alignment and the potential for emergent goals that diverge from human values. Wolfram, while acknowledging potential risks, approached the topic from his signature measured perspective, emphasizing the importance of understanding computational systems’ fundamental nature and questioning whether AI systems would necessarily develop the kind of goal-directed behavior Yudkowsky fears. *** MLST IS SPONSORED BY TUFA AI LABS! The current winners of the ARC challenge, MindsAI, are part of Tufa AI Labs. They are hiring ML engineers. Are you interested?! Please goto https://tufalabs.ai/ *** TOC: 1. Foundational AI Concepts and Risks [00:00:01] 1.1 AI Optimization and System Capabilities Debate [00:06:46] 1.2 Computational Irreducibility and Intelligence Limitations [00:20:09] 1.3 Existential Risk and Species Succession [00:23:28] 1.4 Consciousness and Value Preservation in AI Systems 2. Ethics and Philosophy in AI [00:33:24] 2.1 Moral Value of Human Consciousness vs. Computation [00:36:30] 2.2 Ethics and Moral Philosophy Debate [00:39:58] 2.3 Existential Risks and Digital Immortality [00:43:30] 2.4 Consciousness and Personal Identity in Brain Emulation 3. Truth and Logic in AI Systems [00:54:39] 3.1 AI Persuasion Ethics and Truth [01:01:48] 3.2 Mathematical Truth and Logic in AI Systems [01:11:29] 3.3 Universal Truth vs Personal Interpretation in Ethics and Mathematics [01:14:43] 3.4 Quantum Mechanics and Fundamental Reality Debate 4. AI Capabilities and Constraints [01:21:21] 4.1 AI Perception and Physical Laws [01:28:33] 4.2 AI Capabilities and Computational Constraints [01:34:59] 4.3 AI Motivation and Anthropomorphization Debate [01:38:09] 4.4 Prediction vs Agency in AI Systems 5. AI System Architecture and Behavior [01:44:47] 5.1 Computational Irreducibility and Probabilistic Prediction [01:48:10] 5.2 Teleological vs Mechanistic Explanations of AI Behavior [02:09:41] 5.3 Machine Learning as Assembly of Computational Components [02:29:52] 5.4 AI Safety and Predictability in Complex Systems 6. Goal Optimization and Alignment [02:50:30] 6.1 Goal Specification and Optimization Challenges in AI Systems [02:58:31] 6.2 Intelligence, Computation, and Goal-Directed Behavior [03:02:18] 6.3 Optimization Goals and Human Existential Risk [03:08:49] 6.4 Emergent Goals and AI Alignment Challenges 7. AI Evolution and Risk Assessment [03:19:44] 7.1 Inner Optimization and Mesa-Optimization Theory [03:34:00] 7.2 Dynamic AI Goals and Extinction Risk Debate [03:56:05] 7.3 AI Risk and Biological System Analogies [04:09:37] 7.4 Expert Risk Assessments and Optimism vs Reality 8. Future Implications and Economics [04:13:01] 8.1 Economic and Proliferation Considerations SHOWNOTES (transcription, references, summary, best quotes etc): https://www.dropbox.com/scl/fi/3st8dts2ba7yob161dchd/EliezerWolfram.pdf?rlkey=b6va5j8upgqwl9s2muc924vtt&st=vemwqx7a&dl=0 | |||
11 Jan 2021 | #037 - Tour De Bayesian with Connor Tann | 01:35:25 | |
Connor Tann is a physicist and senior data scientist working for a multinational energy company where he co-founded and leads a data science team. He holds a first-class degree in experimental and theoretical physics from Cambridge University, with a master's in particle astrophysics. He specializes in the application of machine learning models and Bayesian methods. Today we explore the history, practical utility, and unique capabilities of Bayesian methods. We also discuss the computational difficulties inherent in Bayesian methods along with modern methods for approximate solutions such as Markov Chain Monte Carlo. Finally, we discuss how Bayesian optimization in the context of automl may one day put Data Scientists like Connor out of work. Panel: Dr. Keith Duggar, Alex Stenlake, Dr. Tim Scarfe 00:00:00 Duggar's philosophical ramblings on Bayesianism 00:05:10 Introduction 00:07:30 small datasets and prior scientific knowledge 00:10:37 Bayesian methods are probability theory 00:14:00 Bayesian methods demand hard computations 00:15:46 uncertainty can matter more than estimators 00:19:29 updating or combining knowledge is a key feature 00:25:39 Frequency or Reasonable Expectation as the Primary Concept 00:30:02 Gambling and coin flips 00:37:32 Rev. Thomas Bayes's pool table 00:40:37 ignorance priors are beautiful yet hard 00:43:49 connections between common distributions 00:49:13 A curious Universe, Benford's Law 00:55:17 choosing priors, a tale of two factories 01:02:19 integration, the computational Achilles heel 01:35:25 Bayesian social context in the ML community 01:10:24 frequentist methods as a first approximation 01:13:13 driven to Bayesian methods by small sample size 01:18:46 Bayesian optimization with automl, a job killer? 01:25:28 different approaches to hyper-parameter optimization 01:30:18 advice for aspiring Bayesians 01:33:59 who would Connor interview next? Connor Tann: https://www.linkedin.com/in/connor-tann-a92906a1/ https://twitter.com/connossor | |||
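For a flavour of the "hard computations" and the MCMC-style approximations mentioned above, here is a minimal sketch, not from the episode: a random-walk Metropolis sampler for the posterior over a coin's heads-probability after 7 heads in 10 flips, under a flat prior. The data, proposal width and iteration counts are all made-up illustrative values.

```python
# Sketch: random-walk Metropolis sampling of a coin-bias posterior (flat prior, 7 heads / 10 flips).
# Pure standard library; every number here is illustrative.
import math
import random

heads, flips = 7, 10

def log_posterior(theta):
    """Log of the unnormalised posterior: flat prior times binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return float("-inf")                      # zero prior mass outside [0, 1]
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)

theta, samples = 0.5, []
for _ in range(50_000):
    proposal = theta + random.gauss(0.0, 0.1)     # symmetric random-walk proposal
    log_ratio = log_posterior(proposal) - log_posterior(theta)
    # Accept with probability min(1, posterior(proposal) / posterior(theta)).
    if random.random() < math.exp(min(0.0, log_ratio)):
        theta = proposal
    samples.append(theta)

kept = samples[10_000:]                           # discard burn-in
print(f"posterior mean ~ {sum(kept) / len(kept):.3f}")   # analytic Beta(8, 4) mean is 2/3
```

For this toy model the exact posterior is known in closed form, which is exactly what makes it a useful sanity check; the point of MCMC is that the same recipe still works when the normalising integral is intractable.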
19 May 2021 | #53 Quantum Natural Language Processing - Prof. Bob Coecke (Oxford) | 02:17:39 | |
Bob Coecke is a celebrated physicist; he has been a Physics and Quantum professor at Oxford University for the last 20 years. He is particularly interested in Structure, which is to say Logic, Order, and Category Theory. He is well known for work involving compositional distributional models of natural language meaning, and he is also fascinated with understanding how our brains work. Bob was recently appointed as the Chief Scientist at Cambridge Quantum Computing. Bob thinks that interactions between systems in Quantum Mechanics carry naturally over to how word meanings interact in natural language. Bob argues that this interaction embodies the phenomenon of quantum teleportation. Bob invented ZX-calculus, a graphical calculus for revealing the compositional structure inside quantum circuits - to show entanglement states and protocols in a visually succinct but logically complete way. Von Neumann himself didn't even like his own original symbolic formalism of quantum theory, despite it being widely used! We hope you enjoy this fascinating conversation, which might give you a lot of insight into natural language processing. Tim Intro [00:00:00] The topological brain (Post-record button skit) [00:13:22] Show kick off [00:19:31] Bob introduction [00:22:37] Changing culture in universities [00:24:51] Machine Learning is like electricity [00:31:50] NLP -- what is Bob's Quantum conception? [00:34:50] The missing text problem [00:52:59] Can statistical induction be trusted? [00:59:49] On pragmatism and hybrid systems [01:04:42] Parlour tricks, parsing and information flows [01:07:43] How much human input is required with Bob's method? [01:11:29] Reality, meaning, structure and language [01:14:42] Replacing complexity with quantum entanglement, emergent complexity [01:17:45] Loading quantum data requires machine learning [01:19:49] QC is happy math coincidence for NLP [01:22:30] The Theory of English (ToE) [01:28:23] ... or can we learn the ToE? [01:29:56] How did diagrammatic quantum calculus come about? [01:31:04] The state of quantum computing today [01:37:49] NLP on QC might be doable even in the NISQ era [01:40:48] Hype and private investment are driving progress [01:48:34] Crypto discussion (moved to post-show) [01:50:38] Kilcher is in a startup (moved to post show) [01:53:40] Debrief [01:55:26] | |||
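To give a flavour of the compositional distributional idea discussed here (a loose illustrative sketch of the general approach, not Bob's actual categorical, quantum-style formalism): relational words such as adjectives can be modelled as linear maps acting on noun vectors, so phrase meaning comes from composition rather than from averaging word vectors. The vectors and the "fast" matrix below are made-up toy values.

```python
import numpy as np

# Toy distributional noun vectors (dimensions stand in for latent context features).
car   = np.array([0.9, 0.1, 0.3])
apple = np.array([0.2, 0.8, 0.5])

# An adjective like "fast" is modelled as a matrix (a linear map on noun space),
# so "fast car" is a matrix-vector product rather than a bag-of-words average.
fast = np.array([
    [1.2, 0.0, 0.0],
    [0.0, 0.3, 0.0],
    [0.0, 0.0, 0.8],
])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

fast_car, fast_apple = fast @ car, fast @ apple
# Composition changes the similarity structure: "fast car" stays closer to "car"
# than "fast apple" does.
print(round(cosine(fast_car, car), 3), round(cosine(fast_apple, car), 3))
```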
22 May 2020 | ICLR 2020: Yoshua Bengio and the Nature of Consciousness | 02:34:17 | |
In this episode of Machine Learning Street Talk, Tim Scarfe, Connor Shorten and Yannic Kilcher react to Yoshua Bengio's ICLR 2020 Keynote "Deep Learning Priors Associated with Conscious Processing". Bengio takes on many future directions for research in Deep Learning such as the role of attention in consciousness, sparse factor graphs and causality, and the study of systematic generalization. Bengio also presents big ideas in Intelligence that border on the line of philosophy and practical machine learning. This includes ideas such as consciousness in machines and System 1 and System 2 thinking, as described in Daniel Kahneman's book "Thinking Fast and Slow". Similar to Yann LeCun's half of the 2020 ICLR keynote, this talk takes on many challenging ideas and hopefully this video helps you get a better understanding of some of them! Thanks for watching! Please Subscribe for more videos! Paper Links: Link to Talk: https://iclr.cc/virtual_2020/speaker_7.html The Consciousness Prior: https://arxiv.org/abs/1709.08568 Thinking Fast and Slow: https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555 Systematic Generalization: https://arxiv.org/abs/1811.12889 CLOSURE: Assessing Systematic Generalization of CLEVR Models: https://arxiv.org/abs/1912.05783 Neural Module Networks: https://arxiv.org/abs/1511.02799 Experience Grounds Language: https://arxiv.org/pdf/2004.10151.pdf Benchmarking Graph Neural Networks: https://arxiv.org/pdf/2003.00982.pdf On the Measure of Intelligence: https://arxiv.org/abs/1911.01547 Please check out our individual channels as well! Machine Learning Dojo with Tim Scarfe: https://www.youtube.com/channel/UCXvHuBMbgJw67i5vrMBBobA Yannic Kilcher: https://www.youtube.com/channel/UCZHmQk67mSJgfCCTn7xBfe Henry AI Labs: https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw 00:00:00 Tim and Yannic's takes 00:01:37 Intro to Bengio 00:03:13 System 2, language and Chomsky 00:05:58 Christof Koch on consciousness 00:07:25 Francois Chollet on intelligence and consciousness 00:09:29 Meditation and Sam Harris on consciousness 00:11:35 Connor Intro 00:13:20 Show Main Intro 00:17:55 Priors associated with Conscious Processing 00:26:25 System 1 / System 2 00:42:47 Implicit and Verbalized Knowledge [DON'T MISS THIS!] 01:08:24 Inductive Priors for DL 2.0 01:27:20 Systematic Generalization 01:37:53 Contrast with the Symbolic AI Program 01:54:55 Attention 02:00:25 From Attention to Consciousness 02:05:31 Thoughts, Consciousness, Language 02:06:55 Sparse Factor graph 02:10:52 Sparse Change in Abstract Latent Space 02:15:10 Discovering Cause and Effect 02:20:00 Factorize the joint distribution 02:22:30 RIMS: Modular Computation 02:24:30 Conclusion #machinelearning #deeplearning | |||
20 Mar 2024 | Can we build a generalist agent? Dr. Minqi Jiang and Dr. Marc Rigter | 01:57:11 | |
Dr. Minqi Jiang and Dr. Marc Rigter explain an innovative new method to make the intelligence of agents more general-purpose: train them to learn many worlds before the usual goal-directed training we call reinforcement learning. Their new paper is called "Reward-free curricula for training robust world models": https://arxiv.org/pdf/2306.09205.pdf https://twitter.com/MinqiJiang https://twitter.com/MarcRigter Interviewer: Dr. Tim Scarfe Please support us on Patreon; Tim is now doing MLST full-time and taking a massive financial hit. If you love MLST and want this to continue, please show your support! In return you get early access to shows, plus the private Discord and networking. https://patreon.com/mlst We are also looking for show sponsors, please get in touch if interested: mlstreettalk at gmail. MLST Discord: https://discord.gg/machine-learning-street-talk-mlst-937356144060530778 | |||
06 Nov 2024 | Pattern Recognition vs True Intelligence - Francois Chollet | 02:42:54 | |
Francois Chollet, a prominent AI expert and creator of ARC-AGI, discusses intelligence, consciousness, and artificial intelligence. Chollet explains that real intelligence isn't about memorizing information or having lots of knowledge - it's about being able to handle new situations effectively. This is why he believes current large language models (LLMs) have "near-zero intelligence" despite their impressive abilities. They're more like sophisticated memory and pattern-matching systems than truly intelligent beings. *** MLST IS SPONSORED BY TUFA AI LABS! The current winners of the ARC challenge, MindsAI are part of Tufa AI Labs. They are hiring ML engineers. Are you interested?! Please goto https://tufalabs.ai/ *** He introduced his "Kaleidoscope Hypothesis," which suggests that while the world seems infinitely complex, it's actually made up of simpler patterns that repeat and combine in different ways. True intelligence, he argues, involves identifying these basic patterns and using them to understand new situations. Chollet also talked about consciousness, suggesting it develops gradually in children rather than appearing all at once. He believes consciousness exists in degrees - animals have it to some extent, and even human consciousness varies with age and circumstances (like being more conscious when learning something new versus doing routine tasks). On AI safety, Chollet takes a notably different stance from many in Silicon Valley. He views AGI development as a scientific challenge rather than a religious quest, and doesn't share the apocalyptic concerns of some AI researchers. He argues that intelligence itself isn't dangerous - it's just a tool for turning information into useful models. What matters is how we choose to use it. ARC-AGI Prize: https://arcprize.org/ Francois Chollet: https://x.com/fchollet Shownotes: https://www.dropbox.com/scl/fi/j2068j3hlj8br96pfa7bi/CHOLLET_FINAL.pdf?rlkey=xkbr7tbnrjdl66m246w26uc8k&st=0a4ec4na&dl=0 TOC: 1. Intelligence and Model Building [00:00:00] 1.1 Intelligence Definition and ARC Benchmark [00:05:40] 1.2 LLMs as Program Memorization Systems [00:09:36] 1.3 Kaleidoscope Hypothesis and Abstract Building Blocks [00:13:39] 1.4 Deep Learning Limitations and System 2 Reasoning [00:29:38] 1.5 Intelligence vs. Skill in LLMs and Model Building 2. ARC Benchmark and Program Synthesis [00:37:36] 2.1 Intelligence Definition and LLM Limitations [00:41:33] 2.2 Meta-Learning System Architecture [00:56:21] 2.3 Program Search and Occam's Razor [00:59:42] 2.4 Developer-Aware Generalization [01:06:49] 2.5 Task Generation and Benchmark Design 3. Cognitive Systems and Program Generation [01:14:38] 3.1 System 1/2 Thinking Fundamentals [01:22:17] 3.2 Program Synthesis and Combinatorial Challenges [01:31:18] 3.3 Test-Time Fine-Tuning Strategies [01:36:10] 3.4 Evaluation and Leakage Problems [01:43:22] 3.5 ARC Implementation Approaches 4. Intelligence and Language Systems [01:50:06] 4.1 Intelligence as Tool vs Agent [01:53:53] 4.2 Cultural Knowledge Integration [01:58:42] 4.3 Language and Abstraction Generation [02:02:41] 4.4 Embodiment in Cognitive Systems [02:09:02] 4.5 Language as Cognitive Operating System 5. Consciousness and AI Safety [02:14:05] 5.1 Consciousness and Intelligence Relationship [02:20:25] 5.2 Development of Machine Consciousness [02:28:40] 5.3 Consciousness Prerequisites and Indicators [02:36:36] 5.4 AGI Safety Considerations [02:40:29] 5.5 AI Regulation Framework | |||
03 Oct 2020 | The Social Dilemma - Part 1 | 01:07:19 | |
In this first part of our three part series on the Social Dilemma Netflix film, Dr. Tim Scarfe, Yannic "Lightspeed" Kilcher and Zak Jost gang up with Cybersecurity expert Andy Smith. We give you our take on the film. We are super excited to get your feedback on this one! Hope you enjoy.
00:00:00 Introduction 00:06:11 Moral hypocrisy 00:12:38 Road to hell is paved with good intentions, attention economy 00:15:04 They know everything about you 00:18:02 Addiction 00:21:22 Differential realities 00:26:12 Self determination and Monetisation 00:29:08 AI: Overwhelm human strengths undermine human vulnerabilities 00:31:51 Conspiracy theory / fake news 00:34:23 Overton window / polarisation 00:39:12 Short attention span / convergent behaviour 00:41:26 Is social media good for you 00:45:17 Your attention time is linear, the things you can pay attention to are a volume, anonymity 00:51:32 Andy question on security: social engineering 00:56:32 Is it a security risk having your information in social media 00:58:02 Retrospective judgement 01:03:06 Free speech and censorship 01:06:06 Technology accelerator | |||
18 Sep 2020 | Kernels! | 01:37:29 | |
Today Yannic Lightspeed Kilcher and I spoke with Alex Stenlake about Kernel Methods. What is a kernel? Do you remember those weird kernel things which everyone obsessed about before deep learning? What about the Representer theorem and reproducing kernel Hilbert spaces? SVMs and kernel ridge regression? Remember them?! Hope you enjoy the conversation! 00:00:00 Tim Intro 00:01:35 Yannic clever insight from this discussion 00:03:25 Street talk and Alex intro 00:05:06 How kernels are taught 00:09:20 Computational tractability 00:10:32 Maths 00:11:50 What is a kernel? 00:19:39 Kernel latent expansion 00:23:57 Overfitting 00:24:50 Hilbert spaces 00:30:20 Compare to DL 00:31:18 Back to Hilbert spaces 00:45:19 Computational tractability 2 00:52:23 Curse of dimensionality 00:55:01 RBF: infinite Taylor series 00:57:20 Margin/SVM 01:00:07 KRR/dual 01:03:26 Complexity compute kernels vs deep learning 01:05:03 Good for small problems vs deep learning? 01:07:50 What's special about the RBF kernel 01:11:06 Another DL comparison 01:14:01 Representer theorem 01:20:05 Relation to back prop 01:25:10 Connection with NLP/transformers 01:27:31 Where else kernels good 01:34:34 Deep learning vs dual kernel methods 01:33:29 Thoughts on AI 01:34:35 Outro | |||
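For anyone who wants the "KRR/dual" discussion in concrete form, here is a minimal kernel ridge regression with an RBF kernel in NumPy (our own illustrative sketch, not code from the show). Note that everything runs through the n-by-n Gram matrix, which is exactly the computational trade-off against deep learning that comes up in the complexity timestamps:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix: k(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, gamma=1.0, lam=1e-2):
    """Dual solution alpha = (K + lam*I)^-1 y; predictions are kernel-weighted sums."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test, gamma=1.0):
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Toy 1-D regression: learn sin(x) from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = kernel_ridge_fit(X, y, gamma=0.5)
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(np.round(kernel_ridge_predict(X, alpha, X_test, gamma=0.5), 3))
```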
04 Nov 2020 | NLP is not NLU and GPT-3 - Walid Saba | 02:20:32 | |
#machinelearning This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic Kilcher speak with veteran NLU expert Dr. Walid Saba. Walid is an old-school AI expert. He is a polymath, a neuroscientist, psychologist, linguist, philosopher, statistician, and logician. He thinks the missing information problem and the lack of a typed ontology are the key issues with NLU, not sample efficiency or generalisation. He is a big critic of the deep learning movement and BERTology. We also cover GPT-3 in some detail in today's session, covering Luciano Floridi's recent article "GPT-3: Its Nature, Scope, Limits, and Consequences" and a commentary on the incredible power of GPT-3 to perform tasks with just a few examples, including the Yann LeCun commentary on Facebook and Hacker News. Time stamps on the YouTube version 0:00:00 Walid intro 00:05:03 Knowledge acquisition bottleneck 00:06:11 Language is ambiguous 00:07:41 Language is not learned 00:08:32 Language is a formal language 00:08:55 Learning from data doesn't work 00:14:01 Intelligence 00:15:07 Lack of domain knowledge these days 00:16:37 Yannic Kilcher thuglife comment 00:17:57 Deep learning assault 00:20:07 The way we evaluate language models is flawed 00:20:47 Humans do type checking 00:23:02 Ontologic 00:25:48 Comments On GPT3 00:30:54 Yann LeCun and Reddit 00:33:57 Minds and machines - Luciano 00:35:55 Main show introduction 00:39:02 Walid introduces himself 00:40:20 science advances one funeral at a time 00:44:58 Deep learning obsession syndrome and inception 00:46:14 BERTology / empirical methods are not NLU 00:49:55 Pattern recognition vs domain reasoning, is the knowledge in the data 00:56:04 Natural language understanding is about decoding and not compression, it's not learnable. 01:01:46 Intelligence is about not needing infinite amounts of time 01:04:23 We need an explicit ontological structure to understand anything 01:06:40 Ontological concepts 01:09:38 Word embeddings 01:12:20 There is power in structure 01:15:16 Language models are not trained on pronoun disambiguation and resolving scopes 01:17:33 The information is not in the data 01:19:03 Can we generate these rules on the fly? Rules or data? 01:20:39 The missing data problem is key 01:21:19 Problem with empirical methods and LeCun reference 01:22:45 Comparison with meatspace (brains) 01:28:16 The knowledge graph game, is knowledge constructed or discovered 01:29:41 How small can this ontology of the world be? 01:33:08 Walid's taxonomy of understanding 01:38:49 The trend seems to be, less rules is better, not the other way around? 01:40:30 Testing the latest NLP models with entailment 01:42:25 Problems with the way we evaluate NLP 01:44:10 Winograd Schema challenge 01:45:56 All you need to know now is how to build neural networks, lack of rigour in ML research 01:50:47 Is everything learnable 01:53:02 How should we elevate language systems? 01:54:04 10 big problems in language (missing information) 01:55:59 Multiple inheritance is wrong 01:58:19 Language is ambiguous 02:01:14 How big would our world ontology need to be? 02:05:49 How to learn more about NLU 02:09:10 AlphaGo Walid's blog: https://medium.com/@ontologik LinkedIn: https://www.linkedin.com/in/walidsaba/ | |||
09 Jan 2024 | $450M AI Startup In 3 Years | Chai AI | 00:29:47 | |
Chai AI is the leading platform for conversational chat artificial intelligence. Note: this is a sponsored episode of MLST. William Beauchamp is the founder of two $100M+ companies - Chai Research, an AI startup, and Seamless Capital, a hedge fund based in Cambridge, UK. Chaiverse is the Chai AI developer platform, where developers can train, submit and evaluate on millions of real users to win their share of $1,000,000. https://www.chai-research.com https://www.chaiverse.com https://twitter.com/chai_research https://facebook.com/chairesearch/ https://www.instagram.com/chairesearch/ Download the app on iOS and Android (https://onelink.to/kqzhy9 ) #chai #chai_ai #chai_research #chaiverse #generative_ai #LLMs | |||
29 Jun 2024 | Aidan Gomez - CEO of Cohere (AI's 'Inner Monologue' – Crucial for Reasoning) | 01:00:22 | |
Aidan Gomez, CEO of Cohere, reveals how they're tackling AI hallucinations and improving reasoning abilities. He also explains why Cohere doesn't use any output from GPT-4 for training their models. Aidan shares his personal insights into the world of AI and LLMs, Cohere's unique approach to solving real-world business problems, and how their models are set apart from the competition. Aidan reveals how they are making major strides in AI technology, discussing everything from last mile customer engineering to the robustness of prompts and future architectures. He also touches on the broader implications of AI for society, including potential risks and the role of regulation. He discusses Cohere's guiding principles and the health of the startup scene, with a particular focus on enterprise applications. Aidan provides a rare look into the internal workings of Cohere and their vision for driving productivity and innovation. https://cohere.com/ https://x.com/aidangomez Check out Cohere's amazing new Command R* models here https://cohere.com/command Disclaimer: This is the second video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. | |||
30 Dec 2022 | #96 Prof. PEDRO DOMINGOS - There are no infinities, utility functions, neurosymbolic | 02:49:14 | |
Pedro Domingos, Professor Emeritus of Computer Science and Engineering at the University of Washington, is renowned for his research in machine learning, particularly for his work on Markov logic networks that allow for uncertain inference. He is also the author of the acclaimed book "The Master Algorithm". Panel: Dr. Tim Scarfe TOC: [00:00:00] Introduction [00:01:34] Galactica / misinformation / gatekeeping [00:12:31] Is there a master algorithm? [00:16:29] Limits of our understanding [00:21:57] Intentionality, Agency, Creativity [00:27:56] Compositionality [00:29:30] Digital Physics / It from bit / Wolfram [00:35:17] Alignment / Utility functions [00:43:36] Meritocracy [00:45:53] Game theory [01:00:00] EA/consequentialism/Utility [01:11:09] Emergence / relationalism [01:19:26] Markov logic [01:25:38] Moving away from anthropocentrism [01:28:57] Neurosymbolic / infinity / tensor algebra [01:53:45] Abstraction [01:57:26] Symmetries / Geometric DL [02:02:46] Bias variance trade off [02:05:49] What was seen at NeurIPS [02:12:58] Chalmers talk on LLMs [02:28:32] Definition of intelligence [02:32:40] LLMs [02:35:14] On experts in different fields [02:40:15] Back to intelligence [02:41:37] Spline theory / extrapolation YT version: https://www.youtube.com/watch?v=C9BH3F2c0vQ References: The Master Algorithm [Domingos] https://www.amazon.co.uk/s?k=master+algorithm&i=stripbooks&crid=3CJ67DCY96DE8&sprefix=master+algorith%2Cstripbooks%2C82&ref=nb_sb_noss_2 INFORMATION, PHYSICS, QUANTUM: THE SEARCH FOR LINKS [John Wheeler/It from Bit] https://philpapers.org/archive/WHEIPQ.pdf A New Kind Of Science [Wolfram] https://www.amazon.co.uk/New-Kind-Science-Stephen-Wolfram/dp/1579550088 The Rationalist's Guide to the Galaxy: Superintelligent AI and the Geeks Who Are Trying to Save Humanity's Future [Tom Chivers] https://www.amazon.co.uk/Does-Not-Hate-You-Superintelligence/dp/1474608795 The Status Game: On Social Position and How We Use It [Will Storr] https://www.goodreads.com/book/show/60598238-the-status-game Newcomb's paradox https://en.wikipedia.org/wiki/Newcomb%27s_paradox The Case for Strong Emergence [Sabine Hossenfelder] https://philpapers.org/rec/HOSTCF-3 Markov Logic: An Interface Layer for Artificial Intelligence [Domingos] https://www.morganclaypool.com/doi/abs/10.2200/S00206ED1V01Y200907AIM007 Note: Pedro discussed "Tensor Logic" - I was not able to find a reference Neural Networks and the Chomsky Hierarchy [Grégoire Delétang/DeepMind] https://arxiv.org/abs/2207.02098 Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn] https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf Every Model Learned by Gradient Descent Is Approximately a Kernel Machine [Pedro Domingos] https://arxiv.org/abs/2012.00152 A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27 [LeCun] https://openreview.net/pdf?id=BZ5a1r-kVsf Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges [Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković] https://arxiv.org/abs/2104.13478 The Algebraic Mind: Integrating Connectionism and Cognitive Science [Gary Marcus] https://www.amazon.co.uk/Algebraic-Mind-Integrating-Connectionism-D | |||
14 Jul 2024 | Prof. Murray Shanahan - Machines Don't Think Like Us | 02:15:22 | |
Murray Shanahan is a professor of Cognitive Robotics at Imperial College London and a senior research scientist at DeepMind. He challenges our assumptions about AI consciousness and urges us to rethink how we talk about machine intelligence. We explore the dangers of anthropomorphizing AI, the limitations of current language in describing AI capabilities, and the fascinating intersection of philosophy and artificial intelligence. Show notes and full references: https://docs.google.com/document/d/1ICtBI574W-xGi8Z2ZtUNeKWiOiGZ_DRsp9EnyYAISws/edit?usp=sharing Prof Murray Shanahan: https://www.doc.ic.ac.uk/~mpsha/ (look at his selected publications) https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ&hl=en https://en.wikipedia.org/wiki/Murray_Shanahan https://x.com/mpshanahan Interviewer: Dr. Tim Scarfe Refs (links in the Google doc linked above): Role play with large language models Waluigi effect "Conscious Exotica" - Paper by Murray Shanahan (2016) "Simulators" - Article by Janis from LessWrong "Embodiment and the Inner Life" - Book by Murray Shanahan (2010) "The Technological Singularity" - Book by Murray Shanahan (2015) "Simulacra as Conscious Exotica" - Paper by Murray Shanahan (newer paper of the original focussed on LLMs) A recent paper by Anthropic on using autoencoders to find features in language models (referring to the "Scaling Monosemanticity" paper) Work by Peter Godfrey-Smith on octopus consciousness "Metaphors We Live By" - Book by George Lakoff (1980s) Work by Aaron Sloman on the concept of "space of possible minds" (1984 article mentioned) Wittgenstein's "Philosophical Investigations" (posthumously published) Daniel Dennett's work on the "intentional stance" Alan Turing's original paper on the Turing Test (1950) Thomas Nagel's paper "What is it like to be a bat?" (1974) John Searle's Chinese Room Argument (mentioned but not detailed) Work by Richard Evans on tackling reasoning problems Claude Shannon's quote on knowledge and control "Are We Bodies or Souls?" - Book by Richard Swinburne Reference to work by Ethan Perez and others at Anthropic on potential deceptive behavior in language models Reference to a paper by Murray Shanahan and Antonia Creswell on the "selection inference framework" Mention of work by Francois Chollet, particularly the ARC (Abstraction and Reasoning Corpus) challenge Reference to Elizabeth Spelke's work on core knowledge in infants Mention of Karl Friston's work on planning as inference (active inference) The film "Ex Machina" - Murray Shanahan was the scientific advisor "The Waluigi Effect" Anthropic's constitutional AI approach Loom system by Lara Reynolds and Kyle McDonald for visualizing conversation trees DeepMind's AlphaGo (mentioned multiple times as an example) Mention of the "Golden Gate Claude" experiment Reference to an interview Tim Scarfe conducted with University of Toronto students about self-attention controllability theorem Mention of an interview with Irina Rish Reference to an interview Tim Scarfe conducted with Daniel Dennett Reference to an interview with Maria Santa Caterina Mention of an interview with Philip Goff Nick Chater and Martin Christianson's book ("The Language Game: How Improvisation Created Language and Changed the World") Peter Singer's work from 1975 on ascribing moral status to conscious beings Demis Hassabis' discussion on the "ladder of creativity" Reference to B.F. Skinner and behaviorism | |||
01 Nov 2020 | AI Alignment & AGI Fire Alarm - Connor Leahy | 02:04:35 | |
This week Dr. Tim Scarfe, Alex Stenlake and Yannic Kilcher speak with AGI and AI alignment specialist Connor Leahy, a machine learning engineer from Aleph Alpha and founder of EleutherAI. Connor believes that AI alignment is philosophy with a deadline and that we are on the precipice; the stakes are astronomical. AI is important, and it will go wrong by default. Connor thinks that the singularity or intelligence explosion is near. Connor says that AGI is like climate change but worse: even harder problems, an even shorter deadline and even worse consequences for the future. These problems are hard, and nobody knows what to do about them. 00:00:00 Introduction to AI alignment and AGI fire alarm 00:15:16 Main Show Intro 00:18:38 Different schools of thought on AI safety 00:24:03 What is intelligence? 00:25:48 AI Alignment 00:27:39 Humans don't have a coherent utility function 00:28:13 Newcomb's paradox and advanced decision problems 00:34:01 Incentives and behavioural economics 00:37:19 Prisoner's dilemma 00:40:24 Ayn Rand and game theory in politics and business 00:44:04 Instrumental convergence and orthogonality thesis 00:46:14 Utility functions and the Stop button problem 00:55:24 AI corrigibility - self alignment 00:56:16 Decision theory and stability / wireheading / robust delegation 00:59:30 Stop button problem 01:00:40 Making the world a better place 01:03:43 Is intelligence a search problem? 01:04:39 Mesa optimisation / humans are misaligned AI 01:06:04 Inner vs outer alignment / faulty reward functions 01:07:31 Large corporations are intelligent and have no stop function 01:10:21 Dutch booking / what is rationality / decision theory 01:16:32 Understanding very powerful AIs 01:18:03 Kolmogorov complexity 01:19:52 GPT-3 - is it intelligent, are humans even intelligent? 01:28:40 Scaling hypothesis 01:29:30 Connor thought DL was dead in 2017 01:37:54 Why is GPT-3 as intelligent as a human 01:44:43 Jeff Hawkins on intelligence as compression and the great lookup table 01:50:28 AI ethics related to AI alignment? 01:53:26 Interpretability 01:56:27 Regulation 01:57:54 Intelligence explosion Discord: https://discord.com/invite/vtRgjbM EleutherAI: https://www.eleuther.ai Twitter: https://twitter.com/npcollapse LinkedIn: https://www.linkedin.com/in/connor-j-leahy/ | |||
07 Jan 2024 | DOES AI HAVE AGENCY? With Professor. Karl Friston and Riddhi J. Pitliya | 01:02:39 | |
Watch behind the scenes, get early access and join the private Discord by supporting us on Patreon: https://patreon.com/mlst (public discord) https://discord.gg/aNPkGUQtc5 https://twitter.com/MLStreetTalk DOES AI HAVE AGENCY? With Professor Karl Friston and Riddhi J. Pitliya. Agency in the context of cognitive science, particularly when considering the free energy principle, extends beyond just human decision-making and autonomy. It encompasses a broader understanding of how all living systems, including non-human entities, interact with their environment to maintain their existence by minimising sensory surprise. According to the free energy principle, living organisms strive to minimize the difference between their predicted states and the actual sensory inputs they receive. This principle suggests that agency arises as a natural consequence of this process, particularly when organisms appear to plan ahead many steps in the future. Riddhi J. Pitliya is based in the computational psychopathology lab doing her Ph.D. at the University of Oxford and works with Professor Karl Friston at VERSES. https://twitter.com/RiddhiJP References: THE FREE ENERGY PRINCIPLE—A PRECIS [Ramstead] https://www.dialecticalsystems.eu/contributions/the-free-energy-principle-a-precis/ Active Inference: The Free Energy Principle in Mind, Brain, and Behavior [Thomas Parr, Giovanni Pezzulo, Karl J. Friston] https://direct.mit.edu/books/oa-monograph/5299/Active-InferenceThe-Free-Energy-Principle-in-Mind The beauty of collective intelligence, explained by a developmental biologist | Michael Levin https://www.youtube.com/watch?v=U93x9AWeuOA Growing Neural Cellular Automata https://distill.pub/2020/growing-ca Carcinisation https://en.wikipedia.org/wiki/Carcinisation Prof. KENNETH STANLEY - Why Greatness Cannot Be Planned https://www.youtube.com/watch?v=lhYGXYeMq_E On Defining Artificial Intelligence [Pei Wang] https://sciendo.com/article/10.2478/jagi-2019-0002 Why? The Purpose of the Universe [Goff] https://amzn.to/4aEqpfm Umwelt https://en.wikipedia.org/wiki/Umwelt An Immense World: How Animal Senses Reveal the Hidden Realms [Yong] https://amzn.to/3tzzTb7 What Is It Like to Be a Bat? [Nagel] https://www.sas.upenn.edu/~cavitch/pdf-library/Nagel_Bat.pdf COUNTERFEIT PEOPLE. DANIEL DENNETT. (SPECIAL EDITION) https://www.youtube.com/watch?v=axJtywd9Tbo We live in the infosphere [FLORIDI] https://www.youtube.com/watch?v=YLNGvvgq3eg Mark Zuckerberg: First Interview in the Metaverse | Lex Fridman Podcast #398 https://www.youtube.com/watch?v=MVYrJJNdrEg Black Mirror: Rachel, Jack and Ashley Too | Official Trailer | Netflix https://www.youtube.com/watch?v=-qIlCo9yqpY | |||
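As a toy illustration of the claim that agents "minimize the difference between their predicted states and the actual sensory inputs they receive" (our own sketch, not from the episode, and a drastic simplification of active inference, which uses precision-weighted errors and action on the world), here is a single-variable agent that updates its belief by descending its sensory prediction error:

```python
def perceive(observations, lr=0.1):
    """Keep an internal estimate mu of a hidden cause and reduce prediction error online."""
    mu = 0.0                      # internal belief about the hidden state
    for o in observations:
        prediction = mu           # toy generative model: observation = hidden state + noise
        error = o - prediction    # sensory prediction error, a crude proxy for "surprise"
        mu += lr * error          # belief update that descends the squared error
    return mu

sensory_stream = [2.1, 1.9, 2.05, 2.0, 1.95] * 10
print(f"converged belief ~ {perceive(sensory_stream):.2f}")  # drifts toward ~2.0
```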
12 Mar 2025 | Tau Language: The Software Synthesis Future (sponsored) | 01:41:19 | |
This sponsored episode features mathematician Ohad Asor discussing logical approaches to AI, focusing on the limitations of machine learning and introducing the Tau language for software development and blockchain tech. Asor argues that machine learning cannot guarantee correctness. Tau allows logical specification of software requirements, automatically creating provably correct implementations with potential to revolutionize distributed systems. The discussion highlights program synthesis, software updates, and applications in finance and governance. SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT + RESEARCH: https://www.dropbox.com/scl/fi/t849j6v1juk3gc15g4rsy/TAU.pdf?rlkey=hh11h2mhog3ncdbeapbzpzctc&dl=0 Tau: https://tau.net/ Tau Language: https://tau.ai/tau-language/ Research: https://tau.net/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf TOC: 1. Machine Learning Foundations and Limitations [00:00:00] 1.1 Fundamental Limitations of Machine Learning and PAC Learning Theory [00:04:50] 1.2 Transductive Learning and the Three Curses of Machine Learning [00:08:57] 1.3 Language, Reality, and AI System Design [00:12:58] 1.4 Program Synthesis and Formal Verification Approaches 2. Logical Programming Architecture [00:31:55] 2.1 Safe AI Development Requirements [00:32:05] 2.2 Self-Referential Language Architecture [00:32:50] 2.3 Boolean Algebra and Logical Foundations [00:37:52] 2.4 SAT Solvers and Complexity Challenges [00:44:30] 2.5 Program Synthesis and Specification [00:47:39] 2.6 Overcoming Tarski's Undefinability with Boolean Algebra [00:56:05] 2.7 Tau Language Implementation and User Control 3. Blockchain-Based Software Governance [01:09:10] 3.1 User Control and Software Governance Mechanisms [01:18:27] 3.2 Tau's Blockchain Architecture and Meta-Programming Capabilities [01:21:43] 3.3 Development Status and Token Implementation [01:24:52] 3.4 Consensus Building and Opinion Mapping System [01:35:29] 3.5 Automation and Financial Applications CORE REFS (more in pinned comment): [00:03:45] PAC (Probably Approximately Correct) Learning framework, Leslie Valiant https://en.wikipedia.org/wiki/Probably_approximately_correct_learning [00:06:10] Boolean Satisfiability Problem (SAT), Various https://en.wikipedia.org/wiki/Boolean_satisfiability_problem [00:13:55] Knowledge as Justified True Belief (JTB), Matthias Steup https://plato.stanford.edu/entries/epistemology/ [00:17:50] Wittgenstein's concept of the limits of language, Ludwig Wittgenstein https://plato.stanford.edu/entries/wittgenstein/ [00:21:25] Boolean algebras, Ohad Asor https://tau.net/tau-language-research/ [00:26:10] The Halting Problem https://plato.stanford.edu/entries/turing-machine/#HaltProb [00:30:25] Alfred Tarski (1901-1983), Mario Gómez-Torrente https://plato.stanford.edu/entries/tarski/ [00:41:50] DPLL https://www.cs.princeton.edu/~zkincaid/courses/fall18/readings/SATHandbook-CDCL.pdf [00:49:50] Tarski's undefinability theorem (1936), Alfred Tarski https://plato.stanford.edu/entries/tarski-truth/ [00:51:45] Boolean Algebra mathematical foundations, J. Donald Monk https://plato.stanford.edu/entries/boolalg-math/ [01:02:35] Belief Revision Theory and AGM Postulates, Sven Ove Hansson https://plato.stanford.edu/entries/logic-belief-revision/ [01:05:35] Quantifier elimination in atomless boolean algebra, H. Jerome Keisler https://people.math.wisc.edu/~hkeisler/random.pdf [01:08:35] Quantifier elimination in Tau language specification, Ohad Asor https://tau.ai/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf [01:11:50] Tau Net blockchain platform https://tau.net/ [01:19:20] Tau blockchain's innovative approach treating blockchain code itself as a contract https://tau.net/Whitepaper.pdf | |||
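Since the episode leans heavily on Boolean satisfiability as the workhorse behind logical specification, here is a minimal brute-force SAT check in Python (our own illustrative sketch; real solvers such as the DPLL/CDCL approaches referenced above prune the search rather than enumerating it):

```python
from itertools import product

# A formula in CNF: a list of clauses, each clause a list of literals.
# Positive integer i means variable i; negative -i means its negation.
# Example: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
cnf = [[1, -2], [2, 3], [-1, -3]]

def satisfiable(cnf, n_vars):
    """Try every assignment; return a satisfying one if it exists (exponential, illustration only)."""
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            return assignment
    return None

print(satisfiable(cnf, n_vars=3))  # first satisfying assignment found: {1: False, 2: False, 3: True}
```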
19 Mar 2022 | #70 - LETITIA PARCALABESCU - Symbolics, Linguistics [UNPLUGGED] | 01:18:30 | |
Today we are having a discussion with Letitia Parcalabescu from the AI Coffee Break youtube channel! We discuss linguistics, symbolic AI and our respective Youtube channels. Make sure you subscribe to her channel! In the first 15 minutes Tim dissects the recent article from Gary Marcus "Deep learning has hit a wall". Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB YT: https://youtu.be/p2D2duT-R2E [00:00:00] Comments on Gary Marcus Article / Symbolic AI [00:14:57] Greetings [00:17:40] Introduction [00:18:48] A shared journey towards computation [00:22:10] A linguistics outsider [00:24:11] Is computational linguistics AI? [00:28:23] swinging pendulums of dogma and resource allocation [00:31:16] the road less travelled [00:34:35] pitching grants with multimodality ... and then the truth [00:40:50] some aspects of language are statistically learnable [00:44:58] ... and some aspects of language are dimensionally cursed [00:48:24] it's good to have both approaches to machine intelligence [00:51:14] the world runs on symbols [00:54:28] there is much more to learn biology [00:59:26] Letitia's creation process [01:02:23] don't overfit content, instead publish and iterate [01:07:48] merging the big picture arrow from the small direction arrows [01:11:02] use passion to drive through failure to success [01:12:56] stay positive [01:16:02] closing remarks | |||
11 Feb 2023 | #102 - Prof. MICHAEL LEVIN, Prof. IRINA RISH - Emergence, Intelligence, Transhumanism | 00:55:17 | |
Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 YT: https://youtu.be/Vbi288CKgis Michael Levin is a Distinguished Professor in the Biology department at Tufts University, and the holder of the Vannevar Bush endowed Chair. He is the Director of the Allen Discovery Center at Tufts and the Tufts Center for Regenerative and Developmental Biology. His research focuses on understanding the biophysical mechanisms of pattern regulation and harnessing endogenous bioelectric dynamics for rational control of growth and form. The capacity to generate a complex, behaving organism from the single cell of a fertilized egg is one of the most amazing aspects of biology. Levin's lab integrates approaches from developmental biology, computer science, and cognitive science to investigate the emergence of form and function. Using biophysical and computational modeling approaches, they seek to understand the collective intelligence of cells, as they navigate physiological, transcriptional, morphogenetic, and behavioral spaces. They develop conceptual frameworks for basal cognition and diverse intelligence, including synthetic organisms and AI. Also joining us this evening is Irina Rish. Irina is a Full Professor at the Université de Montréal's Computer Science and Operations Research department, a core member of Mila - Quebec AI Institute, as well as the holder of the Canada CIFAR AI Chair and the Canadian Excellence Research Chair in Autonomous AI. She has a PhD in AI from UC Irvine. Her research focuses on machine learning, neural data analysis, neuroscience-inspired AI, continual lifelong learning, optimization algorithms, sparse modelling, probabilistic inference, dialog generation, biologically plausible reinforcement learning, and dynamical systems approaches to brain imaging analysis. Interviewer: Dr. Tim Scarfe TOC: [00:00:00] Introduction [00:02:09] Emergence [00:13:16] Scaling Laws [00:23:12] Intelligence [00:44:36] Transhumanism Prof. Michael Levin https://en.wikipedia.org/wiki/Michael_Levin_(biologist) https://www.drmichaellevin.org/ https://twitter.com/drmichaellevin Prof. Irina Rish https://twitter.com/irinarish https://irina-rish.com/ | |||
13 Dec 2020 | #033 Prof. Karl Friston - The Free Energy Principle | 01:51:24 | |
This week Dr. Tim Scarfe, Dr. Keith Duggar and Connor Leahy chat with Prof. Karl Friston. Professor Friston is a British neuroscientist at University College London and an authority on brain imaging. In 2016 he was ranked the most influential neuroscientist on Semantic Scholar. His main contribution to theoretical neurobiology is the variational free energy principle, also known as active inference in the Bayesian brain. The FEP is a formal statement that the existential imperative for any system which survives in a changing world can be cast as an inference problem. The Bayesian brain hypothesis states that the brain is confronted with ambiguous sensory evidence, which it interprets by making inferences about the hidden states which caused the sensory data. So is the brain an inference engine? The key concept separating Friston's idea from traditional stochastic reinforcement learning methods, and even Bayesian reinforcement learning, is the move away from goal-directed optimisation. Remember to subscribe! Enjoy the show! 00:00:00 Show teaser intro 00:16:24 Main formalism for FEP 00:28:29 Path Integral 00:30:52 How did we feel talking to Friston? 00:34:06 Skit - on cultures 00:36:02 Friston joins 00:36:33 Main show introduction 00:40:51 Is prediction all it takes for intelligence? 00:48:21 balancing accuracy with flexibility 00:57:36 belief-free vs belief-based; beliefs are crucial 01:04:53 Fuzzy Markov Blankets and Wandering Sets 01:12:37 The Free Energy Principle conforms to itself 01:14:50 useful false beliefs 01:19:14 complexity minimization is the heart of free energy 01:23:25 An Alpha to tip the scales? Absolutely not! Absolutely yes! 01:28:47 FEP applied to brain anatomy 01:36:28 Are there multiple non-FEP forms in the brain? 01:43:11 a positive connection to backpropagation 01:47:12 The FEP does not explain the origin of FEP systems 01:49:32 Post-show banter https://www.fil.ion.ucl.ac.uk/~karl/ #machinelearning | |||
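A minimal illustration of the "brain as inference engine" framing (our own sketch with made-up numbers, not from the episode): given ambiguous sensory evidence, Bayes' rule combines a prior over hidden states with a likelihood of the sensory data to produce a posterior over what caused it.

```python
# Hidden states: is the ambiguous shadow a "cat" or a "bag"? The prior favours bag;
# the likelihood of the observed blur favours cat.
prior = {"cat": 0.2, "bag": 0.8}
likelihood = {"cat": 0.7, "bag": 0.3}   # P(observed blur | hidden state)

unnormalised = {s: prior[s] * likelihood[s] for s in prior}
evidence = sum(unnormalised.values())                     # P(observed blur)
posterior = {s: p / evidence for s, p in unnormalised.items()}
print(posterior)  # {'cat': ~0.368, 'bag': ~0.632} -- the prior still dominates this weak evidence
```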
21 Jun 2021 | #55 Self-Supervised Vision Models (Dr. Ishan Misra - FAIR). | 01:36:21 | |
Dr. Ishan Misra is a Research Scientist at Facebook AI Research where he works on Computer Vision and Machine Learning. His main research interest is reducing the need for human supervision, and indeed, human knowledge in visual learning systems. He finished his PhD at the Robotics Institute at Carnegie Mellon. He has done stints at Microsoft Research, INRIA and Yale. His bachelor's degree is in computer science, where he achieved the highest GPA in his cohort. Ishan is fast becoming a prolific scientist, already with more than 3000 citations under his belt and co-authoring with Yann LeCun, the godfather of deep learning. Today though we will be focusing on an exciting cluster of recent papers around unsupervised representation learning for computer vision released from FAIR. These are: DINO: Emerging Properties in Self-Supervised Vision Transformers, BARLOW TWINS: Self-Supervised Learning via Redundancy Reduction and PAWS: Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples. All of these papers are hot off the press, just being officially released in the last month or so. Many of you will remember PIRL: Self-Supervised Learning of Pretext-Invariant Representations which Ishan was the primary author of in 2019. A toy sketch of the Barlow Twins objective appears after the reference list below. References: Shuffle and Learn - https://arxiv.org/abs/1603.08561 DepthContrast - https://arxiv.org/abs/2101.02691 DINO - https://arxiv.org/abs/2104.14294 Barlow Twins - https://arxiv.org/abs/2103.03230 SwAV - https://arxiv.org/abs/2006.09882 PIRL - https://arxiv.org/abs/1912.01991 AVID - https://arxiv.org/abs/2004.12943 (best paper candidate at CVPR'21 (just announced over the weekend) - http://cvpr2021.thecvf.com/node/290)
Alexei (Alyosha) Efros http://people.eecs.berkeley.edu/~efros/ http://www.cs.cmu.edu/~tmalisie/projects/nips09/
Exemplar networks https://arxiv.org/abs/1406.6909
The bitter lesson - Rich Sutton http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Machine Teaching: A New Paradigm for Building Machine Learning Systems https://arxiv.org/abs/1707.06742
POET https://arxiv.org/pdf/1901.01753.pdf | |||
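As promised above, here is a minimal NumPy sketch of the Barlow Twins redundancy-reduction objective (our own illustration of the idea in the paper, not FAIR code): embeddings of two augmented views are standardised, their cross-correlation matrix is computed, and the loss pushes the diagonal towards 1 (invariance) while pushing off-diagonal terms towards 0 (decorrelation).

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Redundancy-reduction loss on two batches of embeddings, shape (batch, dim)."""
    n, d = z_a.shape
    # Standardise each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = (z_a.T @ z_b) / n                                       # cross-correlation matrix
    on_diag = np.sum((np.diagonal(c) - 1.0) ** 2)               # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diagonal(c) ** 2)     # decorrelation term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 32))
noisy_view = z + 0.1 * rng.standard_normal((256, 32))           # stand-in for a second augmentation
print(round(barlow_twins_loss(z, noisy_view), 4))                # small: the two views nearly agree
print(round(barlow_twins_loss(z, rng.standard_normal((256, 32))), 4))  # much larger for unrelated views
```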
16 Jul 2023 | Dr. MAXWELL RAMSTEAD - The Physics of Survival | 02:05:50 | |
Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB Join us for a fascinating discussion of the free energy principle with Dr. Maxwell Ramstead, a leading thinker exploring the intersection of math, physics, and philosophy and Director of Research at VERSES. Proposed by renowned neuroscientist Karl Friston, the FEP offers a unifying theory explaining how systems maintain order and their identity. The free energy principle inverts traditional survival logic. Rather than asking what behaviors promote survival, it queries - given things exist, what must they do? The answer: minimizing free energy, or "surprise." Systems persist by constantly ensuring their internal states match anticipated states based on a model of the world. Failure to minimize surprise leads to chaos as systems dissolve into disorder. Thus, the free energy principle elucidates why lifeforms relentlessly model and predict their surroundings. It is an existential imperative counterbalancing entropy. Essentially, this principle describes the mind's pursuit of harmony between expectations and reality. Its relevance spans from cells to societies, underlying order wherever longevity is found. Our discussion explores the technical details and philosophical implications of this paradigm-shifting theory. How does it further our understanding of cognition and intelligence? What insights does it offer about the fundamental patterns and properties of existence? Can it precipitate breakthroughs in disciplines like neuroscience and artificial intelligence? Dr. Ramstead completed his Ph.D. at McGill University in Montreal, Canada in 2019, with frequent research visits to UCL in London, under the supervision of the world's most cited neuroscientist, Professor Karl Friston (UCL). YT version: https://youtu.be/8qb28P7ksyE https://scholar.google.ca/citations?user=ILpGOMkAAAAJ&hl=fr https://spatialwebfoundation.org/team/maxwell-ramstead/ https://www.linkedin.com/in/maxwell-ramstead-43a1991b7/ https://twitter.com/mjdramstead VERSES AI: https://www.verses.ai/ Intro: Tim Scarfe (Ph.D) Interviewer: Keith Duggar (Ph.D MIT) TOC: 0:00:00 - Tim Intro 0:08:10 - Intro and philosophy 0:14:26 - Intro to Maxwell 0:18:00 - FEP 0:29:08 - Markov Blankets 0:51:15 - Verses AI / Applications of FEP 1:05:55 - Potential issues with deploying FEP 1:10:50 - Shared knowledge graphs 1:14:29 - XRisk / Ethics 1:24:57 - Strength of Verses 1:28:30 - Misconceptions about FEP, Physics vs philosophy/criticism 1:44:41 - Emergence / consciousness References: Principia Mathematica https://www.abebooks.co.uk/servlet/BookDetailsPL?bi=30567249049 Andy Clark's paper "Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science" (Behavioral and Brain Sciences, 2013) https://pubmed.ncbi.nlm.nih.gov/23663408/ "Math Does Not Represent" by Erik Curiel https://www.youtube.com/watch?v=aA_T20HAzyY A free energy principle for generic quantum systems (Chris Fields et al) https://arxiv.org/pdf/2112.15242.pdf Designing explainable artificial intelligence with active inference https://arxiv.org/abs/2306.04025 Am I Self-Conscious?
(Friston) https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00579/full The Meta-Problem of Consciousness https://philarchive.org/archive/CHATMO-32v1 The Map-Territory Fallacy Fallacy https://arxiv.org/abs/2208.06924 A Technical Critique of Some Parts of the Free Energy Principle - Martin Biehl et al https://arxiv.org/abs/2001.06408 WEAK MARKOV BLANKETS IN HIGH-DIMENSIONAL, SPARSELY-COUPLED RANDOM DYNAMICAL SYSTEMS - DALTON A R SAKTHIVADIVEL https://arxiv.org/pdf/2207.07620.pdf | |||
11 May 2023 | Future of Generative AI [David Foster] | 02:31:36 | |
Generative Deep Learning, 2nd Edition [David Foster] https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 Twitter: https://twitter.com/MLStreetTalk In this conversation, Tim Scarfe and David Foster, the author of 'Generative Deep Learning,' dive deep into the world of generative AI, discussing topics ranging from model families and auto regressive models to the democratization of AI technology and its potential impact on various industries. They explore the connection between language and true intelligence, as well as the limitations of GPT and other large language models. The discussion also covers the importance of task-independent world models, the concept of active inference, and the potential of combining these ideas with transformer and GPT-style models. Ethics and regulation in AI development are also discussed, including the need for transparency in data used to train AI models and the responsibility of developers to ensure their creations are not destructive. The conversation touches on the challenges posed by AI-generated content on copyright laws and the diminishing role of effort and skill in copyright due to generative models. The impact of AI on education and creativity is another key area of discussion, with Tim and David exploring the potential benefits and drawbacks of using AI in the classroom, the need for a balance between traditional learning methods and AI-assisted learning, and the importance of teaching students to use AI tools critically and responsibly. Generative AI in music is also explored, with David and Tim discussing the potential for AI-generated music to change the way we create and consume art, as well as the challenges in training AI models to generate music that captures human emotions and experiences. Throughout the conversation, Tim and David touch on the potential risks and consequences of AI becoming too powerful, the importance of maintaining control over the technology, and the possibility of government intervention and regulation. The discussion concludes with a thought experiment about AI predicting human actions and creating transient capabilities that could lead to doom. TOC: Introducing Generative Deep Learning [00:00:00] Model Families in Generative Modeling [00:02:25] Auto Regressive Models and Recurrence [00:06:26] Language and True Intelligence [00:15:07] Language, Reality, and World Models [00:19:10] AI, Human Experience, and Understanding [00:23:09] GPTs Limitations and World Modeling [00:27:52] Task-Independent Modeling and Cybernetic Loop [00:33:55] Collective Intelligence and Emergence [00:36:01] Active Inference vs. 
Reinforcement Learning [00:38:02] Combining Active Inference with Transformers [00:41:55] Decentralized AI and Collective Intelligence [00:47:46] Regulation and Ethics in AI Development [00:53:59] AI-Generated Content and Copyright Laws [00:57:06] Effort, Skill, and AI Models in Copyright [00:57:59] AI Alignment and Scale of AI Models [00:59:51] Democratization of AI: GPT-3 and GPT-4 [01:03:20] Context Window Size and Vector Databases [01:10:31] Attention Mechanisms and Hierarchies [01:15:04] Benefits and Limitations of Language Models [01:16:04] AI in Education: Risks and Benefits [01:19:41] AI Tools and Critical Thinking in the Classroom [01:29:26] Impact of Language Models on Assessment and Creativity [01:35:09] Generative AI in Music and Creative Arts [01:47:55] Challenges and Opportunities in Generative Music [01:52:11] AI-Generated Music and Human Emotions [01:54:31] Language Modeling vs. Music Modeling [02:01:58] Democratization of AI and Industry Impact [02:07:38] Recursive Self-Improving Superintelligence [02:12:48] AI Technologies: Positive and Negative Impacts [02:14:44] Runaway AGI and Control Over AI [02:20:35] AI Dangers, Cybercrime, and Ethics [02:23:42] | |||
24 Sep 2024 | Taming Silicon Valley - Prof. Gary Marcus | 01:56:55 | |
AI expert Prof. Gary Marcus doesn't mince words about today's artificial intelligence. He argues that despite the buzz, chatbots like ChatGPT aren't as smart as they seem and could cause real problems if we're not careful. Marcus is worried about tech companies putting profits before people. He thinks AI could make fake news and privacy issues even worse. He's also concerned that a few big tech companies have too much power. Looking ahead, Marcus believes the AI hype will die down as reality sets in. He wants to see AI developed in smarter, more responsible ways. His message to the public? We need to speak up and demand better AI before it's too late. Buy Taming Silicon Valley: https://amzn.to/3XTlC5s Gary Marcus: https://garymarcus.substack.com/ https://x.com/GaryMarcus Interviewer: Dr. Tim Scarfe (Refs in top comment) TOC [00:00:00] AI Flaws, Improvements & Industry Critique [00:16:29] AI Safety Theater & Image Generation Issues [00:23:49] AI's Lack of World Models & Human-like Understanding [00:31:09] LLMs: Superficial Intelligence vs. True Reasoning [00:34:45] AI in Specialized Domains: Chess, Coding & Limitations [00:42:10] AI-Generated Code: Capabilities & Human-AI Interaction [00:48:10] AI Regulation: Industry Resistance & Oversight Challenges [00:54:55] Copyright Issues in AI & Tech Business Models [00:57:26] AI's Societal Impact: Risks, Misinformation & Ethics [01:23:14] AI X-risk, Alignment & Moral Principles Implementation [01:37:10] Persistent AI Flaws: System Limitations & Architecture Challenges [01:44:33] AI Future: Surveillance Concerns, Economic Challenges & Neuro-Symbolic AI YT version with refs: https://youtu.be/o9MfuUoGlSw | |||
22 Sep 2020 | Computation, Bayesian Model Selection, Interactive Articles | 01:13:40 | |
This week Dr. Keith Duggar, Alex Stenlake and Dr. Tim Scarfe discuss the theory of computation, intelligence, Bayesian model selection, the intelligence explosion and the phenomenon of "interactive articles". 00:00:00 Intro 00:01:27 Kernels and context-free grammars 00:06:04 Theory of computation 00:18:41 Intelligence 00:22:03 Bayesian model selection 00:44:05 AI-IQ Measure / Intelligence explosion 00:52:09 Interactive articles 01:12:32 Outro | |||
23 Oct 2024 | Speechmatics CTO - Next-Generation Speech Recognition | 01:46:23 | |
Will Williams is CTO of Speechmatics in Cambridge. In this sponsored episode - he shares deep technical insights into modern speech recognition technology and system architecture. The episode covers several key technical areas: * Speechmatics' hybrid approach to ASR, which focusses on unsupervised learning methods, achieving comparable results with 100x less data than fully supervised approaches. Williams explains why this is more efficient and generalizable than end-to-end models like Whisper. * Their production architecture implementing multiple operating points for different latency-accuracy trade-offs, with careful latency padding (up to 1.8 seconds) to ensure consistent user experience. The system uses lattice-based decoding with language model integration for improved accuracy. * The challenges and solutions in real-time ASR, including their approach to diarization (speaker identification), handling cross-talk, and implicit source separation. Williams explains why these problems remain difficult even with modern deep learning approaches. * Their testing and deployment infrastructure, including the use of mirrored environments for catching edge cases in production, and their strategy of maintaining global models rather than allowing customer-specific fine-tuning. * Technical evolution in ASR, from early days of custom CUDA kernels and manual memory management to modern frameworks, with Williams offering interesting critiques of current PyTorch memory management approaches and arguing for more efficient direct memory allocation in production systems. Get coding with their API! This is their URL: https://www.speechmatics.com/ DO YOU WANT WORK ON ARC with the MindsAI team (current ARC winners)? MLST is sponsored by Tufa Labs: Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more. Interested? Apply for an ML research position: benjamin@tufa.ai TOC 1. ASR Core Technology & Real-time Architecture [00:00:00] 1.1 ASR and Diarization Fundamentals [00:05:25] 1.2 Real-time Conversational AI Architecture [00:09:21] 1.3 Neural Network Streaming Implementation [00:12:49] 1.4 Multi-modal System Integration 2. Production System Optimization [00:29:38] 2.1 Production Deployment and Testing Infrastructure [00:35:40] 2.2 Model Architecture and Deployment Strategy [00:37:12] 2.3 Latency-Accuracy Trade-offs [00:39:15] 2.4 Language Model Integration [00:40:32] 2.5 Lattice-based Decoding Architecture 3. Performance Evaluation & Ethical Considerations [00:44:00] 3.1 ASR Performance Metrics and Capabilities [00:46:35] 3.2 AI Regulation and Evaluation Methods [00:51:09] 3.3 Benchmark and Testing Challenges [00:54:30] 3.4 Real-world Implementation Metrics [01:00:51] 3.5 Ethics and Privacy Considerations 4. ASR Technical Evolution [01:09:00] 4.1 WER Calculation and Evaluation Methodologies [01:10:21] 4.2 Supervised vs Self-Supervised Learning Approaches [01:21:02] 4.3 Temporal Learning and Feature Processing [01:24:45] 4.4 Feature Engineering to Automated ML 5. Enterprise Implementation & Scale [01:27:55] 5.1 Future AI Systems and Adaptation [01:31:52] 5.2 Technical Foundations and History [01:34:53] 5.3 Infrastructure and Team Scaling [01:38:05] 5.4 Research and Talent Strategy [01:41:11] 5.5 Engineering Practice Evolution Shownotes: https://www.dropbox.com/scl/fi/d94b1jcgph9o8au8shdym/Speechmatics.pdf?rlkey=bi55wvktzomzx0y5sic6jz99y&st=6qwofv8t&dl=0 | |||
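Since the "WER Calculation and Evaluation Methodologies" chapter comes up, here is the standard word error rate computation via word-level edit distance (a generic illustration, not Speechmatics code): WER = (substitutions + deletions + insertions) / number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = minimum word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic programming table: d[i][j] = edits to turn ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                   # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                   # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + sub)      # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))               # 1 deletion / 6 words ~ 0.167
print(word_error_rate("speech recognition is hard", "speech wreck ignition is hard"))  # 1 sub + 1 insertion / 4 words = 0.5
```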
25 May 2020 | Harri Valpola: System 2 AI and Planning in Model-Based Reinforcement Learning | 01:38:16 | |
In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten interviewed Harri Valpola, CEO and Founder of Curious AI. We continued our discussion of System 1 and System 2 thinking in Deep Learning, as well as miscellaneous topics around Model-based Reinforcement Learning. Dr. Valpola describes some of the challenges of modelling industrial control processes such as water sewage filters and paper mills with the use of model-based RL. Dr. Valpola and his collaborators recently published “Regularizing Trajectory Optimization with Denoising Autoencoders” that addresses some of the concerns of planning algorithms that exploit inaccuracies in their world models! 00:00:00 Intro to Harri and Curious AI System1/System 2 00:04:50 Background on model-based RL challenges from Tim 00:06:26 Other interesting research papers on model-based RL from Connor 00:08:36 Intro to Curious AI recent NeurIPS paper on model-based RL and denoising autoencoders from Yannic 00:21:00 Main show kick off, system 1/2 00:31:50 Where does the simulator come from? 00:33:59 Evolutionary priors 00:37:17 Consciousness 00:40:37 How does one build a company like Curious AI? 00:46:42 Deep Q Networks 00:49:04 Planning and Model based RL 00:53:04 Learning good representations 00:55:55 Typical problem Curious AI might solve in industry 01:00:56 Exploration 01:08:00 Their paper - regularizing trajectory optimization with denoising 01:13:47 What is Epistemic uncertainty 01:16:44 How would Curious develop these models 01:18:00 Explainability and simulations 01:22:33 How system 2 works in humans 01:26:11 Planning 01:27:04 Advice for starting an AI company 01:31:31 Real world implementation of planning models 01:33:49 Publishing research and openness We really hope you enjoy this episode, please subscribe! Regularizing Trajectory Optimization with Denoising Autoencoders: https://papers.nips.cc/paper/8552-regularizing-trajectory-optimization-with-denoising-autoencoders.pdf Pulp, Paper & Packaging: A Future Transformed through Deep Learning: https://thecuriousaicompany.com/pulp-paper-packaging-a-future-transformed-through-deep-learning/ Curious AI: https://thecuriousaicompany.com/ Harri Valpola Publications: https://scholar.google.com/citations?user=1uT7-84AAAAJ&hl=en&oi=ao Some interesting papers around Model-Based RL: GameGAN: https://cdn.arstechnica.net/wp-content/uploads/2020/05/Nvidia_GameGAN_Research.pdf Plan2Explore: https://ramanans1.github.io/plan2explore/ World Models: https://worldmodels.github.io/ MuZero: https://arxiv.org/pdf/1911.08265.pdf PlaNet: A Deep Planning Network for RL: https://ai.googleblog.com/2019/02/introducing-planet-deep-planning.html Dreamer: Scalable RL using World Models: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html Model Based RL for Atari: https://arxiv.org/pdf/1903.00374.pdf | |||
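To make the planning-with-a-world-model idea concrete (our own toy sketch, not the Curious AI method; their NeurIPS paper regularises trajectory optimisation with denoising autoencoders precisely because a learned model can be exploited by the planner): random-shooting planning samples candidate action sequences, rolls each out through the model, and keeps the sequence with the best predicted return. The dynamics function and target below are made up for illustration.

```python
import random

def model_step(state, action):
    """Stand-in 'learned' dynamics: a 1-D point nudged by the action."""
    return state + 0.5 * action

def predicted_return(state, actions, target=1.0):
    """Roll an action sequence through the model and score closeness to a target state."""
    for a in actions:
        state = model_step(state, a)
    return -abs(state - target)

def random_shooting_plan(state, horizon=10, candidates=500):
    """Sample candidate action sequences; keep the best one under the model."""
    best_actions, best_score = None, float("-inf")
    for _ in range(candidates):
        actions = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        score = predicted_return(state, actions)
        if score > best_score:
            best_actions, best_score = actions, score
    return best_actions, best_score

plan, score = random_shooting_plan(state=0.0)
# Typically a small error; with an imperfect learned model the planner would happily
# exploit its inaccuracies, which is the failure mode the DAE regulariser targets.
print(f"predicted final-state error under the model: {-score:.3f}")
```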
03 Sep 2021 | #59 - Jeff Hawkins (Thousand Brains Theory) | 02:34:51 | |
Patreon: https://www.patreon.com/mlst The ultimate goal of neuroscience is to learn how the human brain gives rise to human intelligence and what it means to be intelligent. Understanding how the brain works is considered one of humanity’s greatest challenges. Jeff Hawkins thinks that the reality we perceive is a kind of simulation, a hallucination, a confabulation. He thinks that our brains build a model of reality based on thousands of information streams originating from the sensors in our body. Critically, Hawkins doesn’t think there is just one model but rather thousands. Jeff has just released his new book, A Thousand Brains: A New Theory of Intelligence. It’s an inspiring and well-written book, and I hope that after watching this show you will be inspired to read it too. https://numenta.com/a-thousand-brains-by-jeff-hawkins/ https://numenta.com/blog/2019/01/16/the-thousand-brains-theory-of-intelligence/ Panel: Dr. Keith Duggar https://twitter.com/DoctorDuggar Connor Leahy https://twitter.com/npcollapse | |||
25 Jan 2025 | Nicholas Carlini (Google DeepMind) | 01:21:15 | |
Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code. SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Goto https://tufalabs.ai/ *** Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0 TOC: 1. ML Security Fundamentals [00:00:00] 1.1 ML Model Reasoning and Security Fundamentals [00:03:04] 1.2 ML Security Vulnerabilities and System Design [00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior [00:13:20] 1.4 Model Training, RLHF, and Calibration Effects 2. Model Evaluation and Research Methods [00:19:40] 2.1 Model Reasoning and Evaluation Metrics [00:24:37] 2.2 Security Research Philosophy and Methodology [00:27:50] 2.3 Security Disclosure Norms and Community Differences 3. LLM Applications and Best Practices [00:44:29] 3.1 Practical LLM Applications and Productivity Gains [00:49:51] 3.2 Effective LLM Usage and Prompting Strategies [00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code 4. Advanced LLM Research and Architecture [00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience [01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges [01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction REFS: [00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/ [00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/ [00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644 [00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud [00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html [00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675 [00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774 [00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation [00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241 [00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html [00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html [00:51:15] Evolution of Google search syntax & user behavior (Daniel M. 
Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878 [01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html [01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634 [01:10:55] First model stealing paper (Tramèr et al., 2016) – attacking ML APIs via prediction - https://arxiv.org/abs/1609.02943 | |||
24 Feb 2022 | #64 Prof. Gary Marcus 3.0 | 00:51:47 | |
Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/HNnAwSduud YT: https://www.youtube.com/watch?v=ZDY2nhkPZxw We have a chat with Prof. Gary Marcus about everything that is currently top of mind for him, starting with consciousness. [00:00:00] Gary intro [00:01:25] Slightly conscious [00:24:59] Abstract, compositional models [00:32:46] Spline theory of NNs [00:36:17] Self driving cars / algebraic reasoning [00:39:43] Extrapolation [00:44:15] Scaling laws [00:49:50] Maximum likelihood estimation References: Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets https://arxiv.org/abs/2201.02177 DEEP DOUBLE DESCENT: WHERE BIGGER MODELS AND MORE DATA HURT https://arxiv.org/pdf/1912.02292.pdf Bayesian Deep Learning and a Probabilistic Perspective of Generalization https://arxiv.org/pdf/2002.08791.pdf | |||
26 Dec 2022 | #95 - Prof. IRINA RISH - AGI, Complex Systems, Transhumanism | 00:39:12 | |
Prof. Irina Rish holds the Canada Excellence Research Chair in Autonomous AI. Irina holds an MSc and PhD in AI from the University of California, Irvine, as well as an MSc in Applied Mathematics from the Moscow Gubkin Institute. Her research focuses on machine learning, neural data analysis, and neuroscience-inspired AI. In particular, she is exploring continual lifelong learning, optimization algorithms for deep neural networks, sparse modelling and probabilistic inference, dialog generation, biologically plausible reinforcement learning, and dynamical systems approaches to brain imaging analysis. Prof. Rish holds 64 patents, has published over 80 research papers, several book chapters, three edited books, and a monograph on Sparse Modelling. She has served as a Senior Area Chair for NeurIPS and ICML. Irina's research is focussed on taking us closer to the holy grail of Artificial General Intelligence. She continues to push the boundaries of machine learning, continually striving to make advancements in neuroscience-inspired AI. In a conversation about artificial intelligence (AI), Irina and Tim discussed the idea of transhumanism and the potential for AI to improve human flourishing. Irina suggested that instead of looking at AI as something to be controlled and regulated, people should view it as a tool to augment human capabilities. She argued that attempting to create an AI that is smarter than humans is not the best approach, and that a hybrid of human and AI intelligence is much more beneficial. As an example, she mentioned how technology can be used as an extension of the human mind, to track mental states and improve self-understanding. Ultimately, Irina concluded that transhumanism is about having a symbiotic relationship with technology, which can have a positive effect on both parties. Tim then discussed the contrasting types of intelligence and how this could lead to something interesting emerging from the combination. He brought up the Trolley Problem and how difficult moral quandaries could be programmed into an AI. Irina then referenced The Garden of Forking Paths, a story which explores the idea of how different paths in life can be taken and how decisions from the past can have an effect on the present. To better understand AI and intelligence, Irina suggested looking at it from multiple perspectives and understanding the importance of complex systems science in programming and understanding dynamical systems. She discussed the work of Michael Levin, who is looking into reprogramming biological computers with chemical interventions, and Tim mentioned Alexander Mordvintsev, who is looking into the self-healing and repair of these systems. Ultimately, Irina argued that the key to understanding AI and intelligence is to recognize the complexity of the systems and to create hybrid models of human and AI intelligence. 
Find Irina; https://mila.quebec/en/person/irina-rish/ https://twitter.com/irinarish YT version: https://youtu.be/8-ilcF0R7mI MLST Discord: https://discord.gg/aNPkGUQtc5 References; The Garden of Forking Paths: Jorge Luis Borges [Jorge Luis Borges] https://www.amazon.co.uk/Garden-Forking-Paths-Penguin-Modern/dp/0241339057 The Brain from Inside Out [György Buzsáki] https://www.amazon.co.uk/Brain-Inside-Out-Gy%C3%B6rgy-Buzs%C3%A1ki/dp/0190905387 Growing Isotropic Neural Cellular Automata [Alexander Mordvintsev] https://arxiv.org/abs/2205.01681 The Extended Mind [Andy Clark and David Chalmers] https://www.jstor.org/stable/3328150 The Gentle Seduction [Marc Stiegler] https://www.amazon.co.uk/Gentle-Seduction-Marc-Stiegler/dp/0671698877 | |||
06 Mar 2021 | #046 The Great ML Stagnation (Mark Saroufim and Dr. Mathew Salvaris) | 01:39:57 | |
Academics think of themselves as trailblazers, explorers — seekers of the truth. Any fundamental discovery involves a significant degree of risk. If an idea is guaranteed to work, then it moves from the realm of research to engineering. Unfortunately, this also means that most research careers will invariably be failures, at least if failures are measured via “objective” metrics like citations. Today we discuss the recent article from Mark Saroufim called Machine Learning: the great stagnation. We discuss the rise of gentleman scientists, fake rigor, incentives in ML, SOTA-chasing, "graduate student descent", distribution of talent in ML and how to learn effectively. With special guest interviewer Mat Salvaris. Machine learning: the great stagnation [00:00:00] Main show kick off [00:16:30] Great stagnation article / Bad incentive systems in academia [00:18:24] OpenAI is a media business [00:19:48] Incentive structures in academia [00:22:13] SOTA chasing [00:24:47] F You Money [00:28:53] Research grants and gentlemen scientists [00:29:13] Following your own gradient of interest and making a contribution [00:33:27] Marketing yourself to be successful [00:37:07] Tech companies create the bad incentives [00:42:20] GPT3 was sota chasing but it seemed really... "good"? Scaling laws? [00:51:09] Dota / game AI [00:58:39] Hard to go it alone? [01:02:08] Reaching out to people [01:09:21] Willingness to be wrong [01:13:14] Distribution of talent / tech interviews [01:18:30] What should you read online and how to learn? Sharing your stuff online and finding your niche [01:25:52] Mark Saroufim: https://marksaroufim.substack.com/ http://robotoverlordmanual.com/ https://twitter.com/marksaroufim https://www.youtube.com/marksaroufim Dr. Mathew Salvaris: https://www.linkedin.com/in/drmathewsalvaris/ https://twitter.com/MSalvaris | |||
24 Apr 2020 | Exploring Open-Ended Algorithms: POET | 01:12:56 | |
Three YouTubers; Tim Scarfe - Machine Learning Dojo (https://www.youtube.com/channel/UCXvHuBMbgJw67i5vrMBBobA), Connor Shorten - Henry AI Labs (https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw) and Yannic Kilcher (https://www.youtube.com/channel/UCZHmQk67mSJgfCCTn7xBfew). We made a new YouTube channel called Machine Learning Street Talk. Every week we will talk about the latest and greatest in AI. Subscribe now! Special guests this week; Dr. Mathew Salvaris (https://www.linkedin.com/in/drmathewsalvaris/), Eric Craeymeersch (https://www.linkedin.com/in/ericcraeymeersch/), Dr. Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/), Dmitri Soshnikov (https://www.linkedin.com/in/shwars/) We discuss the new concept of an open-ended, or "AI-Generating" algorithm. Open-endedness is a class of algorithms which generate problems and solutions to increasingly complex and diverse tasks. These algorithms create their own curriculum of learning. Complex tasks become tractable because they are now the final stepping stone in a lineage of progressions. In many respects, it's better to trust the machine to develop the learning curriculum, because the best curriculum might be counter-intuitive. These algorithms can generate a radiating tree of evolving challenges and solutions just like natural evolution. Evolution has produced an endless stream of diversity and complexity, and even produced human intelligence as a side-effect! Could AI-generating algorithms be the next big thing in machine learning? Wang, Rui, et al. "Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions." arXiv preprint arXiv:2003.08536 (2020). https://arxiv.org/abs/2003.08536 Wang, Rui, et al. "Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions." arXiv preprint arXiv:1901.01753 (2019). https://arxiv.org/abs/1901.01753 Watch Yannic’s video on POET: https://www.youtube.com/watch?v=8wkgDnNxiVs #reinforcementlearning #machinelearning #uber #deeplearning #rl #timscarfe #connorshorten #yannickilcher | |||
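For the curious, the paired open-ended loop boils down to three repeated steps: mutate environments (keeping only ones that are neither trivial nor impossible), optimise each agent in its paired environment, and transfer agents between environments when they perform better elsewhere. The sketch below is a skeleton in that spirit; every helper passed in is a placeholder for the corresponding component in the POET papers, not a real library call.

```python
def poet(init_env, init_agent, mutate, passes_minimal_criterion,
         optimise, evaluate, iterations=100, mutate_every=10):
    """Skeleton of the POET loop; all real behaviour lives in the injected helpers."""
    pairs = [(init_env, init_agent)]                  # (environment, paired agent)
    for it in range(iterations):
        # 1. Periodically spawn child environments by mutation; children inherit
        #    a copy of the parent's agent and must pass a minimal criterion.
        if it % mutate_every == 0:
            children = [(mutate(env), agent) for env, agent in pairs]
            pairs += [(e, a) for e, a in children if passes_minimal_criterion(e)]

        # 2. Optimise each agent locally within its own paired environment.
        pairs = [(env, optimise(agent, env)) for env, agent in pairs]

        # 3. Transfer: adopt another pair's agent if it does better here.
        for i, (env, agent) in enumerate(pairs):
            best = max((a for _, a in pairs), key=lambda a: evaluate(a, env))
            if evaluate(best, env) > evaluate(agent, env):
                pairs[i] = (env, best)
    return pairs
```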
23 Dec 2022 | #92 - SARA HOOKER - Fairness, Interpretability, Language Models | 00:51:31 | |
Support us! https://www.patreon.com/mlst Sara Hooker is an exceptionally talented and accomplished leader and research scientist in the field of machine learning. She is the founder of Cohere For AI, a non-profit research lab that seeks to solve complex machine learning problems. She is passionate about creating more points of entry into machine learning research and has dedicated her efforts to understanding how progress in this field can be translated into reliable and accessible machine learning in the real-world. Sara is also the co-founder of the Trustworthy ML Initiative, a forum and seminar series related to Trustworthy ML. She is on the advisory board of Patterns and is an active member of the MLC research group, which has a focus on making participation in machine learning research more accessible. Before starting Cohere For AI, Sara worked as a research scientist at Google Brain. She has written several influential research papers, including "The Hardware Lottery", "The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation", "Moving Beyond “Algorithmic Bias is a Data Problem”" and "Characterizing and Mitigating Bias in Compact Models". In addition to her research work, Sara is also the founder of the local Bay Area non-profit Delta Analytics, which works with non-profits and communities all over the world to build technical capacity and empower others to use data. She regularly gives tutorials on machine learning fundamentals, interpretability, model compression and deep neural networks and is dedicated to collaborating with independent researchers around the world. Sara Hooker is famous for writing a paper introducing the concept of the 'hardware lottery', in which the success of a research idea is determined not by its inherent superiority, but by its compatibility with available software and hardware. She argued that choices about software and hardware have had a substantial impact in deciding the outcomes of early computer science history, and that with the increasing heterogeneity of the hardware landscape, gains from advances in computing may become increasingly disparate. Sara proposed that an interim goal should be to create better feedback mechanisms for researchers to understand how their algorithms interact with the hardware they use. She suggested that domain-specific languages, auto-tuning of algorithmic parameters, and better profiling tools may help to alleviate this issue, as well as provide researchers with more informed opinions about how hardware and software should progress. Ultimately, Sara encouraged researchers to be mindful of the implications of the hardware lottery, as it could mean that progress on some research directions is further obstructed. If you want to learn more about that paper, watch our previous interview with Sara. YT version: https://youtu.be/7oJui4eSCoY MLST Discord: https://discord.gg/aNPkGUQtc5 TOC: [00:00:00] Intro [00:02:53] Interpretability / Fairness [00:35:29] LLMs Find Sara: https://www.sarahooker.me/ https://twitter.com/sarahookr | |||
10 Oct 2024 | Bold AI Predictions From Cohere Co-founder | 00:47:17 | |
Ivan Zhang, co-founder of Cohere, discusses the company's enterprise-focused AI solutions. He explains Cohere's early emphasis on embedding technology and training models for secure environments. Zhang highlights their implementation of Retrieval-Augmented Generation in healthcare, significantly reducing doctor preparation time. He explores the shift from monolithic AI models to heterogeneous systems and the importance of improving various AI system components. Zhang shares insights on using synthetic data to teach models reasoning, the democratization of software development through AI, and how his gaming skills transfer to running an AI company. He advises young developers to fully embrace AI technologies and offers perspectives on AI reliability, potential risks, and future model architectures. https://cohere.com/ https://ivanzhang.ca/ https://x.com/1vnzh TOC: 00:00:00 Intro 00:03:20 AI & Language Model Evolution 00:06:09 Future AI Apps & Development 00:09:29 Impact on Software Dev Practices 00:13:03 Philosophical & Societal Implications 00:16:30 Compute Efficiency & RAG 00:20:39 Adoption Challenges & Solutions 00:22:30 GPU Optimization & Kubernetes Limits 00:24:16 Cohere's Implementation Approach 00:28:13 Gaming's Professional Influence 00:34:45 Transformer Optimizations 00:36:45 Future Models & System-Level Focus 00:39:20 Inference-Time Computation & Reasoning 00:42:05 Capturing Human Thought in AI 00:43:15 Research, Hiring & Developer Advice REFS: 00:02:31 Cohere, https://cohere.com/ 00:02:40 The Transformer architecture, https://arxiv.org/abs/1706.03762 00:03:22 The Innovator's Dilemma, https://www.amazon.com/Innovators-Dilemma-Technologies-Management-Innovation/dp/1633691780 00:09:15 The actor model, https://en.wikipedia.org/wiki/Actor_model 00:14:35 John Searle's Chinese Room Argument, https://plato.stanford.edu/entries/chinese-room/ 00:18:00 Retrieval-Augmented Generation, https://arxiv.org/abs/2005.11401 00:18:40 Retrieval-Augmented Generation, https://docs.cohere.com/v2/docs/retrieval-augmented-generation-rag 00:35:39 Let’s Verify Step by Step, https://arxiv.org/pdf/2305.20050 00:39:20 Adaptive Inference-Time Compute, https://arxiv.org/abs/2410.02725 00:43:20 Ryan Greenblatt ARC entry, https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt Disclaimer: This show is part of our Cohere partnership series | |||
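Since Retrieval-Augmented Generation comes up several times here, a bare-bones version of the pattern: embed the documents, retrieve the nearest ones for a query by cosine similarity, and put them into the generator's prompt. `embed` and `generate` are deliberately left as placeholders for whatever embedding and generation models you call; no real SDK method names are assumed.

```python
import numpy as np

def build_index(docs, embed):
    """Embed and L2-normalise a list of documents."""
    vecs = np.stack([embed(d) for d in docs])
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return docs, vecs

def retrieve(query, index, embed, k=3):
    """Return the k documents most similar to the query (cosine similarity)."""
    docs, vecs = index
    q = embed(query)
    q = q / np.linalg.norm(q)
    scores = vecs @ q
    return [docs[i] for i in np.argsort(-scores)[:k]]

def rag_answer(query, index, embed, generate):
    """Condition the generator on the retrieved context."""
    context = "\n".join(retrieve(query, index, embed))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```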
15 Nov 2022 | #80 AIDAN GOMEZ [CEO Cohere] - Language as Software | 00:51:50 | |
We had a conversation with Aidan Gomez, the CEO of language-based AI platform Cohere. Cohere is a startup which uses artificial intelligence to help users build the next generation of language-based applications. It's headquartered in Toronto. The company has raised $175 million in funding so far. Language may well become a key new substrate for software, both in how software is represented and in how we build it. It may democratise software development, so that more people can build software, and enable entirely new types of software. Aidan and I discuss this in detail in this episode of MLST. Check out Cohere -- https://dashboard.cohere.ai/welcome/register?utm_source=influencer&utm_medium=social&utm_campaign=mlst Support us! https://www.patreon.com/mlst YT version: https://youtu.be/ooBt_di8DLs TOC: [00:00:00] Aidan Gomez intro [00:02:12] What's it like being a CEO? [00:02:52] Transformers [00:09:33] Deepmind Chomsky Hierarchy [00:14:58] Cohere roadmap [00:18:18] Friction using LLMs for startups [00:25:31] How different from OpenAI / GPT-3 [00:29:31] Engineering questions on Cohere [00:35:13] Francois Chollet says that LLMs are like databases [00:38:34] Next frontier of language models [00:42:04] Different modes of understanding in LLMs [00:47:04] LLMs are the new extended mind [00:50:03] Is language the next interface, and why might that be bad? References: [Balestriero] Spline theory of NNs https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf [Delétang et al] Neural Networks and the Chomsky Hierarchy https://arxiv.org/abs/2207.02098 [Fodor, Pylyshyn] Connectionism and Cognitive Architecture: A Critical Analysis https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/docs/jaf.pdf [Chalmers, Clark] The extended mind https://icds.uoregon.edu/wp-content/uploads/2014/06/Clark-and-Chalmers-The-Extended-Mind.pdf [Melanie Mitchell et al] The Debate Over Understanding in AI's Large Language Models https://arxiv.org/abs/2210.13966 [Jay Alammar] Illustrated stable diffusion https://jalammar.github.io/illustrated-stable-diffusion/ Illustrated transformer https://jalammar.github.io/illustrated-transformer/ https://www.youtube.com/channel/UCmOwsoHty5PrmE-3QhUBfPQ [Sandra Kublik] (works at Cohere!) https://www.youtube.com/channel/UCjG6QzmabZrBEeGh3vi-wDQ | |||
19 Jun 2020 | Francois Chollet - On the Measure of Intelligence | 02:33:31 | |
We cover Francois Chollet's recent paper. Abstract; To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans. | |||
19 May 2020 | ICLR 2020: Yann LeCun and Energy-Based Models | 02:12:11 | |
This week Connor Shorten, Yannic Kilcher and Tim Scarfe reacted to Yann LeCun's keynote speech at this year's ICLR conference, which has just finished. ICLR is the number two ML conference and was completely open this year, with all the sessions publicly accessible via the internet. Yann spent most of his talk speaking about self-supervised learning, energy-based models (EBMs) and manifold learning. Don't worry if you haven't heard of EBMs before; neither had we! Thanks for watching! Please Subscribe! Paper Links: ICLR 2020 Keynote Talk: https://iclr.cc/virtual_2020/speaker_7.html A Tutorial on Energy-Based Learning: http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf Concept Learning with Energy-Based Models (Yannic's Explanation): https://www.youtube.com/watch?v=Cs_j-oNwGgg Concept Learning with Energy-Based Models (Paper): https://arxiv.org/pdf/1811.02486.pdf Concept Learning with Energy-Based Models (OpenAI Blog Post): https://openai.com/blog/learning-concepts-with-energy-functions/ #deeplearning #machinelearning #iclr #iclr2020 #yannlecun | |||
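If EBMs are new to you too: the whole idea is a scalar energy E(x, y) that should be low for compatible (x, y) pairs and high otherwise, with inference as minimisation over y and training pushing energy down on observed pairs and up on contrasted ones. A toy contrastive training step, purely illustrative and not any specific model from the talk:

```python
import torch
import torch.nn as nn

class Energy(nn.Module):
    """Toy energy function E(x, y): low when y is compatible with x."""
    def __init__(self, x_dim, y_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def contrastive_step(model, opt, x, y_pos, margin=1.0):
    """Push energy down on observed pairs and up on mismatched ones."""
    y_neg = y_pos[torch.randperm(len(y_pos))]         # shuffled pairs as negatives
    loss = torch.relu(margin + model(x, y_pos) - model(x, y_neg)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Tiny usage example on random data.
model = Energy(x_dim=8, y_dim=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 8), torch.randn(32, 4)
print(contrastive_step(model, opt, x, y))
```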
01 May 2021 | #52 - Unadversarial Examples (Hadi Salman, MIT) | 01:48:16 | |
Performing reliably on unseen or shifting data distributions is a difficult challenge for modern vision systems; even slight corruptions or transformations of images are enough to slash the accuracy of state-of-the-art classifiers. When an adversary is allowed to modify an input image directly, models can be manipulated into predicting anything even when there is no perceptible change; this is known as an adversarial example. The ideal definition of an adversarial example is when humans consistently say two pictures are the same but a machine disagrees. Hadi Salman, a Ph.D. student at MIT (ex-Uber and Microsoft Research), started thinking about how adversarial robustness could be leveraged beyond security. He realised that the phenomenon of adversarial examples could actually be turned upside down to lead to more robust models instead of breaking them. Hadi actually utilized the brittleness of neural networks to design unadversarial examples, or robust objects: objects designed specifically to be robustly recognized by neural networks. Introduction [00:00:00] DR KILCHER'S PHD HAT [00:11:18] Main Introduction [00:11:38] Hadi's Introduction [00:14:43] More robust models == transfer better [00:46:41] Features not bugs paper [00:49:13] Manifolds [00:55:51] Robustness and Transferability [00:58:00] Do non-robust features generalize worse than robust? [00:59:52] The unreasonable predicament of entangled features [01:01:57] We can only find adversarial examples in the vicinity [01:09:30] Certifiability of models for robustness [01:13:55] Carlini is coming for you! And we are screwed [01:23:21] Distribution shift and corruptions are a bigger problem than adversarial examples [01:25:34] All roads lead to generalization [01:26:47] Unadversarial examples [01:27:26] | |||
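The flip Hadi describes can be shown in a few lines: instead of ascending the classifier's loss to fool it (an adversarial example), descend the loss with respect to a small perturbation so the target class is recognised as robustly as possible. A minimal PGD-style sketch; `model` is any pre-trained classifier returning logits and is assumed, not provided here.

```python
import torch

def unadversarial_patch(model, image, target_class, eps=0.1, steps=40, lr=0.01):
    """Optimise a bounded additive perturbation that makes `target_class` easy
    to recognise. `image` is a 1xCxHxW batch; adversarial attacks would instead
    ascend this loss."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = model(image + delta)
        loss = torch.nn.functional.cross_entropy(
            logits, torch.tensor([target_class]))
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()     # descend rather than ascend
            delta.clamp_(-eps, eps)             # keep the perturbation small
        delta.grad.zero_()
    return (image + delta).detach()
```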
13 Nov 2024 | Why Your GPUs are underutilised for AI - CentML CEO Explains | 02:08:40 | |
Prof. Gennady Pekhimenko (CEO of CentML, UofT) joins us in this *sponsored episode* to dive deep into AI system optimization and enterprise implementation. From NVIDIA's technical leadership model to the rise of open-source AI, Pekhimenko shares insights on bridging the gap between academic research and industrial applications. Learn about "dark silicon," GPU utilization challenges in ML workloads, and how modern enterprises can optimize their AI infrastructure. The conversation explores why some companies achieve only 10% GPU efficiency and practical solutions for improving AI system performance. A must-watch for anyone interested in the technical foundations of enterprise AI and hardware optimization. CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Cheaper, faster, no commitments, pay as you go, scale massively, simple to setup. Check it out! https://centml.ai/pricing/ SPONSOR MESSAGES: MLST is also sponsored by Tufa AI Labs - https://tufalabs.ai/ They are hiring cracked ML engineers/researchers to work on ARC and build AGI! SHOWNOTES (diarised transcript, TOC, references, summary, best quotes etc) https://www.dropbox.com/scl/fi/w9kbpso7fawtm286kkp6j/Gennady.pdf?rlkey=aqjqmncx3kjnatk2il1gbgknk&st=2a9mccj8&dl=0 TOC: 1. AI Strategy and Leadership [00:00:00] 1.1 Technical Leadership and Corporate Structure [00:09:55] 1.2 Open Source vs Proprietary AI Models [00:16:04] 1.3 Hardware and System Architecture Challenges [00:23:37] 1.4 Enterprise AI Implementation and Optimization [00:35:30] 1.5 AI Reasoning Capabilities and Limitations 2. AI System Development [00:38:45] 2.1 Computational and Cognitive Limitations of AI Systems [00:42:40] 2.2 Human-LLM Communication Adaptation and Patterns [00:46:18] 2.3 AI-Assisted Software Development Challenges [00:47:55] 2.4 Future of Software Engineering Careers in AI Era [00:49:49] 2.5 Enterprise AI Adoption Challenges and Implementation 3. ML Infrastructure Optimization [00:54:41] 3.1 MLOps Evolution and Platform Centralization [00:55:43] 3.2 Hardware Optimization and Performance Constraints [01:05:24] 3.3 ML Compiler Optimization and Python Performance [01:15:57] 3.4 Enterprise ML Deployment and Cloud Provider Partnerships 4. Distributed AI Architecture [01:27:05] 4.1 Multi-Cloud ML Infrastructure and Optimization [01:29:45] 4.2 AI Agent Systems and Production Readiness [01:32:00] 4.3 RAG Implementation and Fine-Tuning Considerations [01:33:45] 4.4 Distributed AI Systems Architecture and Ray Framework 5. AI Industry Standards and Research [01:37:55] 5.1 Origins and Evolution of MLPerf Benchmarking [01:43:15] 5.2 MLPerf Methodology and Industry Impact [01:50:17] 5.3 Academic Research vs Industry Implementation in AI [01:58:59] 5.4 AI Research History and Safety Concerns | |||
01 Dec 2024 | Jonas Hübotter (ETH) - Test Time Inference | 01:45:56 | |
Jonas Hübotter, PhD student at ETH Zurich's Institute for Machine Learning, discusses his groundbreaking research on test-time computation and local learning. He demonstrates how smaller models can outperform larger ones by 30x through strategic test-time computation and introduces a novel paradigm combining inductive and transductive learning approaches. Using Bayesian linear regression as a surrogate model for uncertainty estimation, Jonas explains how models can efficiently adapt to specific tasks without massive pre-training. He draws an analogy to Google Earth's variable resolution system to illustrate dynamic resource allocation based on task complexity. The conversation explores the future of AI architecture, envisioning systems that continuously learn and adapt beyond current monolithic models. Jonas concludes by proposing hybrid deployment strategies combining local and cloud computation, suggesting a future where compute resources are allocated based on task complexity rather than fixed model size. This research represents a significant shift in machine learning, prioritizing intelligent resource allocation and adaptive learning over traditional scaling approaches. SPONSOR MESSAGES: CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI, they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Goto https://tufalabs.ai/ Transcription, references and show notes PDF download: https://www.dropbox.com/scl/fi/cxg80p388snwt6qbp4m52/JonasFinal.pdf?rlkey=glk9mhpzjvesanlc14rtpvk4r&st=6qwi8n3x&dl=0 Jonas Hübotter https://jonhue.github.io/ https://scholar.google.com/citations?user=pxi_RkwAAAAJ Transductive Active Learning: Theory and Applications (NeurIPS 2024) https://arxiv.org/pdf/2402.15898 EFFICIENTLY LEARNING AT TEST-TIME: ACTIVE FINE-TUNING OF LLMS (SIFT) https://arxiv.org/pdf/2410.08020 TOC: 1. Test-Time Computation Fundamentals [00:00:00] Intro [00:03:10] 1.1 Test-Time Computation and Model Performance Comparison [00:05:52] 1.2 Retrieval Augmentation and Machine Teaching Strategies [00:09:40] 1.3 In-Context Learning vs Fine-Tuning Trade-offs 2. System Architecture and Intelligence [00:15:58] 2.1 System Architecture and Intelligence Emergence [00:23:22] 2.2 Active Inference and Constrained Agency in AI [00:29:52] 2.3 Evolution of Local Learning Methods [00:32:05] 2.4 Vapnik's Contributions to Transductive Learning 3. Resource Optimization and Local Learning [00:34:35] 3.1 Computational Resource Allocation in ML Models [00:35:30] 3.2 Historical Context and Traditional ML Optimization [00:37:55] 3.3 Variable Resolution Processing and Active Inference in ML [00:43:01] 3.4 Local Learning and Base Model Capacity Trade-offs [00:48:04] 3.5 Active Learning vs Local Learning Approaches 4. Information Retrieval and Model Interpretability [00:51:08] 4.1 Information Retrieval and Nearest Neighbor Limitations [01:03:07] 4.2 Model Interpretability and Surrogate Models [01:15:03] 4.3 Bayesian Uncertainty Estimation and Surrogate Models 5. 
Distributed Systems and Deployment [01:23:56] 5.1 Memory Architecture and Controller Systems [01:28:14] 5.2 Evolution from Static to Distributed Learning Systems [01:38:03] 5.3 Transductive Learning and Model Specialization [01:41:58] 5.4 Hybrid Local-Cloud Deployment Strategies | |||
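The surrogate-model idea in the summary above, Bayesian linear regression over fixed features as a cheap source of predictive uncertainty, is just the standard conjugate-Gaussian update. A generic textbook sketch with random toy data, not SIFT or Jonas's code:

```python
import numpy as np

def posterior(Phi, y, alpha=1.0, beta=25.0):
    """Bayesian linear regression with prior N(0, alpha^-1 I) and noise beta^-1.
    Phi: (n, d) feature matrix, y: (n,) targets. Returns posterior mean and covariance."""
    d = Phi.shape[1]
    S_inv = alpha * np.eye(d) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ y
    return m, S

def predict(phi_star, m, S, beta=25.0):
    """Predictive mean and variance for a single feature vector phi_star."""
    mean = phi_star @ m
    var = 1.0 / beta + phi_star @ S @ phi_star
    return mean, var

# Tiny usage example with random features.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 5))
w_true = rng.normal(size=5)
y = Phi @ w_true + 0.2 * rng.normal(size=20)
m, S = posterior(Phi, y)
print(predict(Phi[0], m, S))
```

The predictive variance is what lets a system decide where it is uncertain and therefore where extra test-time computation or data selection is worth spending.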
10 Sep 2023 | Prof. Melanie Mitchell 2.0 - AI Benchmarks are Broken! | 01:01:47 | |
Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB Prof. Melanie Mitchell argues that the concept of "understanding" in AI is ill-defined and multidimensional - we can't simply say an AI system does or doesn't understand. She advocates for rigorously testing AI systems' capabilities using proper experimental methods from cognitive science. Popular benchmarks for intelligence often rely on the assumption that if a human can perform a task, an AI that performs the task must have human-like general intelligence. But benchmarks should evolve as capabilities improve. Large language models show surprising skill on many human tasks but lack common sense and fail at simple things young children can do. Their knowledge comes from statistical relationships in text, not grounded concepts about the world. We don't know if their internal representations actually align with human-like concepts. More granular testing focused on generalization is needed. There are open questions around whether large models' abilities constitute a fundamentally different non-human form of intelligence based on vast statistical correlations across text. Mitchell argues intelligence is situated, domain-specific and grounded in physical experience and evolution. The brain computes but in a specialized way honed by evolution for controlling the body. Extracting "pure" intelligence may not work. Other key points: - Need more focus on proper experimental method in AI research. Developmental psychology offers examples for rigorous testing of cognition. - Reporting instance-level failures rather than just aggregate accuracy can provide insights. - Scaling laws and complex systems science are an interesting area of complexity theory, with applications to understanding cities. - Concepts like "understanding" and "intelligence" in AI force refinement of fuzzy definitions. - Human intelligence may be more collective and social than we realize. AI forces us to rethink concepts we apply anthropomorphically. The overall emphasis is on rigorously building the science of machine cognition through proper experimentation and benchmarking as we assess emerging capabilities. TOC: [00:00:00] Introduction and Munk AI Risk Debate Highlights [00:05:00] Douglas Hofstadter on AI Risk [00:06:56] The Complexity of Defining Intelligence [00:11:20] Examining Understanding in AI Models [00:16:48] Melanie's Insights on AI Understanding Debate [00:22:23] Unveiling the Concept Arc [00:27:57] AI Goals: A Human vs Machine Perspective [00:31:10] Addressing the Extrapolation Challenge in AI [00:36:05] Brain Computation: The Human-AI Parallel [00:38:20] The Arc Challenge: Implications and Insights [00:43:20] The Need for Detailed AI Performance Reporting [00:44:31] Exploring Scaling in Complexity Theory Errata: Note Tim said around 39 mins that a recent Stanford/DM paper modelling ARC “on GPT-4 got around 60%”. This is not correct and he misremembered. It was actually davinci3, and around 10%, which is still extremely good for a blank slate approach with an LLM and no ARC specific knowledge. Folks on our forum couldn’t reproduce the result. See paper linked below. 
Books (MUST READ): Artificial Intelligence: A Guide for Thinking Humans (Melanie Mitchell) https://www.amazon.co.uk/Artificial-Intelligence-Guide-Thinking-Humans/dp/B07YBHNM1C/?&_encoding=UTF8&tag=mlst00-21&linkCode=ur2&linkId=44ccac78973f47e59d745e94967c0f30&camp=1634&creative=6738 Complexity: A Guided Tour (Melanie Mitchell) https://www.amazon.co.uk/Audible-Complexity-A-Guided-Tour?&_encoding=UTF8&tag=mlst00-21&linkCode=ur2&linkId=3f8bd505d86865c50c02dd7f10b27c05&camp=1634&creative=6738 Show notes (transcript, full references etc) https://atlantic-papyrus-d68.notion.site/Melanie-Mitchell-2-0-15e212560e8e445d8b0131712bad3000?pvs=25 YT version: https://youtu.be/29gkDpR2orc | |||
16 Jan 2025 | Jurgen Schmidhuber on Humans co-existing with AIs | 01:12:50 | |
Jürgen Schmidhuber, the father of generative AI, challenges current AI narratives, arguing that early deep learning work is, in his opinion, widely misattributed and actually originated in Ukraine and Japan. He discusses his early work on linear transformers and artificial curiosity, which preceded modern developments, shares his expansive vision of AI colonising space, and explains his groundbreaking 1991 consciousness model. Schmidhuber dismisses fears of human-AI conflict, arguing that superintelligent AI scientists will be fascinated by their own origins and motivated to protect life rather than harm it, while being more interested in other superintelligent AI and in cosmic expansion than earthly matters. He offers unique insights into how humans and AI might coexist. This is the long-awaited second, previously unreleased part of the interview we filmed last time. SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Goto https://tufalabs.ai/ *** Interviewer: Tim Scarfe TOC [00:00:00] The Nature and Motivations of AI [00:02:08] Influential Inventions: 20th vs. 21st Century [00:05:28] Transformer and GPT: A Reflection The revolutionary impact of modern language models, the 1991 linear transformer, linear vs. quadratic scaling, the fast weight controller, and fast weight matrix memory. [00:11:03] Pioneering Contributions to AI and Deep Learning The invention of the transformer, pre-trained networks, the first GANs, the role of predictive coding, and the emergence of artificial curiosity. [00:13:58] AI's Evolution and Achievements The role of compute, breakthroughs in handwriting recognition and computer vision, the rise of GPU-based CNNs, achieving superhuman results, and Japanese contributions to CNN development. [00:15:40] The Hardware Lottery and GPUs GPUs as a serendipitous advantage for AI, the gaming-AI parallel, and Nvidia's strategic shift towards AI. [00:19:58] AI Applications and Societal Impact AI-powered translation breaking communication barriers, AI in medicine for imaging and disease prediction, and AI's potential for human enhancement and sustainable development. [00:23:26] The Path to AGI and Current Limitations Distinguishing large language models from AGI, challenges in replacing physical world workers, and AI's difficulty in real-world versus board games. [00:25:56] AI and Consciousness Simulating consciousness through unsupervised learning, chunking and automatizing neural networks, data compression, and self-symbols in predictive world models. [00:30:50] The Future of AI and Humanity Transition from AGIs as tools to AGIs with their own goals, the role of humans in an AGI-dominated world, and the concept of Homo Ludens. [00:38:05] The AI Race: Europe, China, and the US Europe's historical contributions, current dominance of the US and East Asia, and the role of venture capital and industrial policy. [00:50:32] Addressing AI Existential Risk The obsession with AI existential risk, commercial pressure for friendly AIs, AI vs. hydrogen bombs, and the long-term future of AI. 
[00:58:00] The Fermi Paradox and Extraterrestrial Intelligence Expanding AI bubbles as an explanation for the Fermi paradox, dark matter and encrypted civilizations, and Earth as the first to spawn an AI bubble. [01:02:08] The Diversity of AI and AI Ecologies The unrealism of a monolithic super intelligence, diverse AIs with varying goals, and intense competition and collaboration in AI ecologies. [01:12:21] Final Thoughts and Closing Remarks REFERENCES: See pinned comment on YT: https://youtu.be/fZYUqICYCAk | |||
14 Mar 2021 | 047 Interpretable Machine Learning - Christoph Molnar | 01:40:12 | |
Christoph Molnar is one of the main people to know in the space of interpretable ML. In 2018 he released the first version of his incredible online book, Interpretable Machine Learning. Interpretability is often a deciding factor when a machine learning (ML) model is used in a product, a decision process, or in research. Interpretability methods can be used to discover knowledge, to debug or justify a model and its predictions, to control and improve the model, to reason about potential bias, and to increase the social acceptance of models. But interpretability methods can also be quite esoteric, add an additional layer of complexity and potential pitfalls, and require expert knowledge to understand. Is it even possible to understand complex models, or even humans for that matter, in any meaningful way? Introduction to IML [00:00:00] Show Kickoff [00:13:28] What makes a good explanation? [00:15:51] Quantification of how good an explanation is [00:19:59] Knowledge of the pitfalls of IML [00:22:14] Are linear models even interpretable? [00:24:26] Complex Math models to explain Complex Math models? [00:27:04] Saliency maps are glorified edge detectors [00:28:35] Challenge on IML -- feature dependence [00:36:46] Don't leap to using a complex model! Surrogate models can be too dumb [00:40:52] On airplane pilots. Seeking to understand vs testing [00:44:09] IML Could help us make better models or lead a better life [00:51:53] Lack of statistical rigor and quantification of uncertainty [00:55:35] On Causality [01:01:09] Broadening out the discussion to the process or institutional level [01:08:53] No focus on fairness / ethics? [01:11:44] Is it possible to condition ML model training on IML metrics ? [01:15:27] Where is IML going? Some of the esoterica of the IML methods [01:18:35] You can't compress information without common knowledge, the latter becomes the bottleneck [01:23:25] IML methods used non-interactively? Making IML an engineering discipline [01:31:10] Tim Postscript -- on the lack of effective corporate operating models for IML, security, engineering and ethics [01:36:34] Explanation in Artificial Intelligence: Insights from the Social Sciences (Tim Miller 2018) https://arxiv.org/pdf/1706.07269.pdf Seven Myths in Machine Learning Research (Chang 19) Myth 7: Saliency maps are robust ways to interpret neural networks https://arxiv.org/pdf/1902.06789.pdf Sanity Checks for Saliency Maps (Adebayo 2020) https://arxiv.org/pdf/1810.03292.pdf Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/ Christoph Molnar: https://www.linkedin.com/in/christoph-molnar-63777189/ https://machine-master.blogspot.com/ https://twitter.com/ChristophMolnar Please show your appreciation and buy Christoph's book here; https://www.lulu.com/shop/christoph-molnar/interpretable-machine-learning/paperback/product-24449081.html?page=1&pageSize=4 Panel: Connor Tann https://www.linkedin.com/in/connor-tann-a92906a1/ Dr. Tim Scarfe Dr. Keith Duggar Video version: https://youtu.be/0LIACHcxpHU | |||
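To make the surrogate-model thread in the show concrete, here is the simplest recipe from the interpretability toolbox: fit an interpretable tree to the black box's own predictions and report how faithfully it mimics them. The dataset and black-box choice below are arbitrary stand-ins, not anything used in the episode.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Global surrogate: train an interpretable tree to imitate a black-box model.
X, y = load_breast_cancer(return_X_y=True)

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
bb_preds = black_box.predict(X)                   # labels produced by the black box

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, bb_preds)

# Fidelity = how often the surrogate agrees with the black box; low fidelity
# means the tree's "explanation" cannot be trusted (one of the pitfalls above).
fidelity = accuracy_score(bb_preds, surrogate.predict(X))
print(f"surrogate fidelity to black box: {fidelity:.2%}")
```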
11 Dec 2022 | #86 - Prof. YANN LECUN and Dr. RANDALL BALESTRIERO - SSL, Data Augmentation, Reward isn't enough [NEURIPS2022] | 00:30:28 | |
Yann LeCun is a French computer scientist known for his pioneering work on convolutional neural networks, optical character recognition and computer vision. He is a Silver Professor at New York University and Vice President, Chief AI Scientist at Meta. Along with Yoshua Bengio and Geoffrey Hinton, he was awarded the 2018 Turing Award for their work on deep learning, earning them the nickname of the "Godfathers of Deep Learning". Dr. Randall Balestriero has been researching learnable signal processing since 2013, with a focus on learnable parametrized wavelets and deep wavelet transforms. His research has been used by NASA, leading to applications such as Marsquake detection. During his PhD at Rice University, Randall explored deep networks from a theoretical perspective and improved state-of-the-art methods such as batch-normalization and generative networks. Later, when joining Meta AI Research (FAIR) as a postdoc with Prof. Yann LeCun, Randall further broadened his research interests to include self-supervised learning and the biases emerging from data-augmentation and regularization, resulting in numerous publications. Episode recorded live at NeurIPS. YT: https://youtu.be/9dLd6n9yT8U (references are there) Support us! https://www.patreon.com/mlst Host: Dr. Tim Scarfe TOC: [00:00:00] LeCun interview [00:18:25] Randall Balestriero interview (mostly on spectral SSL paper, first ref) | |||
16 Dec 2022 | #88 Dr. WALID SABA - Why machines will never rule the world [UNPLUGGED] | 01:21:59 | |
Support us! https://www.patreon.com/mlst Dr. Walid Saba recently reviewed the book Machines Will Never Rule The World, which argues that strong AI is impossible. He acknowledges the complexity of modeling mental processes and language, as well as interactive dialogues, and questions the authors' use of "never." Despite his skepticism, he is impressed with recent developments in large language models, though he questions the extent of their success. We then discussed the successes of cognitive science. Walid believes that something has been achieved which many cognitive scientists would never accept, namely the ability to learn from data empirically. Keith agrees that this is a huge step, but notes that there is still much work to be done to get to the "other 5%" of accuracy. They both agree that the current models are too brittle and require much more data and parameters to get to the desired level of accuracy. Walid then expresses admiration for deep learning systems' ability to learn non-trivial aspects of language from ingesting text only. He argues that this is an "existential proof" of language competency and that it would be impossible for a group of luminaries such as Montague, Marvin Minsky, John McCarthy, and a thousand other bright engineers to replicate the same level of competency as we have now with LLMs. He then discusses the problem of semantics and pragmatics, as well as symbol grounding, and expresses skepticism about grounded meaning and embodiment. He believes that artificial intelligence should be used to solve real-world problems which require human intelligence but not believe that robots should be built to understand love or other subjective feelings. We discussed the unique properties of natural human language. Walid believes that the core unique property is the ability to do abductive reasoning, which is the process of reasoning to the best explanation or understanding. Keith adds that there are two types of abduction - one for generating hypotheses and one for justifying them. In both cases, abductive reasoning involves choosing from a set of plausible possibilities. Finally, we discussed the book "Machines Will Never Rule The World" and its argument that the current mathematics and technology is not enough to model complex systems. Walid agrees with the book's argument but is still optimistic that a new mathematics can be discovered. Keith suggests the possibility of an AGI discovering the mathematics to create itself. They also discussed how the book could serve as a reminder to temper the hype surrounding AI and to focus on exploration, creativity, and daring ideas. Walid ended by stressing the importance of science, noting that engineers should play within the Venn diagrams drawn by scientists, rather than trying to hack their way through it. Transcript: https://share.descript.com/view/BFQb5iaegJC Discord: https://discord.gg/aNPkGUQtc5 YT: https://youtu.be/IMnWAuoucjo TOC: [00:00:00] Intro [00:06:52] Walid's change of heart on DL/LLMs and on the skeptics like Gary Marcus [00:22:52] Symbol Grounding [00:32:26] On Montague [00:40:41] On Abduction [00:50:54] Language of thought [00:56:08] Why machines will never rule the world book review [01:20:06] Engineers should play in the scientists Venn Diagram! Panel; Dr. Tim Scarfe Dr. Keith Duggar Mark Mcguill | |||
23 Mar 2021 | #49 - Meta-Gradients in RL - Dr. Tom Zahavy (DeepMind) | 01:25:13 | |
The race is on: we are on a collective mission to understand and create artificial general intelligence. Dr. Tom Zahavy, a Research Scientist at DeepMind, thinks that reinforcement learning is the most general learning framework that we have today, and in his opinion it could lead to artificial general intelligence. He thinks there are no tasks which could not be solved by simply maximising a reward. Back in 2012, when Tom was an undergraduate and before the deep learning revolution, he attended an online lecture on how CNNs automatically discover representations. This was an epiphany for Tom. He decided in that very moment that he was going to become an ML researcher. Tom's view is that the ability to recognise patterns and discover structure is the most important aspect of intelligence. This has been his quest ever since. He is particularly focused on using diversity preservation and meta-gradients to discover this structure. In this discussion we dive deep into meta-gradients in reinforcement learning. Video version and TOC @ https://www.youtube.com/watch?v=hfaZwgk_iS0 | |||
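The meta-gradient idea in the title is easy to state in code: take an inner gradient step governed by some meta-parameter (here a learnable learning rate; in the RL work it is typically a quantity like the discount or bootstrapping parameter), evaluate an outer loss on the updated parameters, and backpropagate through the inner step to adjust the meta-parameter. A toy quadratic stand-in, not Tom's DeepMind implementation:

```python
import torch

theta = torch.randn(5, requires_grad=True)          # stand-in "policy" parameters
log_eta = torch.tensor(-2.0, requires_grad=True)    # meta-parameter: log learning rate
meta_opt = torch.optim.Adam([log_eta], lr=1e-2)

def objective(p):
    return ((p - 1.0) ** 2).sum()                    # stand-in for the RL objective

for _ in range(200):
    eta = log_eta.exp()
    grad = torch.autograd.grad(objective(theta), theta, create_graph=True)[0]
    theta_prime = theta - eta * grad                 # differentiable inner update
    meta_loss = objective(theta_prime)               # outer loss on updated parameters
    meta_opt.zero_grad()
    meta_loss.backward()                             # gradient flows into log_eta
    meta_opt.step()
    with torch.no_grad():
        theta -= eta.detach() * grad.detach()        # commit the inner update
```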
12 Feb 2025 | Sepp Hochreiter - LSTM: The Comeback Story? | 01:07:01 | |
We speak with Sepp Hochreiter, the inventor of LSTM (Long Short-Term Memory) networks – a foundational technology in AI. Sepp discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems. SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT AND BACKGROUND READING: https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0 Prof. Sepp Hochreiter https://www.nx-ai.com/ https://x.com/hochreitersepp https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en TOC: 1. LLM Evolution and Reasoning Capabilities [00:00:00] 1.1 LLM Capabilities and Limitations Debate [00:03:16] 1.2 Program Generation and Reasoning in AI Systems [00:06:30] 1.3 Human vs AI Reasoning Comparison [00:09:59] 1.4 New Research Initiatives and Hybrid Approaches 2. LSTM Technical Architecture [00:13:18] 2.1 LSTM Development History and Technical Background [00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity [00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison [00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential 3. Industrial Applications and Neuro-Symbolic AI [00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages [00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project [00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches [00:51:29] 3.4 Evolution of AI Paradigms and System Thinking [00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison [00:58:12] 3.6 NXAI Company and Industrial AI Applications REFS: [00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber) https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory [00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov) https://link.springer.com/article/10.1007/BF02478259 [00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors) https://www.arxiv.org/pdf/2502.03671 [00:09:05] AlphaGo’s Move 37 demonstrating creative AI (Google DeepMind) https://deepmind.google/research/breakthroughs/alphago/ [00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier) https://tufalabs.ai [00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.) https://arxiv.org/abs/2405.04517 [00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.) https://arxiv.org/abs/2205.14135 [00:31:00] Historical use of sigmoid/tanh activation in 1990s (James A. McCaffrey) https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx [00:36:10] Mamba 2 state space model architecture (Albert Gu et al.) https://arxiv.org/abs/2312.00752 [00:46:00] Austria’s Pi AI project integrating symbolic & neural AI (Hochreiter et al.) 
https://www.jku.at/en/institute-of-machine-learning/research/projects/ [00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.) https://openreview.net/forum?id=7PGluppo4k [00:49:30] JKU Linz’s historical and neuro-symbolic research (Sepp Hochreiter) https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/ YT: https://www.youtube.com/watch?v=8u2pW2zZLCs <truncated, see show notes/YT> | |||
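For reference, the classic LSTM cell at the centre of this conversation is just a handful of gated updates around a persistent cell state (xLSTM's exponential gating modifies this recipe). A plain-NumPy single step with random placeholder weights, not trained values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a classic LSTM cell.
    W, U, b hold the stacked weights for the input, forget, output and cell gates."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)     # input / forget / output gates
    g = np.tanh(g)                                   # candidate cell update
    c = f * c_prev + i * g                           # gated cell-state update
    h = o * np.tanh(c)                               # new hidden state
    return h, c

# Toy usage with random placeholder weights.
d, hdim = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hdim, d))
U = rng.normal(size=(4 * hdim, hdim))
b = np.zeros(4 * hdim)
h, c = lstm_step(rng.normal(size=d), np.zeros(hdim), np.zeros(hdim), W, U, b)
print(h)
```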
13 Sep 2024 | Ashley Edwards - Genie Paper (DeepMind/Runway) | 00:25:04 | |
Ashley Edwards, who was working at DeepMind when she co-authored the Genie paper and is now at Runway, covered several key aspects of the Genie AI system and its applications in video generation, robotics, and game creation. MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api. Genie's approach to learning interactive environments, balancing compression and fidelity. The use of latent action models and VQ-VAE models for video processing and tokenization. Challenges in maintaining action consistency across frames and integrating text-to-image models. Evaluation metrics for AI-generated content, such as FID and PS&R diff metrics. The discussion also explored broader implications and applications: The potential impact of AI video generation on content creation jobs. Applications of Genie in game generation and robotics. The use of foundation models in robotics and the differences between internet video data and specialized robotics data. Challenges in mapping AI-generated actions to real-world robotic actions. Ashley Edwards: https://ashedwards.github.io/ TOC (*) are best bits 00:00:00 1. Intro to Genie & Brave Search API: Trade-offs & limitations * 00:02:26 2. Genie's Architecture: Latent action, VQ-VAE, video processing * 00:05:06 3. Genie's Constraints: Frame consistency & image model integration 00:07:26 4. Evaluation: FID, PS&R diff metrics & latent induction methods 00:09:44 5. AI Video Gen: Content creation impact, depth & parallax effects 00:11:39 6. Model Scaling: Training data impact & computational trade-offs 00:13:50 7. Game & Robotics Apps: Gamification & action mapping challenges * 00:16:16 8. Robotics Foundation Models: Action space & data considerations * 00:19:18 9. Mask-GPT & Video Frames: Real-time optimization, RL from videos 00:20:34 10. Research Challenges: AI value, efficiency vs. quality, safety 00:24:20 11. Future Dev: Efficiency improvements & fine-tuning strategies Refs: 1. Genie (learning interactive environments from videos) / Ashley and DM colleagues [00:01] https://arxiv.org/abs/2402.15391 2. VQ-VAE (Vector Quantized Variational Autoencoder) / Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu [02:43] https://arxiv.org/abs/1711.00937 3. FID (Fréchet Inception Distance) metric / Martin Heusel et al. [07:37] https://arxiv.org/abs/1706.08500 4. PS&R (Precision and Recall) metric / Mehdi S. M. Sajjadi et al. [08:02] https://arxiv.org/abs/1806.00035 5. Vision Transformer (ViT) architecture / Alexey Dosovitskiy et al. [12:14] https://arxiv.org/abs/2010.11929 6. Genie (robotics foundation models) / Google DeepMind [17:34] https://deepmind.google/research/publications/60474/ 7. Chelsea Finn's lab work on robotics datasets / Chelsea Finn [17:38] https://ai.stanford.edu/~cbfinn/ 8. Imitation from observation in reinforcement learning / YuXuan Liu [20:58] https://arxiv.org/abs/1707.03374 9. Waymo's autonomous driving technology / Waymo [22:38] https://waymo.com/ 10. Gen3 model release by Runway / Runway [23:48] https://runwayml.com/ 11. Classifier-free guidance technique / Jonathan Ho and Tim Salimans [24:43] https://arxiv.org/abs/2207.12598 | |||
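The vector-quantisation step behind the video tokenizer discussed above is compact enough to show directly: snap each encoder vector to its nearest codebook entry, with a straight-through estimator so gradients still reach the encoder. A generic VQ-VAE-style sketch, not DeepMind's Genie code (the codebook and commitment losses are omitted):

```python
import torch

def quantize(z, codebook):
    """z: (n, d) encoder outputs, codebook: (K, d). Returns quantised vectors and token indices."""
    dists = torch.cdist(z, codebook)            # (n, K) pairwise distances
    idx = dists.argmin(dim=1)                   # nearest code per vector
    z_q = codebook[idx]
    # Straight-through estimator: forward pass uses z_q, backward pass copies
    # gradients from z_q to z so the encoder can still be trained.
    z_q = z + (z_q - z).detach()
    return z_q, idx

codebook = torch.randn(512, 64)                 # 512 discrete "tokens" of dimension 64
z = torch.randn(8, 64, requires_grad=True)
z_q, tokens = quantize(z, codebook)
print(tokens)
```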
21 Aug 2024 | Joscha Bach - AGI24 Keynote (Cyberanimism) | 00:57:21 | |
Dr. Joscha Bach introduces a surprising idea called "cyber animism" in his AGI-24 talk - the notion that nature might be full of self-organizing software agents, similar to the spirits in ancient belief systems. Bach suggests that consciousness could be a kind of software running on our brains, and wonders if similar "programs" might exist in plants or even entire ecosystems. MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api. Joscha takes us on a tour de force through history, philosophy, and cutting-edge computer science, challenging us to rethink what we know about minds, machines, and the world around us. Joscha believes we should blur the lines between human, artificial, and natural intelligence, and argues that consciousness might be more widespread and interconnected than we ever thought possible. Dr. Joscha Bach https://x.com/Plinz This is video 2/9 from our coverage of AGI-24 in Seattle https://agi-conf.org/2024/ Watch the official MLST interview with Joscha, which we recorded right after this talk, now on early access on our Patreon - https://www.patreon.com/posts/joscha-bach-110199676 (you also get access to our private Discord and biweekly calls) TOC: 00:00:00 Introduction: AGI and Cyberanimism 00:03:57 The Nature of Consciousness 00:08:46 Aristotle's Concepts of Mind and Consciousness 00:13:23 The Hard Problem of Consciousness 00:16:17 Functional Definition of Consciousness 00:20:24 Comparing LLMs and Human Consciousness 00:26:52 Testing for Consciousness in AI Systems 00:30:00 Animism and Software Agents in Nature 00:37:02 Plant Consciousness and Ecosystem Intelligence 00:40:36 The California Institute for Machine Consciousness 00:44:52 Ethics of Conscious AI and Suffering 00:46:29 Philosophical Perspectives on Consciousness 00:49:55 Q&A: Formalisms for Conscious Systems 00:53:27 Coherence, Self-Organization, and Compute Resources YT version (very high quality, filmed by us live) https://youtu.be/34VOI_oo-qM Refs: Aristotle's work on the soul and consciousness Richard Dawkins' work on genes and evolution Gerald Edelman's concept of Neural Darwinism Thomas Metzinger's book "Being No One" Yoshua Bengio's concept of the "consciousness prior" Stuart Hameroff's theories on microtubules and consciousness Christof Koch's work on consciousness Daniel Dennett's "Cartesian Theater" concept Giulio Tononi's Integrated Information Theory Mike Levin's work on organismal intelligence The concept of animism in various cultures Freud's model of the mind Buddhist perspectives on consciousness and meditation The Genesis creation narrative (for its metaphorical interpretation) California Institute for Machine Consciousness | |||
13 Feb 2024 | Dr. Brandon Rohrer - Robotics, Creativity and Intelligence | 01:31:42 | |
Brandon Rohrer, who obtained his Ph.D. from MIT, is driven to understand algorithms all the way down to their nuts and bolts so he can make them accessible to everyone, explaining them the way he himself would have wanted to learn them. Please support us on Patreon for loads of exclusive content and private Discord: https://patreon.com/mlst (public discord) https://discord.gg/aNPkGUQtc5 https://twitter.com/MLStreetTalk Brandon Rohrer is a seasoned data science leader and educator with a rich background in creating robust, efficient machine learning algorithms and tools. With a Ph.D. in Mechanical Engineering from MIT, his expertise encompasses a broad spectrum of AI applications — from computer vision and natural language processing to reinforcement learning and robotics. Brandon's career has seen him in Principal-level roles at Microsoft and Facebook. An educator at heart, he also shares his knowledge through detailed tutorials, courses, and his forthcoming book, "How to Train Your Robot." YT version: https://www.youtube.com/watch?v=4Ps7ahonRCY Brandon's links: https://github.com/brohrer https://www.youtube.com/channel/UCsBKTrp45lTfHa_p49I2AEQ https://www.linkedin.com/in/brohrer/ How transformers work: https://e2eml.school/transformers Brandon's End-to-End Machine Learning school courses, posts, and tutorials https://e2eml.school Free course: https://end-to-end-machine-learning.teachable.com/p/complete-course-library-full-end-to-end-machine-learning-catalog Blog: https://e2eml.school/blog.html Ziptie: Learning Useful Features [Brandon Rohrer] https://www.brandonrohrer.com/ziptie TOC should be baked into the MP3 file now 00:00:00 - Intro to Brandon 00:00:36 - RLHF 00:01:09 - Limitations of transformers 00:07:23 - Agency - we are all GPTs 00:09:07 - BPE / representation bias 00:12:00 - LLM true believers 00:16:42 - Brandon's style of teaching 00:19:50 - ML vs real world = Robotics 00:29:59 - Reward shaping 00:37:08 - No true Scotsman - when do we accept capabilities as real 00:38:50 - Externalism 00:43:03 - Building flexible robots 00:45:37 - Is reward enough 00:54:30 - Optimization curse 00:58:15 - Collective intelligence 01:01:51 - Intelligence + creativity 01:13:35 - ChatGPT + Creativity 01:25:19 - Transformers Tutorial | |||
20 Nov 2022 | #81 JULIAN TOGELIUS, Prof. KEN STANLEY - AGI, Games, Diversity & Creativity [UNPLUGGED] | 01:09:46 | |
Support us (and please rate on podcast app) https://www.patreon.com/mlst In tonight's show with Prof. Julian Togelius (NYU) and Prof. Ken Stanley, we discuss open-endedness, AGI, game AI and reinforcement learning. [Prof Julian Togelius] https://engineering.nyu.edu/faculty/julian-togelius https://twitter.com/togelius [Prof Ken Stanley] https://www.cs.ucf.edu/~kstanley/ https://twitter.com/kenneth0stanley TOC: [00:00:00] Introduction [00:01:07] AI and computer games [00:12:23] Intelligence [00:21:27] Intelligence Explosion [00:25:37] What should we be aspiring towards? [00:29:14] Should AI contribute to culture? [00:32:12] On creativity and open-endedness [00:36:11] RL overfitting [00:44:02] Diversity preservation [00:51:18] Empiricism vs rationalism, in gradient descent the data pushes you around [00:55:49] Creativity and interestingness (does complexity / information increase) [01:03:20] What does a population give us? [01:05:58] Emergence / generalisation snobbery References: [Hutter/Legg] Universal Intelligence: A Definition of Machine Intelligence https://arxiv.org/abs/0712.3329 https://en.wikipedia.org/wiki/Artificial_general_intelligence https://en.wikipedia.org/wiki/I._J._Good https://en.wikipedia.org/wiki/G%C3%B6del_machine [Chollet] Impossibility of intelligence explosion https://medium.com/@francois.chollet/the-impossibility-of-intelligence-explosion-5be4a9eda6ec [Alex Irpan] - RL is hard https://www.alexirpan.com/2018/02/14/rl-hard.html https://nethackchallenge.com/ MAP-Elites https://arxiv.org/abs/1504.04909 Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space https://arxiv.org/abs/1912.02400 [Stanley] - Why greatness cannot be planned https://www.amazon.com/Why-Greatness-Cannot-Planned-Objective/dp/3319155237 [Lehman/Stanley] Abandoning Objectives: Evolution through the Search for Novelty Alone https://www.cs.swarthmore.edu/~meeden/DevelopmentalRobotics/lehman_ecj11.pdf | |||
08 Jul 2024 | David Chalmers - Reality+ | 01:17:57 | |
In the coming decades, the technology that enables virtual and augmented reality will improve beyond recognition. Within a century, world-renowned philosopher David J. Chalmers predicts, we will have virtual worlds that are impossible to distinguish from non-virtual worlds. But is virtual reality just escapism? In a highly original work of 'technophilosophy', Chalmers argues categorically, no: virtual reality is genuine reality. Virtual worlds are not second-class worlds. We can live a meaningful life in virtual reality - and increasingly, we will. What is reality, anyway? How can we lead a good life? Is there a god? How do we know there's an external world - and how do we know we're not living in a computer simulation? In Reality+, Chalmers conducts a grand tour of philosophy, using cutting-edge technology to provide invigorating new answers to age-old questions. David J. Chalmers is an Australian philosopher and cognitive scientist specializing in the areas of philosophy of mind and philosophy of language. He is Professor of Philosophy and Neural Science at New York University, as well as co-director of NYU's Center for Mind, Brain, and Consciousness. Chalmers is best known for his work on consciousness, including his formulation of the "hard problem of consciousness." Reality+: Virtual Worlds and the Problems of Philosophy https://amzn.to/3RYyGD2 https://consc.net/ https://x.com/davidchalmers42 00:00:00 Reality+ Intro 00:12:02 GPT conscious? 10/10 00:14:19 The consciousness processor thought experiment (11/10) 00:20:34 Intelligence and Consciousness entangled? 10/10 00:22:44 Karl Friston / Meta Problem 10/10 00:29:05 Knowledge argument / subjective experience (6/10) 00:32:34 Emergence 11/10 (best chapter) 00:42:45 Working with Douglas Hofstadter 10/10 00:46:14 Intelligence is analogy making? 10/10 00:50:47 Intelligence explosion 8/10 00:58:44 Hypercomputation 10/10 01:09:44 Who designed the designer? (7/10) 01:13:57 Experience machine (7/10) | |||
09 Jan 2025 | Francois Chollet - ARC reflections - NeurIPS 2024 | 01:26:46 | |
François Chollet discusses the outcomes of the ARC-AGI (Abstraction and Reasoning Corpus) Prize competition in 2024, where accuracy rose from 33% to 55.5% on a private evaluation set. The two winning approach families were deep-learning-guided program synthesis and test-time training (a toy sketch of the test-time training loop follows this entry). SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? They are hosting an event in Zurich on January 9th with the ARChitects, join if you can. Go to https://tufalabs.ai/ *** Read about the recent result on o3 with ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say): https://arcprize.org/blog/oai-o3-pub-breakthrough TOC: 1. Introduction and Opening [00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François’s Long-Standing Hybrid View [00:00:48] 1.2 “Why Do They Call You a Symbolist?” – Addressing Misconceptions [00:01:31] 1.3 Defining Reasoning 3. ARC Competition 2024 Results and Evolution [00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2 [00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions [00:13:17] 3.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training 4. Transduction vs. Induction in ARC [00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization [00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search 5. ARC-2 Development and Future Directions [00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2 [00:25:35] 5.2 Human-Level Performance Metrics and Private Test Sets [00:29:44] 5.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology 6. Program Synthesis Approaches [00:30:18] 6.1 Induction vs. Transduction [00:32:11] 6.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks [00:34:23] 6.3 Combining Induction and Transduction [00:37:05] 6.4 Multi-View Insight and Overfitting Regulation 7. Latent Space and Graph-Based Synthesis [00:38:17] 7.1 Clément Bonnet’s Latent Program Search Approach [00:40:10] 7.2 Decoding to Symbolic Form and Local Discrete Search [00:41:15] 7.3 Graph of Operators vs. Token-by-Token Code Generation [00:45:50] 7.4 Iterative Program Graph Modifications and Reusable Functions 8. Compute Efficiency and Lifelong Learning [00:48:05] 8.1 Symbolic Process for Architecture Generation [00:50:33] 8.2 Logarithmic Relationship of Compute and Accuracy [00:52:20] 8.3 Learning New Building Blocks for Future Tasks 9. AI Reasoning and Future Development [00:53:15] 9.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning [00:56:30] 9.2 Reconciling Symbolic and Connectionist Views [01:00:13] 9.3 System 2 Reasoning - Awareness and Consistency [01:03:05] 9.4 Novel Problem Solving, Abstraction, and Reusability 10. Program Synthesis and Research Lab [01:05:53] 10.1 François Leaving Google to Focus on Program Synthesis [01:09:55] 10.2 Democratizing Programming and Natural Language Instruction 11. Frontier Models and O1 Architecture [01:14:38] 11.1 Search-Based Chain of Thought vs. Standard Forward Pass [01:16:55] 11.2 o1’s Natural Language Program Generation and Test-Time Compute Scaling [01:19:35] 11.3 Logarithmic Gains with Deeper Search
12. ARC Evaluation and Human Intelligence [01:22:55] 12.1 LLMs as Guessing Machines and Agent Reliability Issues [01:25:02] 12.2 ARC-2 Human Testing and Correlation with g-Factor [01:26:16] 12.3 Closing Remarks and Future Directions SHOWNOTES PDF: https://www.dropbox.com/scl/fi/ujaai0ewpdnsosc5mc30k/CholletNeurips.pdf?rlkey=s68dp432vefpj2z0dp5wmzqz6&st=hazphyx5&dl=0 | |||
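Since test-time training comes up repeatedly in this interview (sections 3.3, 4.1 and 4.2 above), here is a toy sketch of the basic loop: clone a pretrained model and fine-tune the clone on the handful of demonstration pairs of a single ARC task before predicting that task's test output. Everything named here (TinyGridModel, the 3x3 grids, the step counts) is a hypothetical placeholder; the actual 2024 entries fine-tune LLMs over serialized grids with heavy augmentation.

```python
import copy
import torch
import torch.nn as nn

class TinyGridModel(nn.Module):
    """Hypothetical stand-in for a pretrained model over flattened 10-colour grids."""
    def __init__(self, cells: int = 9, colours: int = 10):
        super().__init__()
        self.cells, self.colours = cells, colours
        self.net = nn.Sequential(nn.Linear(cells * colours, 128), nn.ReLU(),
                                 nn.Linear(128, cells * colours))

    def forward(self, grid):  # grid: (batch, cells) of integer colour ids
        x = nn.functional.one_hot(grid, self.colours).float().flatten(1)
        return self.net(x).view(-1, self.cells, self.colours)  # per-cell logits

def test_time_train(base_model, demos, steps=50, lr=1e-3):
    """Fine-tune a copy of the model on one task's demonstration pairs only."""
    model = copy.deepcopy(base_model)  # never touch the shared pretrained weights
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for inp, out in demos:  # a handful of (input_grid, output_grid) pairs
            logits = model(inp.unsqueeze(0))
            loss = loss_fn(logits.flatten(0, 1), out.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Toy 3x3 "task": two random demonstration pairs, then predict a held-out input.
demos = [(torch.randint(0, 10, (9,)), torch.randint(0, 10, (9,))) for _ in range(2)]
adapted = test_time_train(TinyGridModel(), demos)
prediction = adapted(torch.randint(0, 10, (9,)).unsqueeze(0)).argmax(-1)  # (1, 9)
```

The point of the pattern is that each task gets its own short-lived specialist; the contrast Chollet draws in the interview is between this kind of gradient-descent adaptation and discrete program search.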
27 Dec 2020 | #035 Christmas Community Edition! | 02:56:03 | |
Welcome to the Christmas special community edition of MLST! We discuss some recent and interesting papers from Pedro Domingos (are NNs kernel machines? a schematic statement of his result follows this entry), DeepMind (can NNs out-reason symbolic machines?), Anna Rogers (When BERT Plays The Lottery, All Tickets Are Winning), and Prof. Mark Bishop (even causal methods won't deliver understanding). We also cover our favourite bits from the recent Montreal AI event run by Prof. Gary Marcus (including Rich Sutton, Danny Kahneman and Christof Koch). We respond to a reader mail on capsule networks. Then we do a deep dive into Type Theory and Lambda Calculus with community member Alex Mattick. In the final hour we discuss inductive priors and label information density with another one of our Discord community members. Panel: Dr. Tim Scarfe, Yannic Kilcher, Alex Stenlake, Dr. Keith Duggar Enjoy the show and don't forget to subscribe! 00:00:00 Welcome to Christmas Special! 00:00:44 SoTa meme 00:01:30 Happy Christmas! 00:03:11 Paper -- DeepMind - Outperforming neuro-symbolic models with NNs (Ding et al) 00:08:57 What does it mean to understand? 00:17:37 Paper - Prof. Mark Bishop Artificial Intelligence is stupid and causal reasoning won't fix it 00:25:39 Paper -- Pedro Domingos - Every Model Learned by Gradient Descent Is Approximately a Kernel Machine 00:31:07 Paper - Bengio - Inductive Biases for Deep Learning of Higher-Level Cognition 00:32:54 Anna Rogers - When BERT Plays The Lottery, All Tickets Are Winning 00:37:16 Montreal AI event - Gary Marcus on reasoning 00:40:37 Montreal AI event -- Rich Sutton on universal theory of AI 00:49:45 Montreal AI event -- Danny Kahneman, System 1 vs 2 and Generative Models à la free energy principle 01:02:57 Montreal AI event -- Christof Koch - Neuroscience is hard 01:10:55 Markus Carr -- reader letter on capsule networks 01:13:21 Alex response to Marcus Carr 01:22:06 Type theory segment -- with Alex Mattick from Discord 01:24:45 Type theory segment -- What is Type Theory 01:28:12 Type theory segment -- Difference between functional and OOP languages 01:29:03 Type theory segment -- Lambda calculus 01:30:46 Type theory segment -- Closures 01:35:05 Type theory segment -- Term rewriting (confluence and termination) 01:42:02 Type theory segment -- eta term rewriting system - Lambda Calculus 01:54:44 Type theory segment -- Types / semantics 02:06:26 Type theory segment -- Calculus of constructions 02:09:27 Type theory segment -- Homotopy type theory 02:11:02 Type theory segment -- Deep learning link 02:17:27 Jan from Discord segment -- Chrome MRU skit 02:18:56 Jan from Discord segment -- Inductive priors (with XMaster96/Jan from Discord) 02:37:59 Jan from Discord segment -- Label information density (with XMaster96/Jan from Discord) 02:55:13 Outro | |||
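For the Domingos paper discussed at 00:25:39, the headline claim can be written schematically. This is a paraphrase from memory of the paper's main result rather than a quotation, so treat the exact form of the weights as approximate: a model y = f_w(x) trained by gradient descent with infinitesimally small steps ends up behaving like a kernel machine over the training points, where the "path kernel" measures how similarly two inputs move the weights along the whole optimization trajectory.

```latex
% Schematic statement of Domingos (2020), "Every Model Learned by Gradient
% Descent Is Approximately a Kernel Machine" -- paraphrased, not exact.
\[
  y(x) \;\approx\; \sum_{i=1}^{m} a_i \, K^{\mathrm{path}}(x, x_i) \;+\; b,
  \qquad
  K^{\mathrm{path}}(x, x') \;=\; \int_{c(t)} \nabla_w f_{w(t)}(x) \cdot \nabla_w f_{w(t)}(x') \, dt ,
\]
% where the x_i are the training examples, c(t) is the path the weights follow
% during gradient descent, a_i is a path-averaged derivative of the loss on
% example i, and b is the prediction of the initial (untrained) model.
```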
04 Mar 2023 | #105 - Dr. MICHAEL OLIVER [CSO - Numerai] | 01:20:42 | |
Access Numerai here: http://numer.ai/mlst Michael Oliver is the Chief Scientist at Numerai, a hedge fund that crowdsources machine learning models from data scientists. He has a PhD in Computational Neuroscience from UC Berkeley and was a postdoctoral researcher at the Allen Institute for Brain Science before joining Numerai in 2020. He is also the host of Numerai Quant Club, a YouTube series where he discusses Numerai’s research, data and challenges. YT version: https://youtu.be/61s8lLU7sFg TOC: [00:00:00] Introduction to Michael and Numerai [00:02:03] Understanding / new Bing [00:22:47] Quant vs Neuroscience [00:36:43] Role of language in cognition and planning, and subjective... [00:45:47] Boundaries in finance modelling [00:48:00] Numerai [00:57:37] Aggregation systems [01:00:52] Getting started on Numerai [01:03:21] What models are people using [01:04:23] Numerai Problem Setup [01:05:49] Regimes in financial data and quant talk [01:11:18] Esoteric approaches used on Numerai? [01:13:59] Curse of dimensionality [01:16:32] Metrics [01:19:10] Outro References: Growing Neural Cellular Automata (Alexander Mordvintsev) https://distill.pub/2020/growing-ca/ A Thousand Brains: A New Theory of Intelligence (Jeff Hawkins) https://www.amazon.fr/Thousand-Brains-New-Theory-Intelligence/dp/1541675819 Perceptual Neuroscience: The Cerebral Cortex (Vernon B. Mountcastle) https://www.amazon.ca/Perceptual-Neuroscience-Cerebral-Vernon-Mountcastle/dp/0674661885 Numerai Quant Club with Michael Oliver https://www.youtube.com/watch?v=eLIxarbDXuQ&list=PLz3D6SeXhT3tTu8rhZmjwDZpkKi-UPO1F Numerai YT channel https://www.youtube.com/@Numerai/featured Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 |