Data Science at Home (Francesco Gadaleta)

Explore every episode of Data Science at Home

Dive into the complete episode list for Data Science at Home. Each episode is cataloged with detailed descriptions, making it easy to find and explore specific topics. Keep track of all episodes from your favorite podcast and never miss a moment of insightful content.

Pub. Date | Title | Duration
19 Jun 2020 | Rust and machine learning #2 with Luca Palmieri (Ep. 108) | 00:27:02

In the second episode of Rust and Machine Learning I am speaking with Luca Palmieri, who has spent a large part of his career at the intersection of machine learning and data engineering. In addition, Luca has contributed to several projects close to the machine learning community using the Rust programming language. Linfa is an ambitious project that definitely deserves the attention of the data science community (and it's written in Rust, with Python bindings! How cool is that?!).

 

31 Dec 2020 | Scaling machine learning with clusters and GPUs (Ep. 134) | 00:30:58

Let's finish this year with an amazing episode about scaling ML with clusters and GPUs. Kind of as a continuation of Episode 112 I have a terrific conversation with Aaron Richter from Saturn Cloud about, well, making ML faster and scaling it to massive infrastructure.

Aaron can be reached on his website https://rikturr.com and Twitter @rikturr

 

Our Sponsor

Saturn Cloud is a data science and machine learning platform for scalable Python analytics. Users can jump into cloud-based Jupyter and Dask to scale Python for big data using the libraries they know and love, while leveraging Docker and Kubernetes so that work is reproducible, shareable, and ready for production.

Try Saturn Cloud for free at https://saturncloud.io 

Twitter: @saturn_cloud

 

 

11 Oct 2021 | AI in the Enterprise with IBM Global AI Strategist Mara Pometti (Ep. 171) | 00:35:02

IBM Global AI Strategist Mara Pometti is IBM’s first AI Strategist. She defines and designs the strategy for AI solutions by revealing overlooked insights hidden in enterprises’ data. In this episode we speak about strategy, explainable AI and data storytelling.

 

References

IBM Trustworthy AI: https://www.ibm.com/watson/trustworthy-ai

IBM AIX360: https://aix360.mybluemix.net/

Explainable AI and Data Storytelling: https://medium.com/aixdesign/the-next-generation-of-storytelling-1d5fecc8f999

Mara Pometti’s website: www.marapometti.com

Mara Pometti LinkedIn: https://www.linkedin.com/in/mara-pometti-99962594

Mara Pometti Twitter: https://twitter.com/91_pometti 

 

15 Jul 2021 | Artificial Intelligence for Blockchains with Jonathan Ward CTO of Fetch AI (Ep. 161) | 00:32:38

In this episode Fetch AI CTO Jonathan Ward speaks about decentralization, AI, and blockchain for smart cities and the enterprise. Below are some great links about collective learning, smart contracts in Rust and the Fetch AI ecosystem.

17 Aug 2023 | The new dimension of AI: Vector Databases (Ep. 236) | 00:27:16

Let's delve into the emerging trend in database design – or is it really a new trend? The realm of vector databases and their revolutionary influence on AI and ML is making headlines. Come along as we investigate how these groundbreaking databases are revolutionizing the landscape of data storage, retrieval, and processing, ultimately unlocking the complete potential of artificial intelligence and machine learning. But are they genuinely as innovative as they seem?

 

References

https://partee.io/2022/08/11/vector-embeddings/

https://blog.det.life/why-you-shouldnt-invest-in-vector-databases-c0cd3f59d23c

https://medium.com/@ryanntk/choosing-the-right-embedding-model-a-guide-for-llm-applications-7a60180d28e3

 

 

10 Mar 2021 | Concurrent is not parallel - Part 1 (Ep. 142) | 00:32:10

In plain English, concurrent and parallel are synonyms. Not for a CPU. And definitely not for programmers. In this episode I summarize the ways to parallelize on different architectures and operating systems. Rock-star data scientists must know how concurrency works and when to use it IMHO.

 

Our Sponsors

This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience

 

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

 

03 May 2023 | Revolutionize Your AI Game: How Running Large Language Models Locally Gives You an Unfair Advantage Over Big Tech Giants (Ep. 226) | 00:43:42

This is the first episode about the latest trend in artificial intelligence that's shaking up the industry - running large language models locally on your machine. This new approach allows you to bypass the limitations and constraints of cloud-based models controlled by big tech companies, and take control of your own AI journey.

We'll delve into the benefits of running models locally, such as increased speed, improved privacy and security, and greater customization and flexibility. We'll also discuss the technical requirements and considerations for running these models on your own hardware, and provide practical tips and advice to get you started.

Join us as we uncover the secrets to unleashing the full potential of large language models and taking your AI game to the next level!

Sponsors

AI-powered Email Security Best-in-class protection against the most sophisticated attacks, from phishing and impersonation to BEC and zero-day threats https://www.mimecast.com/

 

 

25 Mar 2022 | Collect data at the edge [RB] (Ep. 192) | 00:36:25

In this episode I speak with Manavalan Krishnan from Tsecond about capturing massive amounts of data at the edge with security and reliability in mind.

 

This episode is brought to you by NordVPN

NordVPN protects your privacy while you are online. Get secure and private access to the internet by surfing nordvpn.com/DATASCIENCE or use coupon code DATASCIENCE and get a massive discount.

and by Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

References

https://tsecond.us/company/manavalan-krishnan/

 

27 Feb 2023 | Prove It Without Revealing It: Exploring the Power of Zero-Knowledge Proofs in Data Science (Ep. 218) | 00:15:52

In this episode, we dive into the fascinating world of zero-knowledge proofs and their impact on data science. Zero-knowledge proofs allow one party to prove to another that they know a secret without revealing the secret itself. This powerful concept has numerous applications in data science, from ensuring data privacy and security to facilitating secure transactions and identity verification. We explore the mechanics of zero-knowledge proofs, their real-world applications, and how they are revolutionizing the way we handle sensitive information. Join us as we uncover the secrets of zero-knowledge proofs and their impact on the future of data science.
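
As a rough illustration of "proving you know a secret without revealing it", here is a minimal sketch of one round of the classic Schnorr identification protocol. It is not taken from the episode; the toy parameters `P`, `Q`, `G` and the helper names (`keygen`, `commit`, `respond`, `verify`, `run_round`) are assumptions chosen for readability, and the group is far too small for real use.

```python
import secrets

# Toy parameters (illustrative only): G generates a subgroup of prime order Q modulo P.
P, Q, G = 23, 11, 2

def keygen():
    """Return (secret x, public value y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1      # secret in [1, Q-1]
    return x, pow(G, x, P)

def commit():
    """Prover picks a fresh random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def respond(x, r, c):
    """Prover answers challenge c; s alone does not reveal x because r is random."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier checks G^s == t * y^c (mod P) using only public values."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P

def run_round(x, y):
    """One interactive round between an honest prover and verifier."""
    r, t = commit()
    c = secrets.randbelow(Q)              # verifier's random challenge
    s = respond(x, r, c)
    return verify(y, t, c, s)

if __name__ == "__main__":
    x, y = keygen()
    print(run_round(x, y))                # True for an honest prover
```

The verifier only ever sees `y`, `t`, `c` and `s`. Real deployments use large groups or elliptic curves and typically a non-interactive variant.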

 

Sponsors

Want to enjoy the 4K video anytime, anywhere?

With ASUS ZenWiFi you can. Asus ZenWiFi XD5 mesh system puts your WiFi on steroids. It has a super easy Setup, with Flexible Network Naming, Lifelong free AiProtection and of course WiFi 6 technology. With Asus ZenWifi XD5 you get superfast, reliable and secure WiFi connections in every corner of your home! With Asus ZenWifi XD5, you get the best WiFi experience!

Find more at  https://asus.click/ZenWiFi_XD5

07 Sep 2021 | How are organisations doing with data and AI? (Ep. 168) | 00:35:34

A few weeks ago I was the guest of a very interesting show called "AI Today". In that episode I talked about some of the biggest trends emerging in AI and machine learning today as well as how organizations are dealing with and managing their data.

 

The original show has been published at https://www.cognilytica.com/2021/08/11/ai-today-podcast-interview-with-francesco-gadaleta-host-of-data-science-at-home-podcast/

 

Our Sponsors

Quantum Metric

Stay off the naughty list this holiday season by reducing customer friction, increasing conversions, and personalizing the shopping experience. Want a sneak peek? Visit us at quantummetric.com/podoffer and see if you qualify to receive our “12 Days of Insights” offer with code DATASCIENCE. This offer gives you 12-day access to our platform coupled with a bespoke insight report that will help you identify where customers are struggling or engaging in your digital product.

 

Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

16 Feb 2024 | Revolutionizing Robotics: Embracing Low-Code Solutions (Ep. 251) | 00:19:16

In this episode of Data Science at Home, we explore the game-changing impact of low-code solutions in robotics development. Discover how these tools bridge the coding gap, simplify integration, and enable trial-and-error development. We'll also uncover challenges with traditional coding methods using ROS. Join us for a concise yet insightful discussion on the future of robotics!

Sponsors
  • Intrepid AI is an AI assisted all-in-one platform for robotics teams. Build robotics applications in minutes, not months.
  • Learn what the new year holds for ransomware as a service, Active Directory, artificial intelligence and more when you download the 2024 Arctic Wolf Labs Predictions Report today at arcticwolf.com/datascience
22 Feb 2022 | Connect. Collect. Normalize. Analyze. An interview with the people from Railz AI (Ep. 189) | 00:46:44

In this episode I am with Pasha Zavari, Director of Data Science, and Derek Manuge, Co-founder and CTO at Railz. Railz is a very interesting company with an incredible mission: normalizing and extracting insights from the most tedious data out there, financial data. Guess what technology stack they are on? Enjoy the show!

 

This episode is brought to you by RailzAI

The Railz API connects to major accounting platforms to provide you with quick access to normalized and analyzed financial data.

 

Sponsored by NordVPN

NordVPN protects your privacy while you are online. Get secure and private access to the internet by surfing nordvpn.com/DATASCIENCE or use coupon code DATASCIENCE and get a massive discount.

 

and by Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

References

Railz Homepage: https://go.railz.ai/Railz-DSH

Railz API Document: https://go.railz.ai/RailzAPI-DSH

Railz API Signup: https://go.railz.ai/RailzSignup-DSH

Railz Startup Pricing: https://go.railz.ai/RailzStartupPricing-DSH

Railz Careers: https://secure.collage.co/jobs/railz 

26 Jan 2023 | Chatting with ChatGPT: Pros and Cons of Advanced Language AI (Ep. 215) | 00:31:27

In this episode, I'll be discussing the capabilities and limitations of ChatGPT, an advanced language AI model. I'll go over its power to understand and respond to natural language, and its applications in tasks such as language translation and text summarization. However, I'll also touch on the challenges that still need to be overcome such as bias and data privacy concerns. Tune in for a comprehensive look at the current state of advanced language AI.

 

References

https://datascienceathome.com/have-you-met-shannon-conversation-with-jimmy-soni-and-rob-goodman-about-one-of-the-greatest-minds-in-history/

 

03 Jun 2022 | What are generalist agents and why they can change the AI game (Ep. 199) | 00:21:06

It is increasingly accepted that deep learning alone is not sufficient to achieve artificial general intelligence. Generalist agents have great properties that can overcome some of the limitations of single-task deep learning models. Be aware, though, that we are still far from AGI.

So what are generalist agents?

References

https://arxiv.org/pdf/2205.06175

25 Nov 2024 | Humans vs. Bots: Are You Talking to a Machine Right Now? (Ep. 273) | 00:49:33

In this episode of Data Science at Home, host Francesco Gadaleta dives deep into the evolving world of AI-generated content detection with experts Souradip Chakraborty, Ph.D. grad student at the University of Maryland, and Amrit Singh Bedi, CS faculty at the University of Central Florida. 

Together, they explore the growing importance of distinguishing human-written from AI-generated text, discussing real-world examples from social media to news. How reliable are current detection tools like DetectGPT? What are the ethical and technical challenges ahead as AI continues to advance? And is the balance between innovation and regulation tipping in the right direction? 

 

Tune in for insights on the future of AI text detection and the broader implications for media, academia, and policy.

 

Chapters 

 

00:00 - Intro 

00:23 - Guests: Souradip Chakraborty and Amrit Singh Bedi 

01:25 - Distinguish Text Generation By AI 

04:33 - Research on Safety and Alignment of Generative Model 

06:01 - Tools to Detect Generated AI Text  

11:28 - Water Marking

18:27 - Challenges in Detecting Large Documents Generated by AI 

23:34 - Number of Tokens 

26:22 - Adversarial Attack

29:01 - True Positive and False Positive of Detectors 

31:01 - Limit of Technologies 

41:01 - Future of AI Detection Techniques 

46:04 - Closing Thought

 

Subscribe to our new YouTube channel https://www.youtube.com/@DataScienceatHome

 

02 Nov 2022 | [RB] Is studying AI in academia a waste of time? (Ep. 208) | 00:20:01

Companies and other business entities are actively involved in defining data products and applied research every year. Academia has always played a role in creating new methods and solutions/algorithms in the fields of machine learning and artificial intelligence. However, there is doubt about how powerful and effective such research efforts are. Is studying AI in academia a waste of time?

 

Our Sponsors

Ready to advance your career in data science? University of Cincinnati Online offers nationally recognized educational programs in business analytics and information systems. Predictive Analytics Today named UC the No. 1 MS Data Science school in the country, and UC is nationally recognized for its proven track record of placing students at high-profile companies such as Google, Amazon and P&G. Discover more about the University of Cincinnati’s 100% online master’s degree programs at online.uc.edu/obais

 

Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.

 

02 Mar 2021 | Backend technologies for machine learning in production (Ep. 141) | 00:25:11

This is one of the most dynamic and fascinating topics: API technologies for machine learning.

It's always fun to build ML models. But how about serving them in the real world? In this episode I speak about three must-know technologies to place your model behind an API.

 

Our Sponsors

This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience

 

If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.

07 Feb 2020 | A big welcome to Pryml: faster machine learning applications to production (Ep. 94) | 00:09:26

Why so much silence? Building a company! That's why :) I am building pryml, a platform that allows data scientists to build their applications on data they cannot get access to. This is the first of a series of episodes in which I will speak about the technology and the challenges we are facing while we build it.

Happy listening and stay tuned!

15 Oct 2022 | Edge AI for applications in military and space (Ep. 206) | 00:21:09
Our Sponsors

Ready to advance your career in data science? University of Cincinnati Online offers nationally recognized educational programs in business analytics and information systems. Predictive Analytics Today named UC the No. 1 MS Data Science school in the country, and UC is nationally recognized for its proven track record of placing students at high-profile companies such as Google, Amazon and P&G. Discover more about the University of Cincinnati’s 100% online master’s degree programs at online.uc.edu/obais

 

Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.

15 Feb 2021 | How to reinvent banking and finance with data and technology (Ep. 139) | 00:36:47

The financial system is changing. It is becoming more efficient and integrated with many more services, making our life more... digital. Is the old banking system doomed to fail? Or will it just be disrupted by the smaller players of the fintech industry? In this episode we answer some of these fundamental questions with Alessandro E. Hatami from Pacemakers.

Subscribe to the Newsletter and come chat with us on the official Discord channel

 

Our Sponsors

This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience

 

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

08 Jul 2021 | Apache Arrow, Ballista and Big Data in Rust with Andy Grove RB (Ep. 160) | 00:29:01

Do you want to know the latest in big data analytics frameworks? Have you ever heard of Apache Arrow? Rust? Ballista? In this episode I speak with Andy Grove, one of the main authors of Apache Arrow and the Ballista compute engine. Andy explains some of the challenges he faced while designing the Arrow and Ballista memory models, and he describes some amazing solutions.

  Our Sponsors

If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.

 

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

References

 

https://arrow.apache.org/

 

https://ballistacompute.org/

 

https://github.com/ballista-compute/ballista

 

 

 

 

29 Jun 2020 | Rust and machine learning #4: practical tools (Ep. 110) | 00:24:18

In this episode I make a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments. I believe that community effort can change this very quickly.

To make a comparison with the Python ecosystem I will cover frameworks for linear algebra (numpy), dataframes (pandas), off-the-shelf machine learning (scikit-learn), deep learning (tensorflow) and reinforcement learning (openAI).

Rust is the language of the future. Happy coding! 

Reference
  1. BLAS linear algebra https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
  2. Rust dataframe https://github.com/nevi-me/rust-dataframe
  3. Rustlearn https://github.com/maciejkula/rustlearn
  4. Rusty machine https://github.com/AtheMathmo/rusty-machine
  5. Tensorflow bindings https://lib.rs/crates/tensorflow
  6. Juice (machine learning for hackers) https://lib.rs/crates/juice
  7. Rust reinforcement learning https://lib.rs/crates/rsrl
14 Mar 2020 | Attacks to machine learning model: inferring ownership of training data (Ep. 99) | 00:19:39

In this episode I explain a very effective technique that allows one to infer whether any record at hand belongs to the (private) dataset used to train the target model. The technique is effective because it works on black-box models, where the attacker has no access to the training data, the model parameters, or the hyperparameters. Such a scenario is very realistic and typical of machine-learning-as-a-service APIs.

This episode is supported by pryml.io, a platform I am personally working on that enables data sharing without giving up confidentiality. 

 

As promised below is the schema of the attack explained in the episode.

 

 

References

Membership Inference Attacks Against Machine Learning Models

 

 

19 Apr 2020 | Why average can get your predictions very wrong (ep. 102) | 00:14:40

Whenever people reason about probability of events, they have the tendency to consider average values between two extremes.  In this episode I explain why such a way of approximating is wrong and dangerous, with a numerical example.

We are moving our community to Slack. See you there!

 

 

24 May 2021 | MLOps: the good, the bad and the ugly (Ep. 153) | 00:24:41

Our Sponsor

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

20 Jan 2022 | Embedded Machine Learning: Part 3 - Network Quantization (Ep. 184) | 00:25:30

In this episode I speak about neural network quantization, a technique that makes networks feasible for embedded systems and small devices.

There are many quantization techniques depending on several factors that are all important to consider during design and implementation.
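
To make the idea concrete, here is a minimal sketch of one simple scheme, uniform affine quantization of a list of weights to 8-bit integer codes. It is only one of the many techniques mentioned above and is not necessarily the one covered in the episode; the function names `quantize` and `dequantize` and the sample weights are assumptions made for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] with a uniform step."""
    levels = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0     # step size; fall back to 1.0 for constant input
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]

if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
    codes, scale, lo = quantize(weights)
    restored = dequantize(codes, scale, lo)
    # Each restored value differs from the original by at most about half a step.
    print(codes)
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Storing 8-bit codes plus one `scale` and one offset per tensor uses roughly a quarter of the memory of 32-bit floats, at the cost of a small rounding error.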

Enjoy the episode!

 

Chat with me

Join us on Discord community chat to discuss the show, suggest new episodes and chat with other listeners!

 

Sponsored by Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 


 

 

 

01 Apr 2020 | Activate deep learning neurons faster with Dynamic RELU (ep. 101) | 00:22:18

In this episode I briefly explain the concept behind activation functions in deep learning. One of the most widely used activation functions is the rectified linear unit (ReLU). While there are several flavors of ReLU in the literature, in this episode I speak about a very interesting approach that keeps computational complexity low while improving performance quite consistently.
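
For readers new to the topic, here is a small sketch of ReLU and of the more general "maximum of several linear pieces" form that many ReLU flavors fit into. As far as I understand the Dynamic ReLU paper referenced below, its key idea is to compute such slopes and intercepts from the input itself; the sketch keeps them fixed, so it illustrates the family rather than the method from the paper. The function names are illustrative.

```python
def relu(x):
    """Rectified linear unit: pass positive values, zero out negative ones."""
    return max(0.0, x)

def piecewise_linear(x, slopes, intercepts):
    """Maximum over linear pieces a_k * x + b_k (fixed coefficients here)."""
    return max(a * x + b for a, b in zip(slopes, intercepts))

if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.0):
        # Plain ReLU is the special case with pieces y = x and y = 0.
        same = relu(x) == piecewise_linear(x, [1.0, 0.0], [0.0, 0.0])
        # Leaky ReLU uses a small positive slope for the second piece.
        leaky = piecewise_linear(x, [1.0, 0.01], [0.0, 0.0])
        print(x, relu(x), same, leaky)
```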

This episode is supported by pryml.io. At pryml we let companies share confidential data. Visit our website.

Don't forget to join us on the Discord channel to propose new episodes or discuss the previous ones.

References

Dynamic ReLU https://arxiv.org/abs/2003.10027

19 Dec 2023 | Open Source Revolution: AI’s Redemption in Data Science (Ep. 247) | 00:36:47

Dive into the world of Data Science at Home with our latest episode, where we explore the dynamic relationship between Artificial Intelligence and the redemption of open source software. In this thought-provoking discussion, I share my insights on why now, more than ever, is the opportune moment for open source to leave an indelible mark on the field of AI. Join me as I unpack my opinions and set expectations for the near future, discussing the pivotal role open source is set to play in shaping the landscape of data science and artificial intelligence. Don't miss out—tune in to gain a deeper understanding of this revolutionary intersection!

 

This episode is available as YouTube stream at https://www.youtube.com/live/0Enenz1HqIs?si=woyYdjJVz656BneH&t=915

25 Oct 2022 | Private machine learning done right (Ep. 207) | 00:26:45

There are many solutions to private machine learning. I am pretty confident when I say that the one we are speaking about in this episode is probably one of the most feasible and reliable. I am with Daniel Huynh, CEO of Mithril Security, a graduate of Ecole Polytechnique with a specialisation in AI and data science. He worked at Microsoft on Privacy Enhancing Technologies under the office of the CTO of Microsoft France. He has written articles on homomorphic encryption with the CKKS explained series (https://blog.openmined.org/ckks-explained-part-1-simple-encoding-and-decoding/). He is now focusing on Confidential Computing at Mithril Security and has written extensive articles on the topic: https://blog.mithrilsecurity.io/

In this show we speak about confidential computing, SGX and private machine learning.

 


 

29 Aug 2020 | Testing in machine learning: generating tests and data (Ep. 117) | 00:20:18

In this episode I speak with Adam Leon Smith, CTO at DragonFly and expert in testing strategies for software and machine learning.

 

On September 15th there will be a live@Manning Rust conference. In one Rust-full day you can attend many talks about what's special about Rust, building high-performance web services or video games, WebAssembly, and much more. If you want to meet the tribe, tune in on September 15th to the live@Manning Rust conference.

 

 

11 Dec 2023 | Money, Cryptocurrencies, and AI: Exploring the Future of Finance with Chris Skinner [RB] (Ep. 246) | 00:41:21

In this captivating podcast episode, join renowned financial expert Chris Skinner as he delves into the fascinating realm of the future of money. From cryptocurrencies to government currencies, the metaverse to artificial intelligence (AI), Skinner explores the intricate interplay between technology and humanity. Gain valuable insights as he defines the future of money, examines the potential impact of cryptocurrencies on traditional government currencies, and addresses the advantages and disadvantages of digital currencies. Delve into the complex issues of regulation and governance in the context of emerging financial technologies, and discover Skinner's unique perspective on the metaverse and its implications for the future of money and technology. Brace yourself for an enlightening discussion on the integration of AI in the financial sector and its potential impact on humanity. Tune in to explore the cutting-edge concepts that shape our financial landscape and get a glimpse of what lies ahead.

 

You can read about Chris at https://thefinanser.com/

 

Sponsors

This episode is sponsored by Setapp.  Setapp is a platform that combines 230+ powerful MacOS and iOS apps and tools under one $9.99 subscription. Their selection of apps is mostly helpful for people who use their Macs as an actual working tool, covering complete use cases like coding, designing, project and time management and so on. Once subscribed, you get full access to paid features of the apps, as well as to new apps that are being constantly added. So you’ll always be sure you’re not missing out on any cool apps and services that actually help you do your work more efficiently for just a fraction of the price. Get 7 days for free at https://stpp.co/dsat

 

12 Oct 2024 | What Big Tech Isn’t Telling You About AI (Ep. 267) | 00:19:15

Are AI giants really building trustworthy systems? A groundbreaking transparency report by Stanford, MIT, and Princeton says no. In this episode, we expose the shocking lack of transparency in AI development and how it impacts bias, safety, and trust in the technology. We’ll break down Gary Marcus’s demands for more openness and what consumers should know about the AI products shaping their lives.

 

Check our new YouTube channel https://www.youtube.com/@DataScienceatHome and Subscribe! 

 

Cool links

  1. https://mitpress.mit.edu/9780262551069/taming-silicon-valley/
  2. http://garymarcus.com/index.html
09 Nov 2023 | Rolling the Dice: Engineering in an Uncertain World (Ep. 242) | 00:22:44

Hey there, engineering enthusiasts! Ever wondered how engineers deal with the wild, unpredictable twists and turns in their projects? In this episode, we're spilling the beans on uncertainty and why it's the secret sauce in every engineering recipe, not just the fancy stuff like deep learning and neural networks!

Join us for a ride through the world of uncertainty quantification. Tune in and let's demystify the unpredictable together! 🎲🔧🚀

 

References

https://www.osti.gov/servlets/purl/1428000

https://arc.aiaa.org/doi/pdf/10.2514/6.2010-124

https://arxiv.org/pdf/2001.10411

 

 

31 Aug 2021 | Don't fight! Cooperate. Generative Teaching Networks (Ep. 167) | 00:16:03

Remember GANs? Generative Adversarial Networks for synthetic data generation? There is a new method called Generative Teaching Networks that uses similar concepts - just quite the opposite :P - to train models faster, better and with less data.

Enjoy the show!

  Our Sponsors

Quantum Metric

Stay off the naughty list this holiday season by reducing customer friction, increasing conversions, and personalizing the shopping experience. Want a sneak peek? Visit us at quantummetric.com/podoffer and see if you qualify to receive our “12 Days of Insights” offer with code DATASCIENCE. This offer gives you 12-day access to our platform coupled with a bespoke insight report that will help you identify where customers are struggling or engaging in your digital product.

 

Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

References
  1. Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data https://arxiv.org/abs/1912.07768

 

19 Oct 2021Fighting Climate Change as a Technologist (Ep. 172)00:19:43

The content of this episode was created by Sylvain Kerkour. Feel free to subscribe to his newsletter at https://kerkour.com

 

09 Oct 2023Elon is right this time: Rust is the language of AI (Ep. 240)00:24:05

In this episode, I delve into Elon Musk's foresight on the future of AI as he champions the Rust programming language. Here is why Rust stands at the forefront of AI technology and the potential it holds.

References

https://github.com/WasmEdge/mediapipe-rs

https://blog.stackademic.com/why-did-elon-musk-say-that-rust-is-the-language-of-agi-eb36303ce341

 

15 Feb 2023[RB] Online learning is better than batch, right? Wrong! (Ep. 216)00:29:08

In this episode I speak about online learning systems and why blindly choosing such a paradigm can lead to very unpredictable and expensive outcomes. Also in this episode, I have to deal with an intruder :)

 

 

Links

Birman, K.; Joseph, T. (1987). "Exploiting virtual synchrony in distributed systems". Proceedings of the Eleventh ACM Symposium on Operating Systems Principles - SOSP '87. pp. 123–138. doi:10.1145/41457.37515. ISBN 089791242X. S2CID 7739589.

 

30 Mar 2023The promise and pitfalls of GPT-4 (Ep. 221)00:29:38

In this episode, we explore the potential of the highly anticipated GPT-4 language model and the challenges that come with its development. From its ability to generate highly coherent and creative text to concerns about ethical considerations and the potential misuse of such technology, we delve into the promise and pitfalls of GPT-4. Join us as we speak with experts in the field to gain insights into the latest developments and the impact that GPT-4 could have on the future of natural language processing.

 

 

08 Jan 2024Careers, Skills, and the Evolution of AI (Ep. 248)00:32:27

!!WARNING!!

Due to some technical issues, the volume is not always constant during the show. I sincerely apologise for any inconvenience. Francesco

 

 

In this episode, I speak with Richie Cotton, Data Evangelist at DataCamp, as he delves into the dynamic intersection of AI and education. Richie, a seasoned expert in data science and the host of the podcast, brings together a wealth of knowledge and experience to explore the evolving landscape of AI careers, the skills essential for generative AI technologies, and the symbiosis of domain expertise and technical skills in the industry.

 


23 Mar 2020WARNING!! Neural networks can memorize secrets (ep. 100)00:24:16

One of the best features of neural networks and machine learning models is their ability to memorize patterns from training data and apply them to unseen observations. That's where the magic is. However, there are scenarios in which the same machine learning models learn patterns so well that they can disclose some of the data they have been trained on. This phenomenon goes under the name of unintended memorization and it is extremely dangerous.

Think about a language generator that discloses the passwords, the credit card numbers and the social security numbers of the records it has been trained on. Or more generally, think about a synthetic data generator that can disclose the training data it is trying to protect. 

In this episode I explain why unintended memorization is a real problem in machine learning. Other than differentially private training, there is no way to mitigate such a problem in realistic conditions. At Pryml we are very aware of this, which is why we have been developing a synthetic data generation technology that is not affected by such an issue.
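One way to make unintended memorization measurable is a rank-based "exposure" score for a secret (a canary) deliberately inserted into the training data, in the spirit of the paper referenced for this episode. The sketch below assumes exposure is defined as log2 of the search-space size minus log2 of the canary's rank among all candidates; the function name and the loss values are hypothetical and for illustration only.

```python
import math


def exposure(canary_loss, candidate_losses):
    """Rank-based exposure of an inserted canary (illustrative sketch).

    candidate_losses holds the model's loss for every candidate secret in the
    search space, including the canary itself. Lower loss means the model
    considers that candidate more likely.
    """
    # Rank 1 means the model prefers the canary over every alternative.
    rank = 1 + sum(1 for loss in candidate_losses if loss < canary_loss)
    return math.log2(len(candidate_losses)) - math.log2(rank)


# Hypothetical losses for 8 candidate secrets. The canary (loss 0.4) has the
# lowest loss of all, which is the signature of memorization.
losses = [0.4, 2.1, 2.3, 1.9, 2.8, 2.2, 2.5, 2.0]
print(exposure(0.4, losses))  # maximal exposure for this search space
print(exposure(2.8, losses))  # least likely candidate: no exposure
```

A high exposure means the canary stands out from random alternatives, which suggests the model has memorized it rather than generalized from it.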

 

This episode is supported by Harmonizely. Harmonizely lets you build your own unique scheduling page based on your availability so you can start scheduling meetings in just a couple of minutes. Get started by connecting your online calendar and configuring your meeting preferences. Then, start sharing your scheduling page with your invitees!

 

References

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks https://www.usenix.org/conference/usenixsecurity19/presentation/carlini

11 May 2021MLOps: what is and why it is important (Ep. 151)00:33:04

If you think that knowing TensorFlow and scikit-learn is enough, think again. MLOps is one of those trendy terms today. What is MLOps and why is it important? In this episode I speak about the undeniable evolution of the data scientist in the last 5-10 years.

Sponsors

If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.

 

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

13 Dec 2022Edge AI applications for military and space [RB] (Ep. 213)00:20:59
Our Sponsors

NordPass Business has developed a password manager that will save you a lot of time and energy whenever you need access to business accounts, work across devices, even with the other members of your team, or whenever you need to share sensitive data with your colleagues, or make payments efficiently. All this with the highest standard of cyber-secure technology.

See NordPass Business in action now with a 3-month free trial here https://nordpass.com/DATASCIENCE with code DATASCIENCE

 

 

Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.

 

11 Jun 2024Harnessing AI for Cybersecurity: Expert Tips from QFunction (Ep. 258)00:34:33

In this episode, we sit down with Ryan Smith, Founder of QFunction LLC, to explore how AI and machine learning are revolutionizing cybersecurity. With over 8 years of experience, including work at NASA's Jet Propulsion Laboratory, Ryan shares insights on the future of threat detection and prevention, the challenges businesses face in maintaining effective cybersecurity, and the ethical considerations of AI implementation. Learn about cost-effective strategies for small businesses, the importance of collaboration in combating cyber threats, and how QFunction tailors its AI solutions to meet diverse industry needs.

 

Sponsors
  • Arctic Wolf: Learn what the new year holds for ransomware as a service, Active Directory, artificial intelligence and more when you download the 2024 Arctic Wolf Labs Predictions Report today at arcticwolf.com/datascience
  • Intrepid AI (https://intrepid.ai) is an AI assisted all-in-one platform for robotics teams. Build robotics applications in minutes, not months.
  • QFunction does cybersecurity differently. By relying on scientific breakthroughs in AI and machine learning, QFunction works within your existing security stack to detect anomalies and threats within your data.
08 Aug 2024Data Guardians: How Enterprises Can Master Privacy with MetaRouter (Ep. 261)00:32:27

In this insightful episode, we dive deep into the pressing issue of data privacy, where 86% of U.S. consumers express growing concerns and 40% don't trust companies to handle their data ethically. Join us as we chat with the Vice President of Engineering at MetaRouter, a cutting-edge platform enabling enterprises to regain control over their customer data. We explore how MetaRouter empowers businesses to manage data in a 1st-party context, ensuring ethical, compliant handling while navigating the complexities of privacy regulations.

 


21 Jan 2024OpenAI CEO Shake-up: Decoding December 2023 (Ep. 249)00:27:30

In this episode from a month ago, join me as we unravel the controversial CEO firing at OpenAI in December 2023. I share my insights on the events, decode the intricacies, and explore what lies ahead for this influential organization. Don't miss this concise yet insightful take on the intersection of leadership and artificial intelligence innovation.

 

 

Sponsor

Learn what the new year holds for ransomware as a service, Active Directory, artificial intelligence and more when you download the 2024 Arctic Wolf Labs Predictions Report today at arcticwolf.com/datascience

02 Sep 2024AI: The Bubble That Might Pop—What’s Next? (Ep. 262)00:26:04

The hype around Generative AI is real, but is the bubble about to burst? Join me as we dissect the recent downturn in AI investments and what it means for tech giants like OpenAI and Nvidia. Could this be the end of the AI gold rush, or just a bump in the road?

 


26 Sep 2020Why synthetic data cannot boost machine learning (Ep. 120)00:23:23

Come join me in our Discord channel speaking about all things data science.

Follow me on Twitch during my live coding sessions usually in Rust and Python

This episode is supported by Women in Tech by Manning Conferences

03 Nov 2020Remove noise from data with deep learning (Ep.125)00:23:59

Come join me in our Discord channel speaking about all things data science.

Follow me on Twitch during my live coding sessions usually in Rust and Python

Our Sponsors
  • ProtonMail is a secure and private email provider that protects your messages with end-to-end encryption and zero-access encryption so that besides you, no one can access them.
  • Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.
30 Oct 2020What is contrastive learning and why it is so powerful? (Ep. 124)00:26:12

Come join me in our Discord channel speaking about all things data science.

Follow me on Twitch during my live coding sessions usually in Rust and Python

Our Sponsors
  • The Monday Apps Challenge is bringing developers around the world together to compete in order to build apps that can improve the way teams work together on monday.com
  • Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

References

A Simple Framework for Contrastive Learning of Visual Representations

 

25 May 2023AI’s Impact on Software Engineering: Killing Old Principles? [RB] (Ep. 229)00:13:54

In this episode, we dive into the ways in which AI and machine learning are disrupting traditional software engineering principles. With the advent of automation and intelligent systems, developers are increasingly relying on algorithms to create efficient and effective code. However, this reliance on AI can come at a cost to the tried-and-true methods of software engineering. Join us as we explore the pros and cons of this paradigm shift and discuss what it means for the future of software development.

 

Sponsors

Bloomberg

At Bloomberg, they solve complex, real-world problems for customers across the global capital markets. From real-time market data to sophisticated analytics, powerful trading tools, and more, Bloomberg engineers work with systems that operate at scale. If you're a software engineer looking for an exciting and fulfilling career, head over to bloomberg.com/careers to learn more.

  Arctic Wolf

Cybercriminals are evolving. Their techniques and tactics are more advanced, intricate, and dangerous than ever before. Industries and governments around the world are fighting back, unveiling new regulations meant to better protect data against this rising threat. Arctic Wolf — the leader in security operations — is on a mission to end cyber risk by giving organizations the protection, information, and confidence they need to protect their people, technology, and data. Visit arcticwolf.com/datascience to take your first step.

17 Jun 2020Rust and machine learning #1 (Ep. 107)00:22:27

This is the first episode of a series about the Rust programming language and the role it can play in the machine learning field.

Rust is one of the most beautiful languages I have ever studied. I personally come from the C programming language, though for professional activities in machine learning I had to switch to the loved and hated Python language.

This episode does not give you an exhaustive list of Rust's benefits or capabilities. For those, you can check the references and start getting familiar with what I think is going to be the language of the next 20 years.

 

Sponsored

This episode is supported by Pryml Technologies. Pryml offers secure and cost-effective data privacy solutions for your organisation. It generates a synthetic alternative without disclosing your confidential data.

 


08 Feb 2022Artificial Intelligence and Cloud Automation with Leon Kuperman from Cast.ai (Ep. 187)00:40:28

In this episode I speak about AI and cloud automation with Leon Kuperman, co-founder and CTO at CAST AI. Formerly Vice President of Security Products OCI at Oracle, Leon’s professional experience spans tech companies such as IBM, Truition, and HostedPCI.

Enjoy the episode!

 

Chat with me

Join us on the Discord community chat to discuss the show, suggest new episodes and chat with other listeners!

 

Sponsored by Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

Sponsored by NordVPN

NordVPN protects your privacy while you are online. Get secure and private access to the internet by surfing nordvpn.com/DATASCIENCE or use coupon code DATASCIENCE and get a massive discount.

 


10 May 2024Rust in the Cosmos Part 3: Embedded programming for space (Ep. 256)00:44:36

In this episode of "Rust in the Cosmos" we delve into the challenges of building embedded applications for space. Did you know that once you ship your app to space... you can't get it back? :P

What role is Rust playing here? Let's find out ;)

 

Sponsors
  • Arctic Wolf: Learn what the new year holds for ransomware as a service, Active Directory, artificial intelligence and more when you download the 2024 Arctic Wolf Labs Predictions Report today at arcticwolf.com/datascience
  • Intrepid AI (https://intrepid.ai) is an AI assisted all-in-one platform for robotics teams. Build robotics applications in minutes, not months.
  • Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Defense, Robotics and Predictive maintenance.
  Communities

AeroRust, Intrepid, Bytenook

30 Jan 2024Is SQream the fastest big data platform? (Ep. 250)00:57:16

Join us in a dynamic conversation with Yori Lavi, Field CTO at SQream, as we unravel the data analytics landscape. From debunking the data lakehouse hype to SQream's GPU-based magic, discover how extreme data challenges are met with agility. Yori shares success stories, insights into SQream's petabyte-scale capabilities, and a roadmap to breaking down organizational bottlenecks in data science. Dive into the future of data analytics with SQream's commitment to innovation, leaving legacy formats behind and leading the charge in large-scale, cost-effective data projects. Tune in for a dose of GPU-powered revolution!

 

27 May 2022Streaming data with ease. With Chip Kent from Deephaven Data Labs (Ep. 198)00:23:48

In this episode, I am with Chip Kent, chief data scientist at Deephaven Data Labs.

We speak about streaming data, real-time processing, and other powerful tools that are part of the Deephaven platform.

 

Links

GitHub:

YouTube Channel - https://www.youtube.com/channel/UCoaYOlkX555PSTTJz8ZaI_w

Blog posts 

Careers https://deephaven.io/company/careers/

Community Slack http://deephaven.io/slack

01 Apr 2022Batteries and AI in Automotive (Ep. 193)00:37:17

In this episode my friend and I speak about AI, batteries and automotive. Dennis Berner, founder of Digitlabs, has been operating in the field of automotive and batteries for a long time. His points of view are a must-listen. Below is a list of the links he mentioned in the show.

  1. https://amethix.com
  2. https://digitlabs.com
  3. https://www.moia.io
  4. https://www.elli.eco
  5. https://www.uber.com
  6. https://www.didiglobal.com/
  7. https://waymo.com/
  8. https://group.mercedes-benz.com/
  9. https://www.fakultaet73.de
  10. https://www.bmw.de
  11. https://www.volkswagen.de
  12. https://cariad.technology/

 

08 Dec 2020A Standard for the Python Array API (Ep. 132)00:33:40
Our Links

Come join me in our Discord channel speaking about all things data science.

Subscribe to the official Newsletter and never miss an episode

Follow me on Twitch during my live coding sessions usually in Rust and Python

Our Sponsors
  • ProtonMail offers a simple and trusted solution to protect your internet connection and access blocked or restricted websites. All of ProtonMail and ProtonVPN’s apps are open source and have been inspected by cybersecurity experts, and Proton is based in Switzerland, home to some of the world’s strongest privacy laws
  • Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.
References
  1. https://data-apis.org/blog/announcing_the_consortium
  2. https://data-apis.github.io/array-api/latest/
  3. https://github.com/data-apis/python-record-api
15 Jan 2022Embedded Machine Learning: Part 2 (Ep. 183)00:15:49

In Part 2 of Embedded Machine Learning, I speak about one important technique to prune a neural network and perform inference on small devices. This technique helps preserve most of the accuracy with a model that is orders of magnitude smaller.

Enjoy the show!
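The show notes do not spell out the pruning technique, so the following is only a minimal sketch of one common choice, magnitude pruning: the weights with the smallest absolute values are set to zero. The function name `magnitude_prune` and the example weights are hypothetical, and a real pipeline would typically prune iteratively and retrain between rounds.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Illustrative one-shot pruning of a flat list of weights; it returns a new
    list and leaves the input untouched.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical layer weights: prune half of them.
layer = [0.1, -0.5, 0.05, 0.9]
print(magnitude_prune(layer, 0.5))
```

Storing only the surviving non-zero weights (plus their positions) is what makes the pruned model smaller on an embedded device.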

 

 

References
  1. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

 

01 Feb 2021Is Rust flexible enough for a flexible data model? (Ep. 137)00:28:37

In this podcast I get inspired by Paul Done's presentation about The Six Principles for Building Robust Yet Flexible Shared Data Applications, and show how powerful a language Rust is while still maintaining the flexibility of less strict languages.

 

Our Sponsor

This episode is supported by Chapman’s Schmid College of Science and Technology, where master's and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience

 

16 Sep 2020Machine learning in production: best practices [LIVE from twitch.tv] (Ep. 119)00:37:31

Hey there! Having the best time of my life ;)

This is the first episode I record while I am live on my new Twitch channel :) So much fun!

Feel free to follow me for the next live stream. You can also see me coding machine learning stuff in Rust :))

Don't forget to jump on the usual Discord and have a chat

I'll see you there!

07 Mar 2024Kaggle Kommando's Data Disco: Laughing our Way Through AI Trends (Ep. 252)00:42:46

In this episode, join me and the Kaggle Grand Master, Konrad Banachewicz, for a hilarious journey into the zany world of data science trends. From algorithm acrobatics to AI, creativity, Hollywood movies, and music, we just can't get enough. It's the typical episode with a dose of nerdy comedy you didn't know you needed. Buckle up, it's a data disco, and we're breaking down the binary!

 

Sponsors
  • Intrepid AI is an AI assisted all-in-one platform for robotics teams. Build robotics applications in minutes, not months.
  • Learn what the new year holds for ransomware as a service, Active Directory, artificial intelligence and more when you download the 2024 Arctic Wolf Labs Predictions Report today at arcticwolf.com/datascience

 

🔗 Links Mentioned in the Episode:

  1. Generative AI for time series: TimeGPT Documentation
  2. Lag-llama: GitHub (Note: The benchmark results on this one are pretty horrible)
  3. Open source LLM: Olmo Blog Post
  4. Quantization for LLM: Hugging Face Guide

And finally, don't miss Konrad's Substack for more nerdy goodness! (If you're there already, be there again! 😄)

07 Nov 2020Top-3 ways to put machine learning models into production (Ep. 126)00:20:27

Come join me in our Discord channel speaking about all things data science.

Follow me on Twitch during my live coding sessions usually in Rust and Python

Our Sponsors
  • physicspodcast.com is not just a physics podcast: interviews with scientists, scholars and authors, and reflections on the history and future of science and technology, are all in its wheelhouse.
  • Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.
26 Jun 2023Debunking AGI Hype and Embracing Reality (Ep. 233)00:59:29

In this thought-provoking episode, we sit down with the renowned AI expert, Filip Piekniewski, PhD, who fearlessly challenges the prevailing narratives surrounding artificial general intelligence (AGI) and the singularity. With a no-nonsense approach and a deep understanding of the field, Filip dismantles the hype and exposes some of the misconceptions about AI, LLMs and AGI. Join us as we delve into the real-world implications of AI, separating fact from fiction, and gaining a firm grasp on the tangible possibilities of AI advancement. If you're seeking a refreshingly pragmatic perspective on the future of AI, this episode is an absolute must-listen.

 

Filip Piekniewski Bio

Filip Piekniewski is a distinguished computer vision researcher and engineer, specializing in visual object tracking and perception. He approaches machine learning with a pragmatic mindset, recognizing its current limitations. Filip earned his Ph.D. from Warsaw University, where he explored neuroscience and later joined Brain Corporation in San Diego. His extensive study of neuroscience inspired him to develop innovative, bio-inspired machine learning architectures. Filip's unique blend of scientific curiosity and software engineering expertise allows him to quickly prototype and implement new ideas. He is known for his realistic perspective on AI, debunking AGI hype and focusing on tangible advancements.

 

Sponsors

  • Finally, a better way to do B2B research. NewtonX: The World’s Leading B2B Market Research Company

  • Explore the Complex World of Regulations. Compliance can be overwhelming. Multiple frameworks. Overlapping requirements. Let Arctic Wolf be your guide. Check it out at https://arcticwolf.com/datascience

  • Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Defense, Robotics and Predictive maintenance.

 

References

  1. https://twitter.com/filippie509
  2. http://blog.piekniewski.info/ (On limits of deep learning and where to go next with AI.)

 

 

11 Nov 2020Machine Learning in Rust: Amadeus with Alec Mocatta [RB] (ep. 127)00:24:19

Come join me in our Discord channel speaking about all things data science.

Follow me on Twitch during my live coding sessions usually in Rust and Python

Our Sponsors
  • ProtonVPN offers a simple and trusted solution to protect your internet connection and access blocked or restricted websites. All of ProtonVPN’s apps are open source and have been inspected by cybersecurity experts, and Proton is based in Switzerland, home to some of the world's strongest privacy laws
  • Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.
19 Apr 2024Rust in the Cosmos Part 2: testing software in space (Ep. 255)00:32:50

In this episode of "Rust in the Cosmos" we delve into the challenge of testing software for... ehm ... space

How can Rust help? Let's find out ;)

 

Sponsors
  • Arctic Wolf: Learn what the new year holds for ransomware as a service, Active Directory, artificial intelligence and more when you download the 2024 Arctic Wolf Labs Predictions Report today at arcticwolf.com/datascience
  • Intrepid AI (https://intrepid.ai) is an AI assisted all-in-one platform for robotics teams. Build robotics applications in minutes, not months.
  • Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Defense, Robotics and Predictive maintenance.
  Communities

AeroRust, Intrepid, Bytenook

 


01 Jun 2024Rust in the Cosmos Part 4: What happens in space? (Ep. 257)00:29:24

In this last episode of the series "Rust in the Cosmos" we speak about what happens in space, what projects are currently active and what happened in the past that we can learn from.

What about Rust and space applications? As always, let's find out ;)

 

Sponsors
  • Arctic Wolf: Learn what the new year holds for ransomware as a service, Active Directory, artificial intelligence and more when you download the 2024 Arctic Wolf Labs Predictions Report today at arcticwolf.com/datascience
  • Intrepid AI (https://intrepid.ai) is an AI assisted all-in-one platform for robotics teams. Build robotics applications in minutes, not months.
  • Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Defense, Robotics and Predictive maintenance.
  Communities

Intrepid AI, AeroRust, Bytenook

07 Feb 2021What's up with WhatsApp? (Ep. 138)00:30:44

Have you clicked the button? Accepted the new terms?

It's time we have a talk.

18 Sep 2023Attacking LLMs for fun and profit (Ep. 239)00:22:14

As a continuation of Episode 238, I explain some effective and fun attacks to conduct against LLMs. Such attacks are even more effective on models served locally, which are hardly controlled by human feedback.

Have great fun and learn them responsibly.

 

References

https://www.jailbreakchat.com/

https://www.reddit.com/r/ChatGPT/comments/10tevu1/new_jailbreak_proudly_unveiling_the_tried_and/

https://arxiv.org/abs/2305.13860

 

14 Mar 2023AI’s Impact on Software Engineering: Killing Old Principles? (Ep. 220)00:13:26

In this episode, we dive into the ways in which AI and machine learning are disrupting traditional software engineering principles. With the advent of automation and intelligent systems, developers are increasingly relying on algorithms to create efficient and effective code. However, this reliance on AI can come at a cost to the tried-and-true methods of software engineering. Join us as we explore the pros and cons of this paradigm shift and discuss what it means for the future of software development.

14 Sep 2021Send compute to data with POSH data-aware shell (Ep. 169)00:22:08
Our Sponsors

Quantum Metric

Stay off the naughty list this holiday season by reducing customer friction, increasing conversions, and personalizing the shopping experience. Want a sneak peek? Visit us at quantummetric.com/podoffer and see if you qualify to receive our “12 Days of Insights” offer with code DATASCIENCE. This offer gives you 12-day access to our platform coupled with a bespoke insight report that will help you identify where customers are struggling or engaging in your digital product.

 

Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

References

Paper https://deeptir.me/papers/posh-atc20.pdf

Code https://github.com/deeptir18/posh

11 May 2023Efficiently Retraining Language Models: How to Level Up Without Breaking the Bank (Ep. 227)00:33:50

Get ready for an eye-opening episode! 🎙️

In our latest podcast episode, we dive deep into the world of LoRA (Low-Rank Adaptation) for large language models (LLMs). This groundbreaking technique is revolutionizing the way we approach language model training by leveraging low-rank approximations.

Join us as we unravel the mysteries of LoRA and discover how it enables us to retrain LLMs with minimal expenditure of money and resources. We'll explore the ingenious strategies and practical methods that empower you to fine-tune your language models without breaking the bank.

Whether you're a researcher, developer, or language model enthusiast, this episode is packed with invaluable insights. Learn how to unlock the potential of LLMs without draining your resources.

Tune in and join the conversation as we unravel the secrets of LoRA low-rank adaptation and show you how to retrain LLMs on a budget.

Listen to the full episode now on your favorite podcast platform! 🎧✨
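To make the low-rank idea concrete, here is a minimal sketch, assuming the usual formulation in which a frozen weight matrix W is augmented by the product of two small trainable matrices B and A, scaled by alpha / r. The helper names, the toy matrices and the 4096-wide layer are hypothetical choices for illustration, not taken from the episode.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters: full fine-tuning vs. a rank-`rank` adapter."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, lora


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha):
    """Compute y = W x + (alpha / r) * B (A x), with W frozen."""
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


# A hypothetical 4096 x 4096 layer adapted with rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)

# Toy forward pass: with B set to zero the adapter leaves the output unchanged.
W = [[1, 2], [3, 4]]
A = [[0.5, 0.5]]
x = [1, 1]
print(lora_forward(x, W, A, [[0.0], [0.0]], alpha=2))
print(lora_forward(x, W, A, [[1.0], [0.0]], alpha=2))
```

Only A and B are updated during fine-tuning, which is where the savings in memory and compute come from.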


08 Nov 2022Evolution of data platforms (Ep. 209)00:17:42

Let's look at the history of data platforms. How did they evolve? Why? Shall I switch to the latest architecture? Enjoy the show!

 

Our Sponsors

Explore the Complex World of Regulations. Compliance can be overwhelming. Multiple frameworks. Overlapping requirements. Let Arctic Wolf be your guide. Check it out at https://arcticwolf.com/datascience

 

Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.

11 Apr 2021You are the product [RB] (Ep. 147)00:45:04
In this episode I am with George Hosu from Cerebralab and we speak about how dangerous it is not to pay for the services you use, and as a consequence how dangerous it is to let an algorithm decide what you like or not.

Our Sponsors

This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience

 

If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.

 

Links
16 Feb 2022History of data science [RB] (Ep. 188)00:39:54

How did we get here? Who invented the methods data scientists use every day?

We answer such questions and much more in this wonderful episode with Triveni Gandhi, Senior Data Scientist and Shaun McGirr, AI Evangelist at Dataiku. We cover topics about the history of data science, ethical AI and...

 

This episode is brought to you by Dataiku

With Dataiku, you have everything you need to build and deploy AI projects in one place, including easy-to-use data preparation and pipelines, AutoML, and advanced automation.

 

Sponsored by NordVPN

NordVPN protects your privacy while you are online. Get secure and private access to the internet by surfing nordvpn.com/DATASCIENCE or use coupon code DATASCIENCE and get a massive discount.

 

and by Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

References

www.historyofdatascience.com

 

09 Mar 2023Edge AI applications for military and space [RB] (Ep. 219)00:20:59
16 Nov 2021Do you fear of AI? Why? (Ep. 176)00:20:18

This episode summarizes a study about trends of AI in 2021, the way AI is perceived by people of different backgrounds and some other weird questions.

For instance, would you have sexual intercourse with a robot? Would you be in a relationship with an artificial intelligence?

The study has been conducted by Tidio.com and reported at https://www.tidio.com/blog/ai-trends/

 

 

Sponsors

This episode is supported by Amethix Technologies. Amethix uses machine learning and advanced analytics to empower people and organizations to ask and answer complex questions like never before. Coming soon at https://amethix.com

08 Apr 2021Polars: the fastest dataframe crate in Rust - with Ritchie Vink (Ep. 146)00:32:52

In this episode I speak with Ritchie Vink, the author of Polars, a crate that is the fastest dataframe library at the time of speaking :) If you want to participate in an amazing Rust open source project, this is your chance to collaborate on the official repository in the references.

 

References

https://github.com/ritchie46/polars

 

21 Oct 2024AI Says It Can Compress Better Than FLAC?! Hold My Entropy 🍿 (Ep. 268)00:21:05

Can AI really out-compress PNG and FLAC? 🤔 Or is it just another overhyped tech myth? In this episode of Data Science at Home, Frag dives deep into the wild claims that Large Language Models (LLMs) like Chinchilla 70B are beating traditional lossless compression algorithms. 🧠💥

But before you toss out your FLAC collection, let's break down Shannon's Source Coding Theorem and why entropy sets the ultimate limit on lossless compression.

We explore: ⚙️ How LLMs leverage probabilistic patterns for compression 📉 Why compression efficiency doesn’t equal general intelligence 🚀 The practical (and ridiculous) challenges of using AI for compression 💡 Can AI actually BREAK Shannon’s limit—or is it just an illusion?

If you love AI, algorithms, or just enjoy some good old myth-busting, this one’s for you. Don't forget to hit subscribe for more no-nonsense takes on AI, and join the conversation on Discord!

Let’s decode the truth together. Join the discussion on the new Discord channel of the podcast https://discord.gg/4UNKGf3

 

Don't forget to subscribe to our new YouTube channel 

https://www.youtube.com/@DataScienceatHome

 

 

References

Have you met Shannon? https://datascienceathome.com/have-you-met-shannon-conversation-with-jimmy-soni-and-rob-goodman-about-one-of-the-greatest-minds-in-history/

 

 

16 Dec 2024Autonomous Weapons and AI Warfare (Ep. 275)00:17:43

AI is revolutionizing the military with autonomous drones, surveillance tech, and decision-making systems. But could these innovations spark the next global conflict? In this episode of Data Science at Home, we expose the cutting-edge tech reshaping defense—and the chilling ethical questions that follow. Don’t miss this deep dive into the AI arms race!

🎧 LISTEN / SUBSCRIBE TO THE PODCAST

Chapters 00:00 - Intro 01:54 - Autonomous Vehicles 03:11 - Surveillance And Reconnaissance 04:15 - Predictive Analysis 05:57 - Decision Support System 08:24 - Real World Examples 10:42 - Ethical And Strategic Considerations 12:25 - International Regulation 13:21 - Conclusion 14:50 - Outro

✨ Connect with us!

🎥Youtube: https://www.youtube.com/@DataScienceatHome 📩 Newsletter: https://datascienceathome.substack.com 🎙 Podcast: Available on Spotify, Apple Podcasts, and more. 🐦 Twitter: @DataScienceAtHome 📘 LinkedIn: Francesco Gad 📷 Instagram: https://www.instagram.com/datascienceathome/ 📘 Facebook: https://www.facebook.com/datascienceAH 💼 LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast 💬 Discord Channel: https://discord.gg/4UNKGf3

NEW TO DATA SCIENCE AT HOME? Welcome! Data Science at Home explores the latest in AI, data science, and machine learning. Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions. Learn more at https://datascienceathome.com.

📫 SEND US MAIL! We love hearing from you! Send us mail at: hello@datascienceathome.com

Don’t forget to like, subscribe, and hit the 🔔 for updates on the latest in AI and data science!

#DataScienceAtHome #ArtificialIntelligence #AI #MilitaryTechnology #AutonomousDrones #SurveillanceTech #AIArmsRace #DataScience #DefenseInnovation #EthicsInAI #GlobalConflict #PredictiveAnalysis #AIInWarfare #TechnologyAndEthics #AIRevolution #MachineLearning

 

28 Sep 2022LIDAR, cameras and autonomous vehicles (Ep. 204)00:19:56

How does an autonomous vehicle see? How does it sense the road? They are equipped with many sensors, of course. Are they all powerful enough? Small enough to hide them and make your car look beautiful? In this episode I speak about LIDAR, high resolution cameras and some machine learning methods adapted to a minimal number of sensors.

 

Our Sponsors

Ready to advance your career in data science? University of Cincinnati Online offers nationally recognized educational programs in business analytics and information systems. Predictive Analytics Today named UC the No. 1 MS Data Science school in the country, and UC is nationally recognized, with a proven track record of placing students at high-profile companies such as Google, Amazon and P&G. Discover more about the University of Cincinnati’s 100% online master’s degree programs at online.uc.edu/obais

 

Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.

 

References

https://patents.google.com/patent/US20220043449A1/en?oq=20220043449

 

 

22 Feb 2021You are the product (Ep. 140)00:45:04
In this episode I am with George Hosu from Cerebralab and we speak about how dangerous it is not to pay for the services you use and, as a consequence, how dangerous it is to let an algorithm decide what you like or not.

Our Sponsors

This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience

 

If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.

 

Links
21 Apr 2022Improving your AI by finding issues within data pockets (Ep. 195)00:33:03

In this episode I have a conversation with Itai Bar-Sinai, CPO & Cofounder of Mona.

We speak about several interesting points about data and monitoring. Why is AI monitoring so different from monitoring classic software? How to reduce the gap between data science and business? What is the role of MLOps in the data monitoring field?

With over 10 years of experience with AI and as the CPO and head of customer success at Mona, the leading AI monitoring intelligence company, Itai has a unique view of the AI industry. Working closely with data science and ML teams applying dozens of AI solutions in over 10 industries, Itai encounters the wide variety of business use-cases, organizational structures and cultures, and technologies and tools used in today’s AI world.

 

References

https://www.monalabs.io

 

30 Nov 2021What is a data mesh and why it is relevant (Ep. 178)00:16:05
Sponsors

This episode is brought to you by Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

Join us on Discord

Feel free to drop by and have a chat with the host and the followers of the show

 

 

17 Aug 2021Reinforcement Learning is all you need. Or is it? (Ep. 165)00:30:28

Is reinforcement learning sufficient to build truly intelligent machines? Listen to this episode to find out.

Our Sponsors

Quantum Metric

Stay off the naughty list this holiday season by reducing customer friction, increasing conversions, and personalizing the shopping experience. Want a sneak peek? Visit us at quantummetric.com/podoffer and see if you qualify to receive our “12 Days of Insights” offer with code DATASCIENCE. This offer gives you 12-day access to our platform coupled with a bespoke insight report that will help you identify where customers are struggling or engaging in your digital product.

 

Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

 

References

 

01 Dec 2023Destroy your toaster before it kills you. Drama at OpenAI and other stories (Ep. 244)00:26:01

Brace yourselves, dear friends! In this episode, we delve into the earth-shattering revelation that OpenAI might have stumbled upon AGI (lol) and we're all just seconds away from being replaced by highly sophisticated toasters (lol lol). Spoiler alert: OpenAI's CEO is just playing 7D chess with the entire human race. So, sit back, relax, and enjoy this totally not ominous exploration into the 'totally not happening' future of AI!

21 Nov 2022Autonomous cars cannot drive. Here is why. (Ep. 210)00:35:04

If you think that the problem of self-driving cars has been solved, think twice. As a matter of fact, the problem of self-driving cars cannot be solved with the technical solutions that companies are currently considering. Don't get fooled by marketing and PR on social media. Whoever is telling you they solved the problem of driving a vehicle fully autonomously, they are lying. Here is why.

 

Our Sponsors

Explore the Complex World of Regulations. Compliance can be overwhelming. Multiple frameworks. Overlapping requirements. Let Arctic Wolf be your guide. Check it out at https://arcticwolf.com/datascience

 

Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.

19 May 2021MLOps: what is and why it is important Part 2 (Ep. 152)00:30:37

Our Sponsor

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

20 May 2020Compressing deep learning models: distillation (Ep.104)00:22:19

Using large deep learning models on limited hardware or edge devices is definitely prohibitive. There are methods to compress large models by orders of magnitude and maintain similar accuracy during inference.

In this episode I explain one of the first methods: knowledge distillation

 Come join us on Slack

Reference
09 Nov 2021Composable models and artificial general intelligence (Ep. 175)00:16:37

If you think deep learning is a method to get to AGI, think again. Humans, as well as all mammals, think in a... composable way.

 

Sponsors

This episode is brought to you by Advanced RISC Machines (ARM). ARM is a family of reduced instruction set computing architectures for computer processors https://www.arm.com/

 

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

22 Feb 2020Building reproducible machine learning in production (Ep. 96)00:14:20

Building reproducible models is essential for all those scenarios in which the lead developer is collaborating with other team members. Reproducibility in machine learning should not be an art; rather, it should be achieved via a methodical approach. In this episode I give a few suggestions about how to make your ML models reproducible and keep your workflow smooth.

Enjoy the show! Come visit us on our discord channel and have a chat

04 Sep 2020Testing in machine learning: checking deeplearning models (Ep. 118)00:18:17

In this episode I speak with Adam Leon Smith, CTO at DragonFly and expert in testing strategies for software and machine learning. We cover testing with deep learning (neuron coverage, threshold coverage, sign change coverage, layer coverage, etc.), combinatorial testing and their practical aspects.

On September 15th there will be a live@Manning Rust conference. In one Rust-full day you will attend many talks about what's special about Rust, building high-performance web services or video games, WebAssembly and much more. If you want to meet the tribe, tune in on September 15th to the live@Manning Rust conference.

 

 

22 Jul 2024Low-Code Magic: Can It Transform Analytics? (Ep. 260)00:33:45

Join us as David Marom, Head of Panoply Business, explores the benefits of all-in-one data platforms. Learn how tech stack consolidation boosts efficiency, improves data accuracy, and cuts costs. David shares insights on overcoming common challenges, enhancing data governance, and success stories from organizations thriving with Panoply.

 

Sponsors
  • Arctic Wolf Learn what the new year holds for ransomware as a service, Active Directory, artificial intelligence and more when you download the 2024 Arctic Wolf Labs Predictions Report today at arcticwolf.com/datascience
  • Intrepid AI (https://intrepid.ai) is an AI assisted all-in-one platform for robotics teams. Build robotics applications in minutes, not months

 

References

  1. Connect and analyze ALL of your data https://panoply.io/
  2. https://blog.panoply.io/raw-data-to-dashboards-in-just-10-steps
  3. https://blog.panoply.io/understanding-etl-extract-transform-and-load-data-to-boost-your-business-intelligence
  4. Blog: The Transformative Power of an All-in-One Data Platform
  5. Whitepaper: Eradicating Platform Inefficiencies

20 Sep 2022Predicting Out Of Memory Kill events with Machine Learning (Ep. 203)00:19:33

Sometimes applications crash. Some other times applications crash because memory is exhausted. Such issues exist because of bugs in the code, or heavy memory usage for reasons that were not expected during design and implementation. Can we use machine learning to predict and eventually detect out of memory kills from the operating system?

Apparently, the Netflix app many of us use on a daily basis leverages ML and time series analysis to prevent OOM-kills.

Enjoy the show!

Our Sponsors

Explore the Complex World of Regulations. Compliance can be overwhelming. Multiple frameworks. Overlapping requirements. Let Arctic Wolf be your guide. Check it out at https://arcticwolf.com/datascience

 

Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.

 

Transcript

1 00:00:04,150 --> 00:00:09,034 And here we are again with the season four of the Data Science at Home podcast.

2 00:00:09,142 --> 00:00:19,170 This time we have something for you: if you want to help us shape the data science leaders of the future, we have created the Data Science at Home Ambassador program.

3 00:00:19,340 --> 00:00:28,378 Ambassadors are volunteers who are passionate about data science and want to give back to our growing community of data science professionals and enthusiasts.

4 00:00:28,534 --> 00:00:37,558 You will be instrumental in helping us achieve our goal of raising awareness about the critical role of data science in cutting edge technologies.

5 00:00:37,714 --> 00:00:45,740 If you want to learn more about this program, visit the Ambassadors page on our website at datascienceathome.com.

6 00:00:46,430 --> 00:00:49,234 Welcome back to another episode of Data Science at Home podcast.

7 00:00:49,282 --> 00:00:55,426 I'm Francesco, podcasting from the regular office of Amethix Technologies, based in Belgium.

8 00:00:55,618 --> 00:01:02,914 In this episode, I want to speak about a machine learning problem that has been formulated at Netflix.

9 00:01:03,022 --> 00:01:22,038 And for the record, Netflix is not sponsoring this episode, though I still believe that this problem is a very well known problem, a very common one across sectors, which is how to predict out of memory kills in an application and formulate this as a machine learning problem.

10 00:01:22,184 --> 00:01:39,142 So this is something that, as I said, is very interesting, not just because of Netflix, but because it allows me to explain a few points that, as I said, are kind of invariant across sectors.

11 00:01:39,226 --> 00:01:56,218 Regardless of whether your application is a video streaming application or any other communication type of application, or a fintech application, or energy, or whatever, this memory kill, this out of memory kill, still occurs.

12 00:01:56,314 --> 00:02:05,622 And what is an out of memory kill? Well, it's essentially the extreme event in which the machine doesn't have any more memory left.

13 00:02:05,756 --> 00:02:16,678 And so usually the operating system can start eventually swapping, which means using the SSD or the hard drive as a source of memory.

14 00:02:16,834 --> 00:02:19,100 But that, of course, will slow things down a lot.

15 00:02:19,430 --> 00:02:45,210 And eventually when there is a bug or a memory leak, or if there are other applications running on the same machine, of course there is some kind of limiting factor that essentially kills the application, something that occurs from the operating system most of the time that kills the application in order to prevent the application from monopolizing the entire machine, the hardware of the machine.

16 00:02:45,710 --> 00:02:48,500 And so this is a very important problem.

17 00:02:49,070 --> 00:03:03,306 Also, it is important to have an episode about this because there are some strategies that they've used at Netflix that are pretty much in line with what I believe machine learning should be about.

18 00:03:03,368 --> 00:03:25,062 And usually people would go for the fancy solution there, like these extremely accurate predictors or machine learning models that have a massive number of parameters and that try to figure out whatever is happening on the machine that is running that application.

19 00:03:25,256 --> 00:03:29,466 While the solution at Netflix is pretty straightforward, it's pretty simple.

20 00:03:29,588 --> 00:03:33,654 And so one would say, then why make an episode about this? Well,

21 00:03:33,692 --> 00:03:45,730 because I think that we need more sobriety when it comes to machine learning and I believe we still need to spend a lot of time thinking about what data to collect,

22 00:03:45,910 --> 00:03:59,730 reasoning about what the problem at hand is and what the data is that can actually tickle the particular machine learning model, and then of course move to the actual prediction, that is the actual model,

23 00:03:59,900 --> 00:04:15,910 which most of the time doesn't need to be one of these super fancy things that you see on the news around chatbots or autonomous gaming agents or drivers and so on and so forth.

24 00:04:16,030 --> 00:04:28,518 So there are essentially two data sets that the people at Netflix focus on which are consistently different, dramatically different in fact.

25 00:04:28,604 --> 00:04:45,570 These are data about device characteristics and capabilities and, of course, data that are collected at runtime and that give you a picture of what's going on in the memory of the device, right? So that's the so-called runtime memory data and out of memory kills.

26 00:04:45,950 --> 00:05:03,562 The first type of data I would consider very static because it covers, for example, the device type ID, the version of the software development kit that the application is running, cache capacities, buffer capacities and so on and so forth.

27 00:05:03,646 --> 00:05:11,190 So it's something that most of the time doesn't change across sessions and so that's why it's considered static.

28 00:05:12,050 --> 00:05:18,430 In contrast, the other type of data, the runtime memory data, is, as the name says, runtime.

29 00:05:18,490 --> 00:05:24,190 So it varies across the life of the session; it's collected at runtime.

30 00:05:24,250 --> 00:05:25,938 So it's very dynamic data.

31 00:05:26,084 --> 00:05:36,298 And examples of these records are profile, movie details, playback information, current memory usage, et cetera, et cetera.

32 00:05:36,334 --> 00:05:56,086 So this is the data that actually moves and moves in the sense that it changes depending on how the user is actually using the Netflix application, what movie or what profile description, what movie detail has been loaded for that particular movie and so on and so forth.

33 00:05:56,218 --> 00:06:15,094 So of course the first difficulty, the first challenge that the people at Netflix had to deal with, was how to combine these two things: very static and usually small tables versus very dynamic and usually large tables or views.

34 00:06:15,142 --> 00:06:36,702 Well, there is some sort of join on a key that is performed by the people at Netflix in order to put together these different data resolutions, right, which is data about the same phenomenon but from different sources and carrying very different signals.

35 00:06:36,896 --> 00:06:48,620 So the device capabilities are usually captured by the static data, and of course there is the other data, the runtime memory and out of memory kill data.

36 00:06:48,950 --> 00:07:04,162 These are also, as I said, the data that will describe pretty accurately how the user is using that particular application on that particular hardware.
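
The join on a key described here can be sketched in a few lines of plain Python. This is only an illustration, not Netflix's actual pipeline: the field names (`device_type_id`, `sdk_version`, `cache_capacity_mb`, `memory_mb`, `oom_kill`) and the sample values are assumptions made up for the example.

```python
# Illustrative only: field names and values are hypothetical.
# Static table: one small row per device type.
device_table = {
    "dev-a": {"sdk_version": "1.2", "cache_capacity_mb": 64},
    "dev-b": {"sdk_version": "2.0", "cache_capacity_mb": 256},
}

# Dynamic table: many runtime memory readings per session.
runtime_records = [
    {"device_type_id": "dev-a", "ts": 0, "memory_mb": 310.0, "oom_kill": False},
    {"device_type_id": "dev-b", "ts": 0, "memory_mb": 120.0, "oom_kill": False},
    {"device_type_id": "dev-a", "ts": 60, "memory_mb": 498.0, "oom_kill": True},
    {"device_type_id": "dev-c", "ts": 0, "memory_mb": 200.0, "oom_kill": False},
]


def join_on_device(records, devices):
    """Left join: attach the static device fields to every runtime record."""
    joined = []
    for rec in records:
        static = devices.get(rec["device_type_id"], {})
        row = dict(rec)
        # Unknown devices get None so that missing data stays visible downstream.
        row["sdk_version"] = static.get("sdk_version")
        row["cache_capacity_mb"] = static.get("cache_capacity_mb")
        joined.append(row)
    return joined


rows = join_on_device(runtime_records, device_table)
```

A left join keeps every runtime reading even when the device is unknown, which leaves the missing-data handling mentioned in the episode as an explicit later step.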

37 00:07:04,306 --> 00:07:17,566 Now of course, when it comes to data engineering, there is nothing new that people at Netflix have introduced: dealing with missing data, for example, or incorporating knowledge of devices.

38 00:07:17,698 --> 00:07:26,062 It's all stuff that's part of the so-called data cleaning and data collection strategy, right? Or data preparation.

39 00:07:26,146 --> 00:07:40,782 That is, whatever you're going to do in order to make that data or a combination of these data sources, let's say, compatible with the way your machine learning model will understand or will read that data.

40 00:07:40,916 --> 00:07:58,638 So if you think of a big data platform, the first step, the first challenge you have to deal with, is how can I, first of all, collect the right amount of information, the right data, but also how to transform this data for my particular big data platform.

41 00:07:58,784 --> 00:08:12,798 And that's something that, again, nothing new, nothing fancy, just basics, what we have been used to, what we are used to seeing now for the last decade or more, that's exactly what they do.

42 00:08:12,944 --> 00:08:15,222 And now let me tell you something important.

43 00:08:15,416 --> 00:08:17,278 Cybercriminals are evolving.

44 00:08:17,374 --> 00:08:22,446 Their techniques and tactics are more advanced, intricate and dangerous than ever before.

45 00:08:22,628 --> 00:08:30,630 Industries and governments around the world are fighting back, unveiling new regulations meant to better protect data against this rising threat.

46 00:08:30,950 --> 00:08:39,262 Today, the world of cybersecurity compliance is a complex one, and understanding the requirements your organization must adhere to can be a daunting task.

47 00:08:39,406 --> 00:08:42,178 But not when the pack has your back.

48 00:08:42,214 --> 00:08:53,840 Arctic Wolf, the leader in security operations, is on a mission to end cyber risk by giving organizations the protection, information and confidence they need to protect their people, technology and data.

49 00:08:54,170 --> 00:09:02,734 The new interactive compliance portal helps you discover the regulations in your region and industry and start the journey towards achieving and maintaining compliance.

50 00:09:02,902 --> 00:09:07,542 Visit arcticwolf.com/datascience to take your first step.

51 00:09:07,676 --> 00:09:11,490 That's arcticwolf.com/datascience.

52 00:09:12,050 --> 00:09:18,378 I think that the most important part, though I think all of them are actually equally important,

53 00:09:18,464 --> 00:09:26,854 is the way they treat runtime memory data and out of memory kill data, which is by using sliding windows.

54 00:09:26,962 --> 00:09:38,718 So that's something that is really worth mentioning, because the way you would frame this problem is something is happening at some point in time and I have to kind of predict that event.

55 00:09:38,864 --> 00:09:49,326 That is usually an outlier in the sense that these events are quite rare, fortunately, because otherwise Netflix would not be as usable as we believe it is.

56 00:09:49,448 --> 00:10:04,110 So you would like to predict these weird events by looking at a historical view or an historical amount of records that you have before this particular event, which is the kill of the application.

57 00:10:04,220 --> 00:10:12,870 So the concept of the sliding window, the sliding window approach is something that comes as the most natural thing anyone would do.

58 00:10:13,040 --> 00:10:18,366 And that's exactly what the researchers at Netflix have done.

59 00:10:18,488 --> 00:10:25,494 So, not unexpectedly in my opinion, they treated this problem as a time series, which is exactly what it is.
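
As a rough sketch of the sliding window idea, the snippet below collects the memory readings that fall inside a fixed horizon before a kill event. The function name `window_before`, the 30 second sampling interval and the synthetic readings are assumptions for illustration; the episode does not specify how the real windows are built.

```python
def window_before(readings, kill_ts, horizon_s):
    """Return the (timestamp, memory) readings in [kill_ts - horizon_s, kill_ts)."""
    return [(ts, mem) for ts, mem in readings if kill_ts - horizon_s <= ts < kill_ts]


# Synthetic session: one reading every 30 seconds, memory growing steadily (in MB).
readings = [(t, 200.0 + 0.5 * t) for t in range(0, 600, 30)]
kill_ts = 600  # hypothetical time of the out of memory kill, in seconds

# One window per horizon, expressed in minutes before the kill.
windows = {minutes: window_before(readings, kill_ts, minutes * 60) for minutes in (5, 4, 2)}
```

Each window is simply the slice of history that a model gets to look at before the event, so shorter horizons contain fewer but more recent readings.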

60 00:10:25,652 --> 00:10:26,190 Now,

61 00:10:26,300 --> 00:10:26,754 they,

62 00:10:26,852 --> 00:10:27,330 of course,

63 00:10:27,380 --> 00:10:31,426 use this sliding window with different horizons:

64 00:10:31,558 --> 00:10:32,190 five minutes,

65 00:10:32,240 --> 00:10:32,838 four minutes,

66 00:10:32,924 --> 00:10:33,702 two minutes,

67 00:10:33,836 --> 00:10:36,366 as close as possible to the event,

68 00:10:36,548 --> 00:10:38,886 because maybe there are some,

69 00:10:39,008 --> 00:10:39,762 let's say,

70 00:10:39,896 --> 00:10:45,678 other dynamics that can arise when you are very close to the event or when you are very far from it.

71 00:10:45,704 --> 00:10:50,166 Like, five minutes away from the out of memory kill,

72 00:10:50,348 --> 00:10:51,858 you might have some other,

73 00:10:51,944 --> 00:10:52,410 let's say,

74 00:10:52,460 --> 00:10:55,986 diagrams or shapes in the data.

75 00:10:56,168 --> 00:11:11,310 So for example, you might have a certain number of allocations that keep growing and growing, but they grow with a certain curve or a certain rate that you can measure when you are five to ten minutes away from the out of memory kill.

76 00:11:11,420 --> 00:11:16,566 When you are two minutes away from the out of memory kill, this trend will probably change.

77 00:11:16,688 --> 00:11:30,800 And so probably what you would expect is that the memory is already half or more saturated and therefore, for example, the operating system starts swapping or other things are happening that you are going to measure in this.

78 00:11:31,550 --> 00:11:39,730 And that would give you a much better picture of what's going on in the, let's say, closest neighborhood of that event, the time window.
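
One simple way to measure the growth rate mentioned above is a least-squares slope over the readings in a window. This is an assumption chosen for illustration (the episode does not say which statistic is used), and the sample readings are invented.

```python
def slope(points):
    """Ordinary least-squares slope of memory against time (MB per second)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_t = sum(t for t, _ in points) / n
    mean_m = sum(m for _, m in points) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    if den == 0:
        raise ValueError("all timestamps are identical")
    return num / den


# Hypothetical readings: far from the kill memory grows slowly, near it much faster.
far = [(0, 100.0), (60, 106.0), (120, 112.0), (180, 118.0)]
near = [(480, 300.0), (510, 330.0), (540, 360.0), (570, 390.0)]

rate_far = slope(far)
rate_near = slope(near)
```

Comparing the slope of a window far from the event with one close to it makes the change of trend described in the episode a number that a model can use as a feature.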

79 00:11:39,790 --> 00:11:51,042 The sliding window, or time window, approach is definitely worth mentioning because this is something that you can apply, if you think about it, pretty much anywhere, right? Now,

80 00:11:51,116 --> 00:11:52,050 what they did,

81 00:11:52,160 --> 00:12:04,146 in addition to having a time window, a sliding window, was to also assign different levels to memory readings that are closer to the out of memory kill.

82 00:12:04,208 --> 00:12:10,062 And usually these levels are higher and higher as we get closer and closer to the out of memory kill.

83 00:12:10,136 --> 00:12:15,402 So this means that, for example, for a five minute window, we would have a level one.

84 00:12:15,596 --> 00:12:22,230 Five minutes means five minutes away from the out of memory kill; four minutes would be a level two.

85 00:12:22,280 --> 00:12:37,234 Three minutes, which is much closer, would be a level three, and two minutes would be a level four, which works like a kind of severity of the event as we get closer and closer to the actual event, when the application is actually killed.
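
The level scheme just described (five minutes away is level one, four minutes is level two, three minutes is level three, two minutes is level four) could be encoded as a small labeling function. The bucket boundaries between the whole minutes, the level 0 for readings outside the five minute window, and the function name `severity_level` are assumptions added here to make the sketch complete.

```python
def severity_level(seconds_before_kill):
    """Map the time remaining before the kill to a level from 0 (far) to 4 (imminent)."""
    if seconds_before_kill < 0:
        raise ValueError("reading must precede the kill")
    minutes = seconds_before_kill / 60
    if minutes > 5:
        return 0  # assumption: readings outside the five minute window are unlabeled
    if minutes > 4:
        return 1  # around five minutes away
    if minutes > 3:
        return 2  # around four minutes away
    if minutes > 2:
        return 3  # around three minutes away
    return 4      # two minutes away or closer


# Label a few hypothetical readings taken 5, 4, 3 and 2 minutes before the kill.
labels = [severity_level(s) for s in (300, 240, 180, 120)]
```

Graded labels like these turn a rare yes-or-no event into a target with several classes, so readings shortly before a kill still carry signal.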

86 00:12:37,342 --> 00:12:51,474 So looking at this approach, there is nothing new there; I would say even a not so seasoned data scientist would have understood that using a sliding window is the way to go.

87 00:12:51,632 --> 00:12:55,482 I'm not saying that Netflix engineers are not seasoned enough.

88 00:12:55,556 --> 00:13:04,350 Actually they do a great job every day to keep giving us video streaming platforms that actually never fail or almost never fail.

89 00:13:04,910 --> 00:13:07,460 So spot on there, guys, good job.

90 00:13:07,850 --> 00:13:27,738 But looking at this sliding window approach, the direct consequence of this is that they can plot, they can do some sort of graphical analysis of the out of memory kills versus the memory usage that can give the reader or the data scientist a very nice picture of what's going on there.

91 00:13:27,824 --> 00:13:39,330 And I will definitely include some of the diagrams and graphs in the show notes of this episode on the official website, datascienceathome.com.

92 00:13:39,500 --> 00:13:48,238 But essentially what you can see there is that there might be premature peaks at, let's say, a lower memory reading.

93 00:13:48,334 --> 00:14:08,958 And usually these are some kind of false positives or anomalies that should not be there. Then it's possible to set a threshold at which to start lowering the memory usage, because beyond that threshold something nasty can happen, and usually does, according to your data.

94 00:14:09,104 --> 00:14:18,740 And then of course there is another graph with a Gaussian distribution or, in fact, no sharp peak at all.

95 00:14:19,250 --> 00:14:21,898 That is, the out-of-memory kills

96 00:14:21,934 --> 00:14:33,754 are more or less normally distributed. And then of course there are the genuine peaks, which indicate kills near, let's say, the threshold.

97 00:14:33,802 --> 00:14:38,758 And so usually you would see that, after that particular threshold of memory usage,

98 00:14:38,914 --> 00:14:42,142 you see most of the out-of-memory kills.

99 00:14:42,226 --> 00:14:45,570 Which makes sense because, given a particular device,

100 00:14:45,890 --> 00:14:48,298 which means a certain amount of memory,

101 00:14:48,394 --> 00:14:50,338 certain memory characteristics,

102 00:14:50,494 --> 00:14:53,074 a certain version of the SDK and so on and so forth,

103 00:14:53,182 --> 00:14:53,814 you can say:

104 00:14:53,852 --> 00:14:54,090 Okay,

105 00:14:54,140 --> 00:15:10,510 for this device type I have this memory usage threshold, and immediately after it I see a relatively high number of out-of-memory kills.

106 00:15:10,570 --> 00:15:18,150 And this means that this is probably the threshold you would like to consider as the critical threshold you should never, or almost never, cross.
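
One simple way to turn that picture into a number per device type is sketched below. It is an assumption made for illustration, not the method from the Netflix post: the threshold is taken as the memory reading above which most recorded kills fall, after discarding a small share of the lowest readings as premature peaks. The function names, the device labels and the 10% default are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def critical_threshold(kill_memory_mb: List[float], ignore_lowest_pct: int = 10) -> float:
    """Return the lowest kill reading left after dropping the lowest share.

    The dropped share (10% by default, an arbitrary choice) stands in for
    premature peaks that should not drive the threshold.
    """
    if not kill_memory_mb:
        raise ValueError("need at least one out-of-memory kill reading")
    ordered = sorted(kill_memory_mb)
    skip = len(ordered) * ignore_lowest_pct // 100  # integer math avoids float rounding
    return ordered[skip]


def thresholds_by_device(kills: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group (device_type, memory_mb_at_kill) pairs and compute one threshold each."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for device_type, memory_mb in kills:
        grouped[device_type].append(memory_mb)
    return {device: critical_threshold(values) for device, values in grouped.items()}
```

With ten kills recorded for one device, one of them at an unusually low reading, the low outlier is skipped and the threshold lands at the start of the genuine peak.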

107 00:15:18,710 --> 00:15:38,758 So once you have this picture in front of you, you can start thinking of implementing some mechanisms that can monitor the memory usage and, of course, preemptively deallocate things or keep the memory usage as low as possible with respect to the critical threshold.

108 00:15:38,794 --> 00:15:53,446 So you can start implementing some logic that prevents the application from being killed by the operating system so that you would in fact reduce the rate of out of memory kills overall.
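
A minimal sketch of such a mechanism follows, assuming the application can report its current memory usage and exposes some way to free memory. The class name `MemoryGuard`, the `release` callback and the 50 MB margin are hypothetical; the episode does not describe how the Netflix app actually reacts.

```python
from typing import Callable


class MemoryGuard:
    """Trigger a release callback before memory reaches the critical threshold."""

    def __init__(self, critical_mb: int, margin_mb: int, release: Callable[[], None]):
        # Act at a soft limit below the critical threshold (assumed design choice).
        self.soft_limit_mb = critical_mb - margin_mb
        self.release = release

    def check(self, current_mb: float) -> bool:
        """Return True if the callback was invoked for this reading."""
        if current_mb >= self.soft_limit_mb:
            self.release()
            return True
        return False


# Usage: free a hypothetical in-memory cache once usage gets close to the limit.
cache = ["frame-1", "frame-2", "frame-3"]
guard = MemoryGuard(critical_mb=500, margin_mb=50, release=cache.clear)
guard.check(400)  # below the soft limit of 450 MB, nothing happens
guard.check(460)  # at or above the soft limit, the cache is cleared
```

Acting at a soft limit rather than at the critical threshold itself leaves headroom between the reading and the point where the operating system would kill the application.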

109 00:15:53,578 --> 00:16:11,410 Now, as always, and as the engineers also state in their technical blog post, it's much more important for them to predict with a certain number of false positives rather than false negatives.

110 00:16:11,590 --> 00:16:18,718 A false negative means missing an out-of-memory kill that actually occurred but was not predicted.

111 00:16:18,874 --> 00:16:40,462 If you are a regular listener of this podcast, that statement should resonate with you because this is exactly what happens, for example in healthcare applications, which means that doctors or algorithms that operate in healthcare would definitely prefer to have a bit more false positives rather than more false negatives.

112 00:16:40,486 --> 00:16:54,800 Because missing that someone is sick means that you are not providing a cure and you're just sending the patient home when he or she is sick, right?

113 00:16:55,130 --> 00:16:57,618 So that's a false negative, it's the miss.

114 00:16:57,764 --> 00:17:09,486 But having a false positive, what can go wrong with having a false positive? Well, probably you will undergo another test to make sure that the first test is confirmed or not.

115 00:17:09,608 --> 00:17:16,018 So having a false positive in this case is relatively okay with respect to having a false negative.
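
The preference for false positives over false negatives can be expressed as a choice of decision threshold. The sketch below assumes a model that outputs a score per example and picks the highest threshold that still reaches a target recall, accepting whatever false positives come with it. The function names and the sample scores are made up for illustration.

```python
from typing import List, Tuple


def confusion(scores: List[float], labels: List[int], threshold: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def pick_threshold(scores: List[float], labels: List[int], min_recall: float) -> float:
    """Return the highest threshold whose recall is at least min_recall."""
    if 1 not in labels:
        raise ValueError("recall is undefined without positive examples")
    for threshold in sorted(set(scores), reverse=True):
        tp, _, fn, _ = confusion(scores, labels, threshold)
        if tp / (tp + fn) >= min_recall:
            return threshold
    return min(scores)  # not reached when min_recall <= 1.0


# Hypothetical scores; label 1 means an out-of-memory kill followed.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
threshold = pick_threshold(scores, labels, min_recall=1.0)
```

On this toy data, catching every kill means lowering the threshold far enough that one harmless session is flagged as well, which is the trade-off described above.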

116 00:17:16,054 --> 00:17:19,398 And that's exactly what happens with the Netflix application.

117 00:17:19,484 --> 00:17:32,094 Now, of course I don't want to say that the Netflix application is as critical as, for example, an application that detects a cancer on an X-ray, or a disorder or disease of some sort.

118 00:17:32,252 --> 00:17:48,090 But what I'm saying is that there are some analogies when it comes to machine learning and artificial intelligence and especially data science, the old school data science, there are several things that kind of are, let's say, invariant across sectors.

119 00:17:48,410 --> 00:17:56,826 And so, you know, two worlds like the media streaming or video streaming and healthcare are of course very different from each other.

120 00:17:56,888 --> 00:18:05,274 But when it comes to machine learning and data science applications, well, there are a lot of analogies there.

121 00:18:05,372 --> 00:18:06,202 And indeed,

122 00:18:06,286 --> 00:18:10,234 in terms of the models that they use at Netflix to predict,

123 00:18:10,342 --> 00:18:24,322 once they have the sliding window data, they essentially have the ground truth of when the out-of-memory kill happened and what happened beforehand to the memory of the application or the machine.

124 00:18:24,466 --> 00:18:24,774 Well,

125 00:18:24,812 --> 00:18:30,514 then the models they use to predict these events are artificial neural networks,

126 00:18:30,622 --> 00:18:31,714 XGBoost,

127 00:18:31,822 --> 00:18:36,742 AdaBoost, or adaptive boosting, Elastic Net with softmax, and so on and so forth.

128 00:18:36,766 --> 00:18:39,226 So nothing fancy.

129 00:18:39,418 --> 00:18:45,046 As you can see, XGBoost is probably one of the most used. I would have expected even random forest.

130 00:18:45,178 --> 00:18:47,120 Probably they have tried that.

131 00:18:47,810 --> 00:18:58,842 But XGBoost is probably one of the most used models in Kaggle competitions for a reason: because it works, and it leans a lot on

132 00:18:58,916 --> 00:19:04,880 the data preparation step, which already solves more than half of the problem.

133 00:19:05,810 --> 00:19:07,270 Thank you so much for listening.

134 00:19:07,330 --> 00:19:11,910 I also invite you, as always, to join the Discord Channel.

135 00:19:12,020 --> 00:19:15,966 You will find a link on the official website datascienceathome.com.

136 00:19:16,148 --> 00:19:17,600 Speak with you next time.

137 00:19:18,350 --> 00:19:21,382 You've been listening to Data Science at Home podcast.

138 00:19:21,466 --> 00:19:26,050 Be sure to subscribe on iTunes, Stitcher, or Podbean to get new, fresh episodes.

139 00:19:26,110 --> 00:19:31,066 For more, please follow us on Instagram, Twitter and Facebook or visit our website at datascienceathome.com

 

References

https://netflixtechblog.com/formulating-out-of-memory-kill-prediction-on-the-netflix-app-as-a-machine-learning-problem-989599029109

22 Jun 2020Rust and machine learning #3 with Alec Mocatta (Ep. 109)00:23:58

In the 3rd episode of Rust and machine learning I speak with Alec Mocatta. Alec is a professional programmer with more than 20 years of experience who has been spending time at the intersection of distributed systems and data analytics. He's the founder of two startups in the distributed systems space and the author of Amadeus, an open-source framework that encourages you to write clean and reusable code that works, regardless of data scale, locally or distributed across a cluster.

Only for June 24th, LDN *Virtual* Talks June 2020 with Bippit (Alec speaking about Amadeus)

 

13 Nov 2024AI vs. The Planet: The Energy Crisis Behind the Chatbot Boom (Ep. 271)00:22:28

In this episode of Data Science at Home, we dive into the hidden costs of AI’s rapid growth — specifically, its massive energy consumption. With tools like ChatGPT reaching 200 million weekly active users, the environmental impact of AI is becoming impossible to ignore. Each query, every training session, and every breakthrough come with a price in kilowatt-hours, raising questions about AI’s sustainability.

 

Join us as we uncover the staggering figures behind AI's energy demands and explore practical solutions for the future. From efficiency-focused algorithms and specialized hardware to decentralized learning, this episode examines how we can balance AI’s advancements with our planet's limits. Discover what steps we can take to harness the power of AI responsibly!

 

Check our new YouTube channel at https://www.youtube.com/@DataScienceatHome

 

Chapters

00:00 - Intro

01:25 - Findings on Summary Statistics

05:15 - Energy Required To Query GPT

07:20 - Energy Efficiency In Blockchain

10:41 - Efficiency-Focused Algorithms

14:02 - Hardware Optimization

17:31 - Decentralized Learning

18:38 - Edge Computing with Local Inference

19:46 - Distributed Architectures

21:46 - Outro

 

 

#AIandEnergy #AIEnergyConsumption #SustainableAI #AIandEnvironment #DataScience #EfficientAI #DecentralizedLearning #GreenTech #EnergyEfficiency #MachineLearning #FutureOfAI #EcoFriendlyAI #FrancescoFrag #DataScienceAtHome #ResponsibleAI #EnvironmentalImpact

04 Aug 2023Building Self Serve Business Intelligence With AI and LLMs at Zenlytic (Ep. 235)00:47:37

In this episode, we dive into the world of data analytics and artificial intelligence with Ryan, the CEO, and Paul, the CTO of Zenlytic. Having graduated from Harvard and with extensive backgrounds in venture capital, consulting, and data engineering, Ryan and Paul provide valuable insights into their journey of building Zenlytic, a cutting-edge analytics platform. Join us as we explore how Zenlytic's natural language interface enhances user experiences, enabling seamless access and analysis of analytics data. Discover how their self-service platform empowers teams to leverage business intelligence effectively, and learn about the unique features that set Zenlytic apart from other analytics platforms in the market. Delve into the crucial aspects of data security and privacy while granting team access, and find out how Zenlytic's analytics capabilities have transformed companies into data-driven decision-makers, ultimately improving their performance.

24 Nov 2020Similarity in Machine Learning (Ep. 129)00:30:19

Come join me in our Discord channel speaking about all things data science.

Follow me on Twitch during my live coding sessions usually in Rust and Python

Subscribe to the official Newsletter and never miss an episode

Our Sponsors
  • ProtonMail offers a simple and trusted solution to protect your internet connection and access blocked or restricted websites. All of ProtonMail and ProtonVPN's apps are open source and have been inspected by cybersecurity experts, and Proton is based in Switzerland, home to some of the world’s strongest privacy laws
  • Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.
19 Mar 2022Bayesian Machine Learning with Ravin Kumar (Ep. 191)00:31:12

This is an episode where a passion for math, statistics and computers merge. I have a very interesting conversation with Ravin, a data scientist at Google, where he uses data to inform decisions.

He has previously worked at Sweetgreen, designing systems that would benefit team members and communities through sustainable and healthy food, and SpaceX, creating tools that would ultimately launch rocket ships.

All opinions in this episode are his own and none of the companies he has worked for are represented.

 

This episode is brought to you by RailzAI

The Railz API connects to major accounting platforms to provide you with quick access to normalized and analyzed financial data. Get free access to their API and more. Just tell them you came through Data Science at Home podcast.

 

and by Amethix Technologies

Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics and energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

 

 


22 Oct 2023How Language Models Are the Ultimate Database (Ep. 241)00:26:26

In this episode, dive deep into the world of Language Models as we decode their intricate structure, revealing how these powerful algorithms exploit concepts from the past.

But... what if LLMs were just a database?

 

References

https://fchollet.substack.com/p/how-i-think-about-llm-prompt-engineering

 

13 Apr 2021Learning and training in AI times (Ep. 148)00:31:53

Is there a gap between life sciences and data science? What's the situation when it comes to interdisciplinary research? In this episode I am with Laura Harris, Director of Training for the Institute of Cyber-Enabled Research (ICER) at Michigan State University (MSU), and we try to answer some of those questions.

 

You can contact Laura at training@msu.edu or on LinkedIn
