Data Archives - Software Engineering Daily

Explore all episodes of Data Archives - Software Engineering Daily

Dive into the complete list of episodes of Data Archives - Software Engineering Daily. Each episode is cataloged with a detailed description, making it easy to search for and explore specific topics. Follow every episode of your favorite podcast and never miss relevant content.

Date | Title | Duration
11 May 2021 | Nextmv: Optimization in Fluid Work Environments with Carolyn Mooney | 00:53:26

The traveling salesman problem is a classic challenge of finding the shortest and most efficient route for a person to take given a list of destinations. This is one of many real-world optimization problems that companies encounter. How should they schedule product distribution, or promote product bundles, or define sales territories? The answers to these

The post Nextmv: Optimization in Fluid Work Environments with Carolyn Mooney appeared first on Software Engineering Daily.

12 May 2021 | Akita: Application Programming Interfaces with Jean Yang | 00:45:18

An Application Programming Interface, API for short, is the connector between 2 applications. For example, a user interface that needs user data will call an endpoint, like a special URL, with request parameters and receive the data back if the request is valid. Modern applications rely on APIs to send data back and forth to

13 May 2021 | Apache Hudi: Large Scale Data Systems with Vinoth Chandar | 00:51:45

Apache Hudi is an open-source data management framework used to simplify incremental data processing and data pipeline development. This framework more efficiently manages business requirements like data lifecycle and improves data quality. Some common use cases for Hudi are record-level insert, update, and delete, simplified file management and near real-time data access, and simplified CDC

17 May 2021 | ClickHouse: Data Warehousing with Robert Hodges | 00:44:15

Columnar databases store and retrieve columns of data rather than rows of data. Each block of data in a columnar database stores up to 3 times as many records as row-based storage. This means you can read data with a third of the power needed in row-based data, among other advantages. The company Altinity is

20 May 2021 | Preset: Visualizing Big Data with Srini Kadamati | 00:53:46

Apache Superset is an open-source, fast, lightweight and modern data exploration and visualization platform. It can connect to any SQL based data source through SQLAlchemy at petabyte scale. Its architecture is highly scalable and it ships with a wide array of visualizations. The company Preset provides a powerful, easy to use data exploration and visualization

25 May 2021 | Firebolt: Data Warehouses with Eldad Farkash | 00:57:59

Cloud data warehouses are databases hosted in cloud environments. They provide typical benefits of the cloud like flexible data access, scalability, and performance.  The company Firebolt provides a cloud data warehouse built for modern data environments. It decouples storage and compute to operate on top of existing data lakes like S3. It computes orders of

27 May 2021 | Data Exploration with a New Python Library with Doris Lee | 00:47:58

Data exploration uses visual exploration to understand what is in a dataset and the characteristics of the data. Data scientists explore data to understand things like customer behavior and resource utilization. Some common programming languages used for data exploration are Python, R, and Matlab.  Doris Jung-Lin Lee is currently a Graduate Research Assistant at the

15 Jun 2021 | Stemma: Understanding Big Data with Mark Grover | 00:40:33

Amundsen was started at Lyft and is the leading open-source data catalog with the fastest-growing community and the most integrations. Amundsen enables you to search your entire organization by text search, see automated and curated metadata, share context with coworkers, and learn from others by seeing the most common queries on a table or frequently

16 Jun 2021 | Blissfully: Comprehensive IT Management with Aaron White | 00:55:52

Delivering SaaS products involves a lot more than just building the product. SaaS management involves customer relationship management, licensing, renewals, maintaining software visibility, and the general management of the technology portfolio. The company Blissfully helps businesses manage their SaaS products from within a complete IT platform with organization, automation, and security built in. The Blissfully

17 Jun 2021 | StreamSets: DataOps and Smart Pipelines with Arvind Prabhakar | 00:55:39

The company StreamSets is enabling DataOps practices in today’s enterprises. StreamSets is a data engineering platform designed to help engineers design, deploy, and operate smart data pipelines. StreamSets Data Collector is a codeless solution for designing pipelines, triggering CDC operations, and monitoring data in flight. StreamSets Transformer uses Apache Spark to generate insights about your

23 Jun 2021 | Axiom Browser Automation with Yaseer Sheriff | 00:38:21

The quantity and quality of a company’s data can mean the difference between a major success and a major failure. Companies like Google have used big data from their earliest days to steer their product suite in the direction consumers need. Other companies, like Apple, didn’t always use big data analytics to drive product design, but

24 Jun 2021 | Uber Data Science with Kevin Novak | 00:50:17

Uber is one of many examples we’ve discussed on this show that has changed the world with big data analysis. With over 8 million users, 1 billion Uber trips and people driving for Uber in over 400 cities and 66 countries, Uber has redefined an entire industry in a very short time frame. It’s difficult

02 Jul 2021 | LayerCI with Colin Chartier | 00:49:59

Continuous integration is a coding practice where engineers deliver incremental and frequent code changes to create higher quality software and collaborate more. Teams attempting to continuously integrate new code need a consistent and automated pipeline for reviewing, testing, and deploying the changes. Otherwise change requests pile up in the queue and nothing gets integrated efficiently. 

01 Jul 2021 | Meltano: ELT for DataOps with Douwe Maan | 00:51:34

ELT is a process for copying data from a source system into a target system. It stands for “Extract, Load, Transform” and starts with extracting a copy of data from the source location. It’s loaded into the target system like a data warehouse, and then it’s ready to be transformed into a usable format for

03 Jul 2021 | Text Blaze: Text Shortcuts with Scott Fortmann-Roe | 00:46:00

There are over 4 billion people using email. Many people using email for business communicate quick questions to colleagues, send repetitive, template-based information to potential customers and freshly hired employees, and repeat a lot of the same phrases. We actually repeat phrases in a lot of written formats. How often do you copy and paste

12 Jul 2021 | Data Lineage: Understanding Data Lineage at Scale with Julien Le Dem | 00:58:48

Big Data has exploded over the past decade as cloud computing and more efficient hardware made scaling essentially limitless. Products like Uber revolve entirely around analyzing data to provide rides. According to an EMC/IDC study, there was approximately 5.2 TB of data for every person in 2020. That estimate was made before the transition to remote work,

14 Jul 2021 | Data Science on AWS: Implementing AI and ML Pipelines on AWS with Chris Fregly | 00:47:03

Data science is an interdisciplinary field that combines strong technical skills with industry knowledge to perform a large range of jobs. Data scientists solve business questions with hands-on work cleaning and analyzing data, building machine learning models and applying algorithms, and generating dynamic visuals and tools to understand the world from the data it generates.

02 Aug 2021 | Reverse ETL: Operationalizing Data Warehouses with Tejas Manohar | 00:53:44

Enterprise data warehouses store all company data in a single place to be accessed, queried, and analyzed. They’re essential for business operations because they support managing data from multiple sources, providing context, and have built-in analytics tools. While keeping a single source of truth is important, easily moving data from the warehouse to other applications

15 Jul 2021 | Better Stack: A New DevOps Experience with Juraj Masar | 00:52:21

DevOps has shortened the development life cycle for countless applications and is embraced by companies around the world. But managing and monitoring multiple environments is still a major pain point, particularly when companies need to mix cloud and legacy systems. Knowing when services go down and quickly pinpointing the cause is essential for continuous development. 

19 Jul 2021 | Imply Infra: Big Data Analysis and Real-World Examples with Jad Naous | 00:44:43

Big data analytics is the process of collecting data, processing and cleaning it, then analyzing it with techniques like data mining, predictive analytics, and deep learning. This process requires a suite of tools to operate efficiently. Data analytics can save companies money, drive product development, and give insight into the market and customers. The company

21 Jul 2021 | CockroachDB: Distributed Databases and Containerization with Spencer Kimball | 00:52:21

In 2003, Google developed a robust cluster management system called Borg. This enabled them to manage clusters with tens of thousands of machines, moving them away from virtual machines and firmly into container management. Then, in 2014, they open sourced a version of Borg called Kubernetes, or K8s.  Now, in 2021, CockroachDB is a distributed

26 Jul 2021 | Pulsar Rerevisted with Enrico Olivelli | 00:56:17

In the previous episode, Pulsar Revisited, we discussed how the company DataStax has added to their product stack Astra Streaming, their cloud-native messaging and event streaming service that’s built on top of Apache Pulsar. We discussed Apache Pulsar and the added features DataStax offers like injecting machine learning into your data streams and viewing real-time

28 Jul 2021 | Prophecy: Apple of Data Engineering with Raj Bains | 00:58:24

Prophecy is a complete Low-Code Data Engineering Platform for the Enterprise. Prophecy enables all your teams on Apache Spark with a unique low-code designer. While you visually build your Dataflows – Prophecy generates high-quality Spark code on Git. Then, you can schedule Spark workflows with Prophecy’s low-code Airflow. Not only that, Prophecy provides end-to-end visibility

13 Aug 2021 | DaaS with Auren Hoffman | 01:47:58

Auren Hoffman is the CEO of SafeGraph. In this episode we discuss data as a service and more. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com

16 Aug 2021 | Druid: Event-Driven Data with Eric Tschetter | 00:56:26

Whether sending messages, shopping in an app, or watching videos, modern consumers expect information and responsiveness to be near-instant in their apps and devices. From a developer’s perspective, this means clean code and a fast database.  Apache Druid is a database built to power real-time analytic workloads for event-driven data, like user-facing applications, streaming, and

19 Aug 2021 | InfluxData: Time-Series Data with Russ Savage | 00:43:55

Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data (influxdata.com). The platform InfluxData is designed for building and operating time series applications.

07 Sep 2021 | Instabase with Anant Bhardwaj | 00:48:09

Instabase is a technology platform for building automation solutions. Users deploy it onto their own infrastructure and can leverage the tools offered by the platform to build complex workflows for handling tasks like income verification and claims processing. In this episode we interview Anant Bhardwaj, founder of Instabase. He describes Instabase as an operating system. 

14 Sep 2021 | Modern Data Stacks Optimized by Mozart Data with Peter Fishman and Dan Silberman | 00:50:57

Modern companies leverage dozens or even hundreds of software solutions to solve specific needs of the business.  Organizations need to collect all these disparate data sources into a data warehouse in order to add value.  The raw data typically needs transformation before it can be analyzed.  In many cases, companies develop homegrown solutions, thus reinventing

21 Sep 2021 | LinearB with Dan Lines | 00:45:40

A developer’s core deliverables are individual commits and the pull requests they aggregate into. While the number of lines of code written alone may not be very informative, in total, the code and metadata about the code found in tracking systems present a rich dataset with great promise for analysis and productivity optimization insights. LinearB

29 Sep 2021 | Faking Data Using Tonic.ai with Ian Coe and Adam Kamor | 00:50:23

Companies that gather data about their users have an ethical obligation and legal responsibility to protect the personally identifiable information in their dataset.  Ideally, developers working on a software application wouldn’t need access to production data. Yet without high-quality example data, many technology groups stumble on avoidable problems.  Organizations need a solution to protect privacy

24 Sep 2021 | No Code Process Automation at Axiom with Yaseer Sheriff | 00:43:49

Tedious, repetitive tasks are better handled by machines.  Unless these tasks truly require human intelligence, repetitive tasks are often good candidates for automation.  Implementing process automation can be challenging and technical.  Increasingly, engineers are seeking out tools and platforms to facilitate faster, more reliable automation. In this episode I talk to Yaseer Sheriff, Co-Founder and

28 Sep 2021 | DBT: Data Build Tool with Tristan Handy | 00:44:56

Applications write data to persistent storage like a database.  The most popular database query language is SQL which has many similar dialects.  SQL is expressive and powerful for describing what data you want.  What you do with that data requires a solution in the form of a data pipeline.  Ideally, these analytical workflows can follow

01 Oct 2021 | Git Scales for Monorepos with Derrick Stolee | 00:53:58

A monorepo is a version control strategy in which all your code is contained in one potentially large but complete repository. The monorepo is in stark contrast to an alternative approach in which software teams independently manage microservices or deliver software as libraries to be imported in other projects.

05 Oct 2021 | Modern Data Infrastructure and Tools with Leigh Marie Braswell | 00:47:57

The first industrial deployments of machine learning and artificial intelligence solutions were bespoke by definition and often had brittle operating characteristics.  Almost no one builds custom databases, web servers, or email clients.  Yet technology groups today often consider developing homegrown ML and data solutions in order to solve their unique use cases.  Today’s modern data

08 Oct 2021 | Infrastructure as Code with Christian Tragesser | 00:43:52

Infrastructure as Code is an approach to machine provisioning and setup in which a programmer describes the underlying services they need for their projects.  However, this infrastructure code doesn’t compile a binary artifact like traditional source code.  The successful completion of running the code signals that the servers and other components described in the configuration

28 Oct 2021 | Datadog with Omri Sass and Hugo Kaczmarek | 00:39:22

Modern business applications are complex.  It’s not enough to have raw logs or some basic telemetry.  Today’s enterprise organizations require an application performance monitoring solution or APM.  Today’s applications are complex distributed systems whose performance depends on a wide variety of factors.  Every single line of code can affect production and teams need insights into

03 Nov 2021 | Location-Based Experiences Using Foursquare with Ankit Patel | 00:48:24

The manner in which users interact with technology has rapidly switched to mobile consumption. The devices almost all of us carry with us at all times open endless opportunities for developers to create location-based experiences. Foursquare became a household name when they introduced social check-ins. Today they’re a location data platform. Ankit Patel is the

08 Nov 2021 | Observability Using Honeycomb.io with Christine Yen | 00:49:09

It does not matter if it runs on your machine.  Your code must run in the production environment and it must do so performantly.  For that, you need tooling to better understand your application’s behavior under different circumstances.  In the earliest days of software development, all we had were logs, which are still around and

10 Nov 2021 | Scalable Streaming Video with Amit Mishra | 00:35:45

The internet is a layer cake of technologies and protocols.  At a fundamental level, the internet runs on the TCP/IP protocol.  It’s a packet based system.  When your browser requests a file from a web server, that server chops up the file into tiny pieces known as packets and puts them on the network labeled

24 Nov 2021 | Metaplane with Kevin Hu | 00:44:13

Application observability is a fairly mature area.  Engineering teams have a wide selection of tools they can choose to adopt and a significant amount of thought leadership and philosophy already exists giving guidance for managing your application.  That application is going to persist data.  As you scale up, your system is invariably going to experience

23 Nov 2021 | Risk and Compliance with Terry O’Daniel | 00:58:08

Consumers are increasingly becoming aware of how detrimental it can be when companies mismanage data.  This demand has fueled regulations, defined standards, and applied pressure to companies.  Modern enterprises need to consider corporate risk management and regulatory compliance. In this interview, I speak with Terry O’Daniel, Director of Engineering (Risk & Compliance) at Instacart. Sponsorship

09 Dec 2021 | Amplemarket with João Batalha | 00:38:37

The lifeblood of most companies is their sales departments.  When you’re selling something other than a commodity, it’s typically necessary to carefully groom the onboarding experience for inbound future customers.  Historically, companies approached this in a one-size-fits-all manner, giving all customers a common experience. In today’s data-driven age, a better experience can be provided that

10 Dec 2021 | MemGraph with Dominik Tomicevic | 00:42:37

Relational databases have been a fixture of software applications for decades. They are highly tuned for performance and typically offer explicit guarantees like transactional consistency. More recently, there’s been a figurative Cambrian explosion of other-than-relational databases. Simple key-value stores or counters were an early win in this space. Managing a graph data structure is

21 Dec 2021 | Trifacta with Joe Hellerstein | 00:41:25

If you haven’t encountered a data quality problem, then you haven’t yet worked on a large enough project.  Invariably, a gap exists between the state of raw data and what an analyst or machine learning engineer needs to solve their problem.  Many organizations needing to automate data preparation workflows look to Trifacta as a solution. 

28 Jan 2022 | Couchbase Architecture with Ravi Mayuram | 00:58:42

Couchbase is a distributed NoSQL cloud database. Since its creation, Couchbase has expanded into edge computing, application services, and most recently a database-as-a-service called Capella.  Couchbase started as an in-memory cache and needed to be rearchitected to be a persistent storage system. In this episode, I interview Ravi Mayuram, SVP Products and Engineering at Couchbase

31 Jan 2022 | Scaling PlanetScale with Sugu Sougoumarane | 00:48:27

Database product companies typically have a few phases. First, the company will develop a technology with some kind of innovation such as speed, scalability, or durability. The company will offer support contracts around that technology for a period of time, before eventually building a managed, hosted offering. PlanetScale is a database company built around the

17 Feb 2022 | Data Quality Using Anomalo with Jeremy Stanley | 00:46:39

When writing code, test-driven development is a commonly accepted methodology to ensure the development of high-quality software. Your organization’s data, on the other hand, is an entirely different challenge. Data can be missing due to human error, a failure with a third-party provider, a botched release, or dozens of other issues. When

18 Feb 2022 | Hex Collaborative Data Workspace with Barry McCardel and Caitlin Colgrove | 00:45:09

Barry McCardel, Co-Founder and CEO at Hex; Caitlin Colgrove, Co-Founder and CTO at Hex. In contrast to other IDEs, the notebook interface offers software developers a unique environment idealized for data professionals. Despite the growth in popularity, a surprising learning curve still exists for setup and configuration. A siloed notebook offers no native collaboration tools.

23 Feb 2022 | Splunk Platform with Spiros Xanthos | 00:43:30

Splunk is a monitoring and logging platform that has evolved over its 18 years of existence. Its modern focus on observability centers on open source and AIOps. Observability has evolved with the growth of Kubernetes, and Splunk’s work around OpenTelemetry has kept parity with the open source community of Kubernetes. Spiros Xanthos

25 Feb 2022 | Data Catalog in Practice with Mark Grover | 00:51:38

A data catalog provides an index into the data sets and schemas of a company. Data teams are growing in size, and more companies than ever have a data team, so the market for data catalog is larger than ever. Mark is the CEO of Stemma and the co-creator of Amundsen, a data catalog that came out of

09 Mar 2022 | Apache Hudi with Vinoth Chandar | 00:43:03

The data lake architecture has become broadly adopted in a relatively short period of time. In a nutshell, that means data in its raw format stored in cloud object storage. Modern software and data engineers have no shortage of options for accessing their data lake, but that list shrinks quickly if you care about features

16 Mar 2022 | RudderStack Engineering with Soumaydeb Mitra | 00:46:54

Customer data pipelines power the backend of many successful web platforms. In a customer data pipeline, data is collected from sources such as mobile apps and cloud SaaS tools, transformed and munged using data engineering, stored in data warehouses, and piped to analytics, advertising platforms, and data infrastructure. RudderStack is an open source customer data

19 Mar 2022 | DuckDB with Hannes Muleisen | 00:49:02

DuckDB is a relational database management system with no external dependencies, with a simple system for deployment and integration into build processes. It enables complex queries in SQL with a large function library, and provides transactional guarantees through multi-version concurrency control. Hannes Mühleisen works on DuckDB and joins the show to talk about query engines

29 Mar 2022 | SingleStore with Jordan Tigani | 00:42:57

SingleStore is a multi-use, multi-model database designed for transactional and analytic workloads, as well as search and other domain specific applications. SingleStore is the evolution of the database company MemSQL, which sought to bring fast, in-memory SQL database technology to market. Jordan Tigani is Chief Product Officer of SingleStore and joins the show to talk

31 Mar 2022 | PlanetScale Management with Sam Lambert | 00:49:11

Running a database company requires expertise in both technical and managerial skills. There are deeply technical engineering questions around query paths, scalability, and distributed systems. And there are complex managerial questions around developer productivity and task allocation. Sam Lambert is the CEO of PlanetScale, which is building modern relational database infrastructure. Before PlanetScale he spent

05 Apr 2022 | Data Engineering Trends with Lior Gavish and James Densmore | 00:43:56

Guests: Lior Gavish and James Densmore. Data infrastructure is a fast-moving sector of the software market. As the volume of data has increased, so too has the quality of tooling to support data management and data engineering. In today’s show, we have a guest from a data intensive company as well as a company that builds a

14 Apr 2022 | Time Series IoT on InfluxDB with Brian Gilmore | 00:48:30

The solution many turn to for capturing their streaming data is InfluxDB.  In this episode, I interview Brian Gilmore, Director of Product Management at InfluxData, about how real time applications achieve success built on top of InfluxDB. When most people hear the phrase Internet of Things, it typically evokes an image of connected devices we

25 Apr 2022 | Select Star with Shinji Kim | 00:42:47

Modern organizations eventually face data governance challenges. Keeping track of where data came from, what systems update it, and in what ways updates can be made are just some of the issues to be tackled. Large organizations face additional challenges like training, onboarding, and capturing the institutional knowledge that leaves with the departure of key team

27 Apr 2022 | Airbyte Engineering with Michel Tricot | 00:42:26

Data integration infrastructure is not easy to build. Moving large amounts of data from one place to another has historically required developers to build ad hoc integration points to move data between SaaS services, data lakes, and data warehouses. Today, there are dedicated systems and services for moving these large batches of data. Airbyte builds

29 Apr 2022Data Loss Prevention with Yasir Ali00:40:55

Data loss can occur when large data sources such as Slack or Google Drive get leaked. In order to detect and avoid leaks, a data asset graph can be built to understand the risks of a company environment. Polymer is a data loss prevention product that helps companies avoid problematic data leaks. Yasir Ali is

11 May 2022Data Labeling with Michael Malyuk00:41:51

Data labeling allows machine learning algorithms to find patterns among the data. There are a variety of data labeling platforms that enable humans to apply labels to this data and ready it for algorithms. Heartex is a data labeling platform with an open source core. Michael Malyuk joins the show to talk through the platform

09 May 2022Pinot and StarTree with Chinmay Soman00:44:17

Real-time analytics are difficult to achieve because large amounts of data must be integrated into a data set as that data streams in. As the world moved from batch analytics powered by Hadoop into a norm of “real-time” analytics, a variety of open source systems emerged. One of these was Apache Pinot. StarTree is a

14 May 2022Data Delivery with Naqeeb Memon00:28:14

Data-as-a-service is a company category that is not as common as API-as-a-service, software-as-a-service, or platform-as-a-service. In order to vend data, a data-as-a-service provider needs to define how that data will be priced, stored, and delivered to users: streaming over an API or served via static files. Naqeeb Memon of Safegraph joins the show

01 Jun 2022Decodable Streaming with Eric Sammer00:44:58

Streaming data platforms like Kafka, Pulsar, and Kinesis are now common in mainstream enterprise architectures, providing low-latency real-time messaging for analytics and applications. However, stream processing – the act of filtering, transforming, or analyzing the data inside the messages – is still an exercise left to the receiving microservice or datastore, a custom programming exercise

28 Jul 2022Couchbase with Ravi Mayuram00:30:08

Couchbase is a distributed NoSQL cloud database. Since its creation, Couchbase has expanded into edge computing, application services, and most recently, a database-as-a-service called Capella. Couchbase started as an in-memory cache and needed to be rearchitected to be a persistent storage system. In this episode, we interview Ravi Mayuram, SVP of Products and Engineering at Couchbase.

18 Aug 2022Data Infrastructure for Finance00:54:26

Data is becoming a bank’s biggest asset. These complex enterprises have a huge opportunity ahead – to transform themselves to become a trusted hub of a much broader data ecosystem that goes beyond the financial industry and helps to form a new class of cross-industry experience architectures that are scalable and transparent. The data physics

05 Aug 2022Faking Data Using Tonic.ai with Ian Coe and Adam Kamor00:46:49

Companies that gather data about their users have an ethical obligation and legal responsibility to protect the personally identifiable information in their dataset. Ideally, developers working on a software application wouldn’t need access to production data. Yet without high-quality example data, many technology groups stumble on avoidable

12 Sep 2022Serverless Clickhouse for Developers with Jorge Sancha00:35:14

Data analytics technology and tools have seen significant improvements in the past decade. But it can still take weeks to prototype, build, and deploy new transformations and deployments, usually requiring considerable engineering resources. Plus, most data isn’t real-time. Instead, most of it is still batch-processed. Tinybird Analytics provides an easy way to ingest and query

07 Nov 2022Building on the Data Cloud with Torsten Grabs00:40:03

Building and managing data-intensive applications has traditionally been costly and complex, and has placed an operational burden on developers to maintain as their organization scales. Today’s developers, data scientists, and data engineers need a streamlined, single cloud data platform for building applications, pipelines, and machine learning models — without having to move or copy their

11 Nov 2022Accessing Data at Scale with Justin Borgman00:46:18

The Presto/Trino project makes distributed querying easier across a variety of data sources. As the need for machine learning and other high volume data applications has increased, the need for support, tooling, and cloud infrastructure for Presto/Trino has increased with it. Starburst helps your teams run fast queries on any data source. With Starburst you

10 Mar 2023Data Investing and the MAD with Matt Turck00:51:08

There are many types of early-stage funding available, from friends and family to seed to Series A. Some firms invest across a wide set of technologies and seek only to provide capital. Others are in it for the long haul – they focus on specific areas of technology and develop both long term relationships

06 Apr 2023Streaming Analytics with Hojjat Jafarpour00:46:48

Streaming analytics refers to the process of analyzing real-time data that is generated continuously and rapidly from various sources, such as sensors, applications, social media, and other internet-connected devices. Streaming analytics platforms enable organizations to extract business value from data in motion, similar to how traditional analytics tools derive insights from data at rest. DeltaStream

20 Mar 2023Observability Trends with John Hart00:26:58

DataSet is a log analytics platform provided by SentinelOne that helps DevOps, IT engineering, and security teams get answers from their data across all time periods, both live streaming and historical. It’s powered by a unique architecture that uses a massively parallel query engine to provide actionable insights from the data available. John Hart

03 Apr 2023Turso: Globally Replicated SQLite with Glauber Costa00:51:00

Distributed databases are necessary for storing and managing data across multiple nodes in a network. They provide scalability, fault tolerance, improved performance, and cost savings. By distributing data across nodes, they allow for efficient processing of large amounts of data and redundancy against failures. They can also be used to store data across multiple locations

07 Apr 2023Self-Service Data Culture with Stemma’s Mark Grover00:46:26

A data catalog provides an index into the data sets and schemas of a company. Data teams are growing in size, and more companies than ever have a data team, so the market for data catalogs is larger than ever. Mark is the CEO of Stemma and the co-creator of Amundsen, a data catalog that came

13 Apr 2023Data Activation with Tejas Manohar00:41:10

Data Activation is the method of unlocking the knowledge stored within your data warehouse, and making it actionable by your business users in the end tools that they use every day. In doing so, Data Activation helps bring data people toward the center of the business, directly tying their work to business outcomes. Hightouch is

20 Apr 2023Open-Source Embedding Database with Anton Troynikov00:32:37

Chroma is an open source embedding database that is designed to make it easy to build large language model applications by making knowledge, facts and skills pluggable. Anton Troynikov is the co-founder of Chroma and he is our guest today. This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and

26 May 2023Low-Code SQL on dbt Core with Raj Bains from Prophecy00:54:15

In this podcast episode, we take a look at the intricacies of low-code data pipelines with Raj Bains, the founder of Prophecy.io. Raj shares valuable insights into how performant low-code data pipelines are revolutionizing industries and transforming everyday operations. Raj discusses the founding story of Prophecy.io, the company’s mission, and its approach to democratizing the creation

12 Jun 2023Data Reliability with Barr Moses and Lior Gavish00:56:22

As companies depend more on data to improve digital products and make informed decisions, it’s crucial that the data they use be accurate and reliable. Monte Carlo, the data reliability company, is the creator of the industry’s first end-to-end data observability platform. Barr Moses and Lior Gavish are the founders of Monte Carlo and they join

30 Jun 2023Customer-facing Analytics with Tyler Wells00:51:38

The state of data inside most companies is chaotic. It takes significant time and investment to tame this chaos. When you are a platform provider, you are gathering tons of data from the developers using your platform. These developers building products on your platform need insight into that data to better understand how their application

11 Jul 2023Making Data-Driven Decisions with Soumyadeb Mitra00:50:45

RudderStack is a warehouse-native customer data platform (CDP) that helps businesses collect, unify, and activate customer data from all their different sources. In today’s episode, we’re talking to Soumyadeb Mitra, the founder and CEO of RudderStack. We discuss the importance of activating all your data, how RudderStack can help you activate your data, the challenges

20 Jul 2023Data-Centric AI with Alex Ratner00:50:19

Companies have high hopes for machine learning and AI to support real-time product offerings, prevent fraud, and drive innovation. But there is a catch: training models requires labeled data that machines can digest. As data volumes increase, the opportunity to get great ML results rises, but so does the problem of labeling all the

08 Aug 2023Database Caching with Ben Hagan00:35:36

Database caching is a fundamental challenge in database management and there are hundreds of techniques to satisfy different caching scenarios. PolyScale is a fully automated database cache. It offers an innovative approach to database caching, leveraging AI and automated configuration to simplify the process of determining what should and should not be cached. Ben Hagan

07 Sep 2023Highly Scalable NoSQL with Dor Laor00:36:27

ScyllaDB is a fast and highly scalable NoSQL database designed to provide predictable performance at a massive cloud scale. It can handle millions of operations per second at a scale of gigabytes or petabytes. It’s also designed to be compatible with Cassandra and DynamoDB APIs. Scylla is used by Zillow, Comcast, and for Discord’s 350M+

05 Oct 2023AI and Business Analytics with John Adams00:30:03

It’s now clear that the adoption of AI will continue to increase, with nearly every industry working to rapidly incorporate it into their systems and applications to provide greater value to their users. Business analytics is a key domain that promises to be radically reshaped by AI. Alembic is an AI platform that integrates web

12 Oct 2023Observability with Eduardo Silva00:44:47

There are hundreds of observability companies out there, and many ways to think about observability, such as application performance monitoring, server monitoring, and tracing. In a production application, multiple tools are often needed to get proper visibility on the application. This creates some challenges. Applications can produce lots of different observability data, but how

18 Oct 2023Modern Web Scraping with Erez Naveh00:57:29

Today it’s estimated there are over 1 billion websites on the internet. Much of this content is optimized to be viewed by human eyes, not consumed by machines. However, creating systems to automatically parse and structure the web greatly extends its utility, and paves the way for innovative solutions and applications. The industry of web

24 Oct 2023Streamlit with Amanda Kelly00:47:06

The importance of data teams is undeniable. Most companies today use data to drive decision-making on anything from software feature development to product strategy, hiring and marketing. In some companies data is the product, which can make data teams even more vital. But there’s a common problem – analyzing data is hard and time consuming.

09 Nov 2023Chronosphere with Martin Mao00:48:08

Observability software helps teams to actively monitor and debug their systems, and these tools are increasingly vital in DevOps. However, it’s not uncommon for the volume of observability data to exceed the amount of actual business data. This creates two challenges – how to analyze the large stream of observability data, and how to keep

23 Nov 2023Daytona with Ivan Burazin00:47:50

Cloud-based software development platforms such as GitHub Codespaces continue to grow in popularity. These platforms are attractive to enterprise organizations because they can be managed centrally with security controls. However, many, if not most, developers prefer a local IDE. Daytona is aiming to bridge that gap. It’s a layer between a local IDE and a

29 Nov 2023The Right to Be Forgotten with Gal Ringel00:47:45

Data breaches at major companies are now so common that they hardly make the news. The Wikipedia page on data breaches lists over 350 between 2004 and 2023. The Equifax breach in 2017 was especially notable because over 160 million records were leaked, and much of the data was acquired by Equifax without individuals’ knowledge

28 Nov 2023Sofascore with Josip Stuhli00:49:48

If you’re a sports fan and like to track sports statistics and results, you’ve probably heard of Sofascore. The website started in 2010 and ran on a modest single server. It now has 25 million monthly active users, covers 20 different sports, 11,000 leagues and tournaments, and is available in over 30 languages.   Josip

22 Nov 2023GraphAware with Luanne Misquitta00:57:51

Knowledge graphs are an intuitive way to define relationships between objects, events, situations, and concepts. Their ability to encode this information makes them an attractive database paradigm. Hume is a graph-based analysis solution developed by GraphAware. It represents data as a network of interconnected entities and provides analysis capabilities to extract insights from the data.

07 Dec 2023Tracking Drug Smugglers and Migrating Databases with Benny Keinan and Lior Resisi00:50:40

Maritime logistics is the process of organizing the movement of goods across the ocean. Historically, this has been a challenging problem because of the multinational nature of shipping, as well as piracy, smuggling, and legacy technology. It’s also profoundly important for security reasons, and because 90% of what we buy travels over the oceans. Ocean vessels

28 Dec 2023Rama with Nathan Marz00:45:06

Building scalable software applications can be complex and typically requires dozens of different tools. The engineering often involves handling many arcane tasks that are distant from actual application logic. In addition, a lack of a cohesive model for building applications can lead to substantial engineering costs. Nathan Marz is the creator of Rama, which is

25 Dec 2023Bonus Episode: SurrealDB with Tobie Morgan Hitchcock00:57:17

SurrealDB is the result of a long-time collaboration between brothers Tobie and Jaime Morgan Hitchcock. The project has modest origins and started merely to support other projects the brothers were working on. However, over time the project grew, and in 2021 they started working on it full-time. Since then the project has gained serious adoption.

06 Feb 2024Building a Data Lake with Adam Ferrari00:46:19

Starburst is a data lake analytics platform. It’s designed to help users work with structured data at scale, and is built on the open source platform, Trino. Adam Ferrari is the SVP of Engineering at Starburst. He joins the show to talk about Starburst, data engineering, and what it takes to build a data lake.

07 Mar 2024Iceberg at Netflix and Beyond with Ryan Blue00:47:37

Apache Iceberg is an open source high-performance format for huge data tables. Iceberg enables the use of SQL tables for big data, while making it possible for engines like Spark and Hive to safely work with the same tables, at the same time. Iceberg was started at Netflix by Ryan Blue and Dan Weeks, and

04 Jul 2024Hyperscaling SQL with Sam Lambert

Databases underpin almost every user experience on the web, but scaling a database is one of the most fundamental infrastructure challenges in software development. PlanetScale offers a MySQL platform that is managed and highly scalable. Sam Lambert is the CEO of PlanetScale and he joins the show to talk about why he started the platform,
