
Roaring Elephant (Dave Russell & Jhon Masschelein)
Explore all episodes of Roaring Elephant
Dive into the complete list of Roaring Elephant episodes. Each episode is catalogued with a detailed description, which makes it easy to find and explore specific topics. Follow every episode of your favourite podcast and never miss relevant content.
Date | Title | Duration | |
---|---|---|---|
18 Nov 2015 | Episode 1 – A new beginning: Getting started in Hadoop | 00:36:06 | |
With all the buzz around big data generally, and Hadoop specifically, there's never been a better time for getting started in Hadoop. This episode covers how your two hosts got involved in Hadoop, and also discusses some of the other popular paths into the world of BigData/Hadoop.
00:00 Recent events
How did your hosts get into Hadoop
04:30 Main Topic
Driven by individuals vs organisations
Online education options
Formal training
19:20 Questions from our Listeners:
Isn’t it really difficult?
Do you need to know Java?
Do you need to know SQL?
Will I need to throw everything else in my datacentre out?
Can I replace my EDW (Enterprise Data Warehouse)?
Do I have to re-write all my ETL (Extract-Transform-Load)?
36:05 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
01 Dec 2015 | Episode 2 – How to avoid disaster | 00:43:37 | |
When you are getting started on your journey with Hadoop, how do you avoid disaster? We have seen many people go through this journey, and both of us have seen things people do that make projects successful, and things people do that make projects more difficult than they should be.
00:00 Recent events
Customer pilot completion
SQL on Hadoop Masterclasses
Multi-tenant Spark notebook issues
Spark recommendation engine webinar
11:00 Main Topic
Starting too small
Baseline and benchmark
Config management
Backup and/or disaster recovery
Leaving security too late
36:00 Questions from our Listeners:
Where do I find data scientists?
Storage options?
Install everything?
43:37 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
15 Dec 2015 | Episode 3 – High level Hadoop architectures | 00:37:54 | |
What are the hardware and implementation options we see? A discussion ranging from direct attached storage versus network attached storage/storage area networks, to on-premise hardware versus cloud options.
00:00 Recent events
Organisations starting their Big Data Journey
A lessons learned workshop for a customer after their successful pilot
Planning Masterclasses for 2016
Migration customer workshop
Big Data and the Connected Car webinar (registration required)
07:30 Main Topic
Direct attached storage (DAS) or “traditional” Hadoop
Network attached storage (NAS) / Storage Area Networks (SAN)
Cloud / Azure / AWS / Google Cloud / OpenStack etc...
SaaS/PaaS/HaaS/HDInsight
Ceph & Gluster
Object stores (S3) and other cloud storage
25:30 Questions from our Listeners:
Doesn’t having a SAN/NAS system break data locality?
Can I mix drive sizes and types within a cluster or even within the same node?
Hybrid cluster environments, how to mix cloud and on premise deployment?
Can I dedicate certain nodes to certain workloads?
37:54 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
29 Dec 2015 | Episode 4 – Hadoop: Year in review | 00:38:35 | |
A bit of Hadoop history of what we have seen happening over the last 12 months, some trends and interesting technologies. Some ups, some downs and possibly even some round and rounds, capped off with some Bold Predictions for 2016.
00:00 Recent events
A number of engagements
Apache NiFi
Why some Hadoop users decide to go for separate clusters per use case or (internal) client
06:00 Main Topic
A broad acceptance of Hadoop in Europe
A shift from batch workloads to a multi-tenant, secure platform including IoT and real-time, in-memory analytics.
Apache Ambari making our life easier all the time
Data Governance Initiative
Open Data Platform Initiative (http://odpi.org)
Public clouds offer Big Data-specific environments
Tech advances in Hive (CBO/ORC/Zlib) and Transparent Encryption in HDFS
Apache NiFi
The year of Apache "open community" open source
Bold Predictions!
31:00 Questions from our Listeners:
What new (incubating) projects should I invest time in today, knowing that they may never be included in any distribution?
I’ve been looking into Apache NiFi and am curious whether or not I can use it to replace Apache Flume?
Should I go for a Hadoop appliance or not?
38:35 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
30 Aug 2016 | Episode 23 – Security in Hadoop – Authentication | 01:07:49 | |
In this episode, we discuss this fortnight's interesting big data news that caught our eye and then go on to discuss the basics around authentication in Hadoop for what is the first in a series of episodes that we'll be doing over the next few months on the broad topic of security.
00:00 Recent events
Dave:
The new science behind customer loyalty
http://insights.principa.co.za/the-new-science-behind-customer-loyalty
http://insights.principa.co.za/infographic-creating-a-data-driven-customer-loyalty-strategy
5 great charts in 5 lines of R code
http://blog.revolutionanalytics.com/2016/08/five-great-charts-in-5-lines-of-r-code-each.html
Using big data to create value for customers, not just target them
https://hbr.org/2016/08/use-big-data-to-create-value-for-customers-not-just-target-them
Jhon:
Linux turns 25 (25 August 1991)
https://www.linux.com/news/linus-torvalds-reflects-25-years-linux
http://web.archive.org/web/20100104211620/http://www.linux.org/people/linus_post.html
Hadoop 2.7.3, a minor release in the 2.x.y release line, building upon the previous stable release 2.7.2
http://hadoop.apache.org/docs/r2.7.3/
Specification work related to the Hadoop Compatible Filesystem (HCFS) effort. Hadoop in the cloud/as a service getting a lot of attention lately
http://hortonworks.com/blog/making-elephant-fly-cloud/
http://blog.cloudera.com/blog/2016/08/analytics-and-bi-on-amazon-s3-with-apache-impala-incubating/
https://vision.cloudera.com/analytic_database_in_cloud/
http://venturebeat.com/2016/08/25/sap-altiscale/
Facebook open sources image-recognition AI with live video in mind
https://research.facebook.com/blog/learning-to-segment/
NoSQL Databases: a Survey and Decision Guidance
https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d#.c037d5jbj
Committer criteria from Apache
https://hadoop.apache.org/committer_criteria.html
Maybe they should just have referred to our podcast! :)
Episode 11 - Interview with Community Award Winner Venkatesh Sellappa
40:20 Security in Hadoop - Authentication
What is Authentication?
Why is it important?
When should I do it?
Hadoop is insecure by default without strong Authentication
Kerberos
Active Directory, MIT Kerberos and FreeIPA
01:07:49 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
13 Sep 2016 | Episode 24 – Hadoop Summit Melbourne 2016 Preview | 01:07:33 | |
With Hadoop Summit Melbourne 2016 starting the day after we are recording this episode, we go over the published agenda and discuss the current state of the Big Data Technology ecosystem while we pick our favorite sessions. Wish we were there!
00:00 Recent events
Dave
Cloud Security Alliance release cloud and big data security guidelines
http://siliconangle.com/blog/2016/08/28/the-cloud-security-alliance-publishes-its-best-practices-for-big-data-security/
https://cloudsecurityalliance.org/download/big-data-security-and-privacy-handbook/
Common Big Data Backup and Recovery myths
http://www.networkworld.com/article/3113036/big-data-business-intelligence/debunking-the-most-common-big-data-backup-and-recovery-myths.html
Big Data, Google, and the end of free will
http://www.ft.com/cms/s/2/50bb4830-6a4c-11e6-ae5b-a7cc5dd5a28c.html
Jhon
Supercomputing now moving to Hadoop-style systems
https://techcrunch.com/2016/05/24/crays-latest-supercomputer-runs-openstack-and-open-source-big-data-tools/
The Home for Data Science
https://www.kaggle.com/
36:10 Hadoop Summit Melbourne 2016 Preview
01:07:33 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
27 Sep 2016 | Episode 25 – The pro’s and con’s of crafting your own distribution | 01:34:59 | |
When we talk about Big Data, and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes a pre-built distro just does not meet the need. In this episode, we have a guest on the show who explains why they chose to forgo the available distributions in favour of building their own.
http://lod-cloud.net/
00:00 Recent events
Dave:
Which tool should I use?
http://brohrer.github.io/which_tool_should_i_use.html
YaRrr! - The Pirate’s guide to R
Blog: http://nathanieldphillips.com/thepiratesguidetor/
YaRrr! - Download the book:
https://drive.google.com/file/d/0B4udF24Yxab0S1hnZlBBTmgzM3M/view
Video tutorials to go with the above:
https://www.youtube.com/playlist?list=PL9tt3I41HFS9gmeZFEuNrnu_7V_NFngfJ
Listener Question from Sampath from Baltimore:
When moving into a career in Big Data, is it better to pick a technology like Spark and try to build expertise on it versus having a broader knowledge on many tools. I registered for Edx courses and working towards getting Cloudera Certification. Please provide me any advice.
Jhon:
More accountability for big-data algorithms
http://www.nature.com/news/more-accountability-for-big-data-algorithms-1.20653
The "doomsday" version:
http://time.com/4471451/cathy-oneil-math-destruction/
6 Illusions Execs Have About Big Data
https://www.entrepreneur.com/article/281809
Michele:
Hadoop release 3.0.0-alpha1 available
http://hadoop.apache.org/releases.html#03+September%2C+2016%3A+Release+3.0.0-alpha1+available
Running Spark on Alluxio with S3
https://www.oreilly.com/learning/running-spark-on-alluxio-with-s3
47:00 The pro's and con's of crafting your own distribution
With our special guest Michele Lamarca (@nonfacciocip).
Many thanks to Michele for being on the podcast with us and sharing his experiences!
01:34:59 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
11 Oct 2016 | Episode 26 – Security 2: Authorisation and audit | 01:10:32 | |
In this episode, we continue our coverage of Hadoop security. Where episode 23 dealt with the subject of authentication, we now delve deeper into the why and how of authorisation and audit, and cover the major players in the arena.
00:00 Recent events
Dave
Beyond Privacy and Security in a Connected World
http://www.svds.com/beyond-privacy-security-connected-world/
The broken promise of open-source Big Data software – and what might fix it
http://siliconangle.com/blog/2016/09/27/the-broken-promise-of-open-source-big-data-software-and-what-might-fix-it-2/
Meet Apache Spot, a new open source project for cybersecurity
http://www.csoonline.com/article/3124497/big-data/meet-apache-spot-a-new-open-source-project-for-cybersecurity.html
SMEs advised to capitalise on ‘big data’
http://www.farminglife.com/news/farming-news/smes-advised-to-capitalise-on-big-data-1-7606523
Jhon
What is hardcore data science—in practice?
https://www.oreilly.com/ideas/what-is-hardcore-data-science-in-practice
Hortonworks, IBM Collaborate to Offer Open Source Distribution on Power Systems
http://www.prnewswire.com/news-releases/hortonworks-ibm-collaborate-to-offer-open-source-distribution-on-power-systems-300330299.html
https://www-03.ibm.com/press/us/en/pressrelease/50553.wss
Inside 'The Next Rembrandt': How JWT Got a Computer to Paint Like the Old Master. The project leaders explain their brilliant, troubling masterpiece
http://www.adweek.com/news/advertising-branding/inside-next-rembrandt-how-jwt-got-computer-paint-old-master-172257
https://www.nextrembrandt.com/
Strata+Hadoop World New York
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/grid/public/2016-09-28
http://hortonworks.com/blog/
http://community.cloudera.com/t5/News/ct-p/Welcome
Cloudera Kudu 1.0.0 released
http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Apache-Kudu-1-0-0-released/m-p/45332
Audience Questions from Sampath @ Baltimore:
http://www.infoignite.com/sentiment.html
Azure HDInsight 3.5:
https://azure.microsoft.com/en-gb/blog/new-security-performance-and-isv-solutions-build-on-azure-hdinsight-s-leadership-to-make-hadoop-enterprise-ready-for-the-cloud/
Azure Search:
https://azure.microsoft.com/en-us/services/search/
42:15 Security 2: Authorisation and audit
The principles of auth reflected by the underlying organisation of your data
Sync with AD/LDAP groups, don’t go user specific wherever possible.
Use whatever tools are in your platform:
Cloudera - Sentry
https://sentry.apache.org/
Hortonworks - Ranger
http://ranger.apache.org/
MapR - ???
https://www.mapr.com/hadoop-security-and-big-data-governance-mapr
01:10:32 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
25 Oct 2016 | Episode 27 – Security 3: Encryption at rest and in motion | 00:57:53 | |
Rounding out our series on security in Hadoop, we finish with encryption at rest and in motion. We go over the different approaches and the do's and don'ts, and mention some higher-level applications in this space.
00:00 News for the week!
Dave:
Executives Still Relying on Gut, Not Gigabytes in Planning for Future
http://www.datadigestonline.com/2016/10/executives-still-relying-on-gut.html
Rewriting SAS Programs for Financial Data Manipulation in R
http://blog.revolutionanalytics.com/2016/09/rewriting-sas-in-r-for-finance.html
Chris Surdak - Why so many Big Data projects fail
http://surdak.com/innovation-vs-improvement/
Jhon:
Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs (14-Sep-2016)
http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs
SQL on Hadoop benchmarks get serious (14-Oct-2016)
http://www.zdnet.com/article/sql-on-hadoop-benchmarks-get-serious/
WHERE IS APACHE HIVE GOING? TO IN-MEMORY COMPUTING. (06-Oct-2016)
http://hortonworks.com/blog/apache-hive-going-memory-computing/
APACHE HIVE VS APACHE IMPALA QUERY PERFORMANCE COMPARISON (11-Oct-2016)
http://hortonworks.com/blog/apache-hive-vs-apache-impala-query-performance-comparison/
Cloudera wants extra money from Intel to become a cloud provider?
http://venturebeat.com/2016/08/30/cloudera-cloud-intel/
Four interesting things about IBM, Hadoop and open source (2 years old)
http://www.ibmbigdatahub.com/infographic/four-interesting-things-about-ibm-hadoop-and-open-source
Recovering from a database disk failure in Big SQL (20-Oct-2016)
https://developer.ibm.com/hadoop/2016/10/20/recovering-from-a-database-disk-failure-in-big-sql-worker-node-4-1fp2-and-4-2/
37:20 Security 3: Encryption at rest and in motion
Nice intro in the apache docs:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html
RPC Encryption:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Security_Guide/content/ch_wire-rpc.html
57:53 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
08 Nov 2016 | Episode 28 – Talking Datameer with Erik Stalpers | 00:59:39 | |
In this episode, Dave is stuck in a hotel basement in the middle of internet nowhere and Erik Stalpers from Datameer joins us to talk about the Datameer exploration and visualization tool.
00:00 Recent events
Dave
Machine learning vs AI
http://www.wired.co.uk/article/machine-learning-ai-explained
Machine Learning Data Cleansing
https://gcn.com/articles/2016/10/19/activeclean-big-data.aspx
https://activeclean.github.io/
Battle of the Data Science Venn Diagrams
http://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html
http://www.prooffreader.com/2016/09/battle-of-data-science-venn-diagrams.html (original doc 21 September 2016)
Jhon
How Vector Space Mathematics Helps Machines Spot Sarcasm
https://www.technologyreview.com/s/602639/how-vector-space-mathematics-helps-machines-spot-sarcasm/
Straight talk about big data
http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/straight-talk-about-big-data
25:10 Talking Datameer with Erik Stalpers
Erik Stalpers, Solution Engineer at Datameer
https://nl.linkedin.com/in/erikstalpers
https://www.datameer.com/
59:39 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
22 Nov 2016 | Episode 29 – 1 Year anniversary | 01:04:23 | |
One year of elephants roaring has come and gone, so we reminisce a little about what happened over the last year. And since we could not have done this podcast nearly as well without them, we asked the special guests we have had on the podcast over the previous year to join the Skype call and talk about what they have been up to.
00:00 One year of podcasting...
Dave and Jhon reminiscing about how the podcast got started.
06:55 Fireside chats with guests over the year
07:56 Joe Witt, Senior Director of Engineering at Hortonworks
22:40 Michele Lamarca, Team Lead Big Data at Bright Computing
43:00 John Mertic, Director of Program Management for ODPi
01:04:23 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
06 Dec 2016 | Episode 30 – Apache Software Foundation | 01:02:08 | |
So many of the tools and projects we talk about and use every day are prefaced by 6 letters, A P A C H E... What does it mean to be an Apache project? What does the Apache Software Foundation (ASF) do for software? Are there other options? Let us tell you about the ASF!
00:00 Recent events
Dave:
How we caught the circle line rogue train with data
https://blog.data.gov.sg/how-we-caught-the-circle-line-rogue-train-with-data-79405c86ab6a#.mhqs1mikx
Black Friday 2016: Mobile vs Desktop User Behaviour
http://appinstitute.com/black-friday-2016-mobile-vs-desktop-sales/
AI Machine Attempts to Understand Comic Books ... and Fails
https://www.technologyreview.com/s/602973/ai-machine-attempts-to-understand-comic-books-and-fails/
https://arxiv.org/abs/1611.05118
https://arxiv.org/pdf/1611.05118v1.pdf
Jhon:
PayPal: From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World
Part 1 and Reactive programming manifesto
http://www.reactivemanifesto.org/
https://www.paypal-engineering.com/2016/11/08/from-big-data-to-fast-data-in-four-weeks-or-how-reactive-programming-is-changing-the-world-part-1/
Part 2: How that change was followed by adding a Spark micro-batch (streaming) step to the workflow
https://www.paypal-engineering.com/2016/11/18/from-big-data-to-fast-data-in-four-weeks-or-how-reactive-programming-is-changing-the-world-part-2/
PayPal: they are not only using Spark; here is a post on how they use Storm for another real-time workflow.
https://www.paypal-engineering.com/2016/11/15/carrier-payments-big-data-pipeline-using-apache-storm/
Managing Spark Partitions with Coalesce and Repartition
A short write-up on how Spark does partitioning internally and some ways of improving the partition scheme
https://medium.com/@mrpowers/managing-spark-partitions-with-coalesce-and-repartition-4050c57ad5c4#.s2l3yxemt
Principa: The Top Predictive Analytics Pitfalls to Avoid
http://insights.principa.co.za/the-top-predictive-analytics-pitfalls-to-avoid?utm_content=buffera2780&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer
ODPi Publishes First Operations Specification To Provide Developers Consistency Across Application Management Tools
As John talked about in our anniversary episode, ODPi 2.0 has been released
https://www.odpi.org/announcements/2016/11/14/odpi-publishes-first-operations-specification-to-provide-developers-consistency-across-application-management-tools
25:30 Apache Software Foundation
The ASF
http://apache.org/
Overview
http://apache.org/foundation/
Process
http://apache.org/foundation/how-it-works.html
The Project List
http://apache.org/index.html#projects-list
Other Open Source Licence Options
http://choosealicense.com/
https://opensource.org/licenses
01:02:08 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
20 Dec 2016 | Episode 31 – Bold Predictions, Past and Future | 01:07:07 | |
In this episode, we go over the bold predictions for 2016 we made just before the start of the year. Find out how right we were, or indeed how bad we are at predicting the future of Big Data.
Undeterred, we then happily put on our Nostradamus hats and proceed to make even more bold predictions for 2017. Have a listen and let us know whether you agree or disagree with our view of the world.
00:03 Bold predictions - reviewing past predictions for 2016
Apache Atlas
Apache NiFi
Apache Spark SQL
BigInsights
28:50 Bold predictions - future predictions for 2017
Fragmentation
Data breaches
Chat bots
Self service Big Data
Snake-Oil Alert
Cyber security
In-Memory & GPU
Apache Atlas
BigInsights
01:07:07 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
03 Jan 2017 | Episode 32 – The sense and non-sense of certifications | 00:50:59 | |
In this episode, we talk about the use and abuse of certifications, both the certifications you can achieve by passing an exam and the industry ISV certifications that should help you make purchasing decisions.
00:00 Recent events
Dave
5 enterprise uses of blockchain today
http://www.pcworld.com/article/3149504/cloud-computing/5-enterprise-related-things-you-can-do-with-blockchain-technology-today.html
Top 7 big data trends for 2017
https://datafloq.com/read/the-top-7-big-data-trends-for-2017/2493
How to discover the hidden value in your customer journey
https://www.linkedin.com/pulse/how-discover-hidden-value-your-customer-journey-ronald-van-loon
Jhon
Achieving a 300% speedup in ETL with Apache Spark
http://blog.cloudera.com/blog/2016/12/achieving-a-300-speedup-in-etl-with-spark/
The Rhythm of Food
http://rhythm-of-food.net/
http://www.thefunctionalart.com/
Information is beautiful awards
http://www.informationisbeautifulawards.com/news/188-2016-the-winners
Making data personal: Big data made small
http://blogs.sas.com/content/sgf/2016/12/13/making-data-personal-big-data-made-small/
27:50 The sense and non-sense of certifications
Educational certifications
ISV Certifications
50:59 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
17 Jan 2017 | Episode 33 – Roaring News | 00:50:24 | |
This episode, we have an absolutely brilliant topic that we were going to cover after the news section... But the news section has us talking so much that it ran a bit long. Preferring not to give you a two hour episode, we're rescheduling the delivery of the intended topic to next episode and present you with our first (and probably last) "News only" episode.
00:00 Recent events
Dave
A pair of “trends to watch in 2017”
http://www.techrepublic.com/article/6-big-data-trends-to-watch-in-2017/
http://www.datamation.com/applications/5-big-data-predictions-for-2017.html
Learning from a Year of Security Breaches
https://medium.com/starting-up-security/learning-from-a-year-of-security-breaches-ed036ea05d9b#.4r22rbfjh
Failing to monetise your apps, big data can help
http://www.techrepublic.com/article/failing-to-monetize-your-apps-big-data-can-help/
A Perfect Illustration of the Big Data Value Chain
http://www.techrepublic.com/article/a-perfect-illustration-of-how-the-big-data-value-chain-works/
Jhon
24/7 Spark Streaming on YARN in Production
https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/
SparkSQL, Ranger, and LLAP via Spark thrift server for BI scenarios to provide row, column level security, and masking
http://hortonworks.com/blog/sparksql-ranger-llap-via-spark-thrift-server-bi-scenarios-provide-row-column-level-security-masking/
The Data Dichotomy: Rethinking the Way We Treat Data and Services
https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
50:24 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
31 Jan 2017 | Episode 34 – What do people get wrong when deploying Hadoop? – Part 1 | 01:00:45 | |
Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this first part of a two part episode where they share their experience with what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput for your cluster.
00:00 Recent events
Dave
Apache Beam becomes a top level project!
https://beam.apache.org/
https://beam.apache.org/get-started/beam-overview/
https://github.com/eljefe6a/beamexample/blob/master/BeamTutorial/slides.pdf
https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
Four Types of Data Analytics
http://insights.principa.co.za/4-types-of-data-analytics-descriptive-diagnostic-predictive-prescriptive
MapR claims open source victory with patent
http://www.cbronline.com/news/verticals/cio-agenda/mapr-claims-open-source-big-data-victory-patent-award/
Jhon
Ransomware attacks on insecure Hadoop systems may be next, say security researchers
http://www.itworldcanada.com/article/ransomware-attacks-on-insecure-hadoop-systems-may-be-next-say-security-researchers/389944
http://www.gdi.foundation/
Revenge of the DevOps Gangster: Open Hadoop Installs Wiped Worldwide
http://www.threatgeek.com/2017/01/open-hadoop-installs-wiped-worldwide.html
Making Big Data User Friendly For Small Businesses
https://smallbiztrends.com/2017/01/big-data-and-small-business.html
30:15 What do people get wrong when deploying Hadoop? - Part 1
An interview with two guests from Hortonworks:
Paul Codding
Product Management Director at Hortonworks
Sheetal Dolas
Engineering Leader, Architect And Big Data Champion at Hortonworks
01:00:45 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
02 Aug 2016 | Episode 21 – The Open Data Platform Initiative | 00:59:22 | |
This episode we have an interview with John Mertic about ODPi. There has been plenty of mystery and even some controversy about ODPi which we attempt to resolve for you. Big thanks to John for giving us some of his time for this interview!
Sadly, this time the Skype Gods were not with us and we experienced some drops and hitches. We tried to smooth things over as much as possible, but we were not able to achieve our usual level of quality this time.
00:00 Recent events
Vacation for Dave
Study for Jhon
10:40 Interview with John Mertic @ ODPi
https://www.odpi.org/
John Mertic, Director of Program Management for ODPi and Open Mainframe Project
Find John on twitter: @jmertic
If you're not familiar with the ODPi, here are a few good links to get you started and interested in the area:
Links to the ODPi Specifications: https://www.odpi.org/specifications
Watch an interview with Alan Gates who discusses what the ODPi is trying to do to simplify the big data world: https://www.youtube.com/watch?v=Vogw33pbNOE
Watch an interview with John Mertic who discusses how the ODPi compliance affects upstream Hadoop components: https://www.youtube.com/watch?v=siEkCutk_f8
56:30 Questions from our Listeners
No questions this episode... ask us more questions and we'll answer them!
59:22 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
19 Jul 2016 | Episode 20 – Dave’s Hadoop Summit San Jose 2016 Retrospective – Part 2 | 01:06:28 | |
In this second part, we discuss the sessions that Dave attended at the San Jose Hadoop Summit and we go in depth on some related topics. Since we ran over an hour with the main topic, and we did not want to make this a three-parter, we decided to forgo the questions from the audience just this one time...
00:00 Recent events
Vacation time!
edX.org Big Data Courses
04:00 Dave's Hadoop Summit San Jose 2016 Retrospective - Part 2
Session 1: End-to-End Processing of 3.7 Million Telemetry Events per Second Using Lambda Architecture, by Saurabh Mishra @ Hortonworks and Raghavendra Nandagopal @ Symantec
Talking point: Hero-culture or why nobody wants to talk about failure anymore
Session 2: Top Three - Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise, by Andrew Ahn @ Hortonworks
Talking point: Guaranteed Governance, who certifies the certificate?
Session 3: IoT, Streaming Analytics and Machine Learning: Delivering Real-Time Intelligence With Apache NiFi, by Paul Kent @ SAS and Dan Zaratsian @ SAS
Talking point: Commercial solutions versus build your own in open source
Session 4: Productionizing Spark on YARN for ETL at Petabyte Scale, by Ashwin Shankar and Nezih Yigitbasi @ Netflix
Talking point: Is Hadoop still a low-cost commodity affair?
Session 5: Analyzing Telecom Fraud at Hadoop Scale, by Sanjay Vyas @ Diyotta
Talking Point: Do commercial, proprietary products have a place at Hadoop Summit or are they just marketing fluff?
01:06:28 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
05 Jul 2016 | Episode 19 – Dave’s Hadoop Summit San Jose 2016 Retrospective | 00:48:24 | |
Dave went to the Hadoop Summit 2016 in San Jose last week and came back with a riveting tale to tell. In this first part of the Summit coverage, join me when I ask Dave all about the keynotes and the general event. Join us next episode where Dave will talk about some of the sessions he attended!
00:00 Recent events
Lift and shift to IaaS
Hybrid Disaster Recovery
Spark & ML goodness MOOCs
San Jose Hadoop Summit
09:25 Dave went to the Hadoop Summit in San Jose!
Record attendance, maybe a venue change in future
Sponsor exhibition area including "interesting" story
The Community Corner
The keynotes
Hadoop is 10 years old
Microsoft on Machine Learning
Hadoop Assemblies
Hadoop fragmentation
Cyber security
Car insurance premiums "to measure"
Ethics session
40:55 Questions from our Listeners
Beefy feedback from Kris
A listener wants to know if it is worth the trip to go to the US Summit or to just go to the "local" Summit, wherever that is.
Nishant would like an episode about the entire ecosystem. What do you think?
48:24 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
21 Jun 2016 | Episode 18 – MLeap interview: Productionising Data Science – Part 2 | 00:43:18 | |
In this episode, we have the second part of the interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project where they go into more technical details and give tips on deploying MLeap in your environment. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode!
00:00 Recent events
Yet more telco security, again.
RFI for a European energy company followed by "the RFI rant"
Metronnnnnnnnnnn
Big Data Hackathon for an airline company predicting delays
Preparing an IoT hackathon on predictive maintenance
Spreading the word on MLeap at a couple of customers!
11:22 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Part 2
http://combust.ml/
http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html
https://github.com/TrueCar/mleap
https://github.com/TrueCar/mleap-demo
35:25 Questions from our Listeners
Are there other technologies that allow machine learning models to be exposed as "web" APIs?
Zeppelin multi tenant right now?
43:17 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
07 Jun 2016 | Episode 17 – MLeap interview: Productionising Data Science | 00:54:02 | |
In this episode, we have an interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode!
00:00 Recent events
Machine Learning Hackathon on Azure
Strata Europe
Fighting with Kafka
09:30 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk
Meet Hollin and Mikhail today (7-Jun-2016) at Spark Summit 2016 in San Francisco!
https://spark-summit.org/2016/events/mleap-productionize-data-science-workflows-using-spark/
http://combust.ml/
http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html
https://github.com/TrueCar/mleap
https://github.com/TrueCar/mleap-demo
40:50 Questions from our Listeners
The Episode 12 mystery unraveled
NiFi works well for prototyping, but what's your view on using NiFi in production in a normal DTAP (Development, Testing, Acceptance and Production) environment?
54:00 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
24 May 2016 | Episode 16 – Interview part two with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo! | 00:46:35 | |
Hopefully you enjoyed the first part of our interview with Sumeet. Here is part two, where we go into more detail about Yahoo's use of Hadoop, with lots of interesting topics coming up, including the splintering of the ecosystem, governance and much, much more.
00:00 Recent events
Customer and partner adventures with Apache NiFi
Jhon is settling in at Microsoft but is unfortunately quite jet-lagged.
08:15 Part two of our interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo!
39:05 Questions from our Listeners
Is Apache Atlas ready for production today?
46:35 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
10 May 2016 | Episode 15 – Interview with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo! | 01:00:56 | |
Having met Sumeet at the Hadoop Summit we thought he'd make a great guest for the podcast, so here he is for your listening pleasure!
00:00 Recent events
Louder!
iTunes and the missing episode 12
Jhon's new role at Microsoft
Hadoop as a Service
A fortnight of SAS + Hadoop
Metron teething troubles https://issues.apache.org/jira/browse/METRON-136
17:50 Interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo!
42:50 Questions from our Listeners
One data-lake for all workloads? Or separate clusters for each set of workloads?
How large a team do I need to manage a Hadoop cluster?
01:00:56 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
26 Apr 2016 | Episode 14 – Hadoop Summit – Retrospective | 00:51:47 | |
After the last two special edition episodes where we quickly covered each Summit day in a "same-day" episode, we go over the full event in this episode, highlighting the sessions we enjoyed the most and sharing our general feelings about the 2016 Hadoop Summit in Dublin.
00:00 Recent events
Summit!
Sessions on youtube
Meetings and planning, Apache Metron
https://cwiki.apache.org/confluence/display/METRON/Metron+Wiki
https://community.hortonworks.com/articles/26047/apche-metron-tp1-blog-series.html
Setting up a new podcast recording "studio"
09:00 Hadoop Summit - Retrospective
Summit Schedule App
Hortonworks emphasised streaming ingest using NiFi, but the other talks did not so much
Summit video sessions are starting to appear online
https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos
Next year: Munich
Day one sessions:
It's not the size of your cluster, It's how you use it
Big Fish - David Darden & Don Smith
Unified stream and batch processing with Apache Flink
data Artisans GmbH - Ufuk Celebi
Taming the Elephant
Hortonworks - Paul Codding
How To: A beginner's guide to becoming an Apache contributor
Teradata - Venkatesh
On-Demand HDP Clusters using Cloudbreak and Ambari
Symantec - Karthik Karuppaiya & Narendra Bidari
Machine Learning in Big Data - Look Forward or be left behind
Redpoint Global Inc - Bill Porto
Past, Present, Future of Hadoop at LinkedIn
LinkedIn - Carl Steinbach
Migrating Hundreds of Pipelines in Docker Containers
Spotify - Noa Resare
Day two sessions:
MLeap: Or how to Productionize Data science workflows using Spark
Shift Technologies - Mikhail Semeniuk & TrueCar - Hollin Wilkins
Scaling out to 10 Clusters, 1000 Users, and 10,000 Flows: The Dali Experience at LinkedIn
Carl Steinbach, LinkedIn
Hadoop Platform at Yahoo: A Year in Review
Sumeet Singh, Yahoo!, Inc.
Apache Hive 2.0: SQL, Speed, Scale
Hortonworks - Alan Gates
Telematics with Hadoop and NiFi
Adam Morton, Admiral Insurance - Simon Elliston Ball, Hortonworks
Apache Eagle - Monitor Hadoop in Real-Time
eBay - Young Zang & Arun Manoharan
43:18 Questions from our Listeners
Great question in from Rene about small businesses and Big Data which we’ll cover on a future episode!
Also Rene's feedback has helped us tweak the feedback form so it’s easier to use.
Is this a vendor podcast? No, we’re all community! :o)
How do you record the podcast, what is your equipment?
Skype-saurus: the original, expensive hardware solution.
http://www.leoville.com/the-skypesaurus-story (Sadly, this no longer seems to be available.)
Skype-o-saurus: a cheaper solution using an OS X aggregate sound device.
https://drupalize.me/blog/201504/recording-podcasts-creating-skype-o-saurus
51:48 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
14 Apr 2016 | Episode 13 – Hadoop Summit Dublin 2016 – Day 2 | 00:37:47 | |
Welcome to our second special edition podcast, brought to you from day 2 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the second day of keynotes and yet more sessions that we enjoyed.
00:00 Recent events
Introduction to the Hadoop Summit Dublin 2016 from day 2
01:45 Hadoop Summit 2016 Dublin Day 2 Review
Keynote/Session - Yahoo! - Sumeet Singh
Keynote - Information is Beautiful - David McCandless
http://www.informationisbeautiful.net/
MLeap - Mikhail Semeniuk (Shift Technologies) and Hollin Wilkins (TrueCar)
Admiral - Adam Morton (Admiral) and Simon Elliston Ball (Hortonworks)
Hive - Alan Gates (Hortonworks)
37:47 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
13 Apr 2016 | Episode 12 – Hadoop Summit Dublin 2016 – Day 1 | 00:29:38 | |
Welcome to our special edition podcast, brought to you from day 1 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit.
In this episode we cover our impressions of the keynotes and some of the sessions we enjoyed during day 1.
00:00 Recent events
Introduction to the Hadoop Summit episode for day 1
01:40 Main Topic
Some comments from attendees as to what they're looking forward to at the event
Conversation about the keynotes and the sessions we enjoyed
29:38 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
05 Apr 2016 | Episode 11 – Interview with Community Award Winner Venkatesh Sellappa | 00:37:18 | |
Venkatesh is a new contributor to Apache NiFi and, during his talk at the Hadoop Summit next week, he takes a light-hearted look at his journey to becoming a contributor to an Apache project.
Venkatesh is one of the Community Choice winners, so congratulations are in order and we are certain you will like this interview! Enjoy, and we are looking forward to seeing you at the Hadoop Summit in Dublin next week!
00:00 Recent events
Easter Break
Big Data Analytics
Big Telco workshops/meetings and sessions stuff
Domain Knowledge is important
05:40 Main Topic
Interview with Venkatesh Sellappa
33:50 Questions from our Listeners:
No questions this time but information on our activities during the upcoming Hadoop Summit.
37:18 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
22 Mar 2016 | Episode 10 – Preparing for the 2016 Hadoop Summit in Dublin | 01:03:50 | |
Next month, the European Hadoop Summit will take place in Dublin. Now that the agenda for the event has been nearly finalised, we take it upon ourselves to provide a virtual guide to the event. There are a lot of good things happening during the event, so we share with you which sessions we think we'll be attending and why. Enjoy, and we look forward to seeing you there!
This is another long episode, going over an hour for the first time. We are really curious to know if you like these longer episodes, or if you would prefer it if we kept it under the original 30 to 35 minutes?
00:00 Recent events
Hands on upgrading, express vs rolling upgrade
Workshop at telecom company in Russia
NiFi workshops
Securing a Hadoop cluster
08:00 Main Topic
Dave has assembled some statistics on the type of sessions available.
What sessions we would attend and why.
http://hadoopsummit.org/dublin/agenda/
General advice to visitors mixed in...
54:30 Questions from our Listeners:
What else is going on during the summit dates?
Should I visit the Hadoop Summit and if so, go to Europe, the US or Australia?
How do I get a speaking slot at summit?
https://hadoopsummit.uservoice.com/
What other events are comparable/useful to visit?
01:03:50 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
08 Mar 2016 | Episode 9 – SQL in Hadoop | 00:53:38 | |
SQL was one of the first data access methods added to vanilla Hadoop. Considering that many of the people working with Hadoop in the early days came from a database background, this is not surprising. Since then, the SQL ecosystem in Hadoop has grown considerably, and in this episode we give a general overview of many of the available choices. This episode runs a bit longer than normal but we hope you'll find it worthwhile!
00:00 Recent events
Spark masterclasses
NiFi on trains
MiFID II and the active archive
Mobile World Congress
08:30 Main Topic
SQL solutions:
Apache Hive
https://hive.apache.org/
Apache Spark SQL
http://spark.apache.org/sql/
Apache Phoenix
https://phoenix.apache.org/
Apache Impala (incubating)
https://www.cloudera.com/products/apache-hadoop/impala.html
Apache HAWQ (incubating)
http://hawq.incubator.apache.org/
Apache Drill
https://drill.apache.org/
Presto
https://prestodb.io/
Oracle Big Data SQL
http://www.oracle.com/us/products/database/big-data-sql/overview/index.html
IBM Big SQL
http://www-01.ibm.com/software/data/infosphere/hadoop/big-sql.html
Technology topics:
JDBC/ODBC
SQL syntax compliance
Multi-user concurrency
Benchmarks
46:40 Questions from our Listeners:
How much storage overhead should I count on if I add SQL in my Hadoop workflow?
How do I make my SQL faster?
53:38 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
23 Feb 2016 | Episode 8 – NiFi Deeper Dive | 00:47:18 | |
In this episode we'll go into more depth on NiFi complete with our second interview with Joe Witt, Senior Director of Engineering at Hortonworks who dives into how NiFi works under the covers and some considerations to think about when using it for real.
00:00 Recent events
New logo for the podcast
Hadoop use in telecom
Spark masterclass details
Apache NiFi "Hype Train" concerns
09:14 Main Topic
Second interview with Joe Witt: a deeper dive on Apache NiFi
35:30 Questions from our Listeners:
I have already implemented some of my ingest in Flume/Kafka/Storm, do I need to replace that with NiFi?
Is it true there is no chance of data loss with NiFi?
Can I aggregate or combine data as part of the flow process?
Do I need a Hadoop cluster to use NiFi?
47:18 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
09 Feb 2016 | Episode 7 – An introduction to Data Ingest | 00:37:15 | |
In this episode we'll cover some of the most common options for ingesting data into Hadoop including technologies like Flume, Sqoop, Kafka, NiFi and more.
00:00 Recent events
Upcoming masterclasses on NiFi and Spark
NiFi deployment on trains
Podcast publicizing
Global Systems Integrator training day
06:40 Main Topic
Apache Sqoop
Apache Flume
Apache Kafka
Apache NiFi
Other Low level ingest methods
28:00 Questions from our Listeners:
I want to transform the data to its final form before it lands in the Hadoop cluster. Which ingest tool should I use?
What about XYZ vendor's “Hadoop loader/ingest” tool?
Do all these tools run on my Hadoop nodes?
How does lambda architecture fit with data ingest?
37:15 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
26 Jan 2016 | Episode 6 – An introduction to NiFi | 00:30:45 | |
In this episode we'll cover an introduction to NiFi, complete with an interview with Joe Witt, Senior Director of Engineering at Hortonworks, who explains exactly where NiFi came from and how it fits into your Big Data plans.
00:00 Recent events
The usual "Start of the Year" meetings and events
Using Apache NiFi as a self documenting deployment system
We are now available on iTunes
04:50 Main Topic
Interview with Joe Witt, one of the creators of Apache NiFi and currently Director of Engineering for HDF at Hortonworks.
22:40 Questions from our Listeners:
Is NiFi really as easy to use as it looks?
Is NiFi a part of Hadoop now?
How do I get started with NiFi?
Is NiFi an ETL tool?
30:45 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
12 Jan 2016 | Episode 5 – An introduction to Spark | 00:37:50 | |
In this episode we'll cover the basics of Apache Spark, including typical deployment situations, architecture and usage.
00:00 Recent events
Seasons Greetings!
Jhon shamelessly plugs his mini cluster build
Apache Mesos
Amazon IoT solution
05:28 Main Topic
Who would use Apache Spark, why would you use it, where would you use it
Apache Spark Architecture
Apache Spark Components
Apache Spark MLlib
Apache Spark gotchas
Typical use cases for Apache Spark
28:20 Questions from our Listeners:
What happens if all my data does not fit in memory?
What is the security like for Spark?
Why Spark on Hadoop instead of standalone?
Python, Scala, Java or something else for Spark?
Can I access data on HDFS or local disk from my Spark script?
37:50 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
| |||
16 Aug 2016 | Episode 22 – Big Data in Small Business | 01:32:35 | |
The main subject of this episode is our answer to a listener question we received a couple of months ago: How can big data help small businesses? In what ways can small businesses use big data? At the moment all the talk is about big data helping enterprise firms. We are also introducing a new section which we hope you will enjoy!
00:00 Recent events
Working with a new team in sunny Cork, getting them up to speed
Workshop with a global SI and a European telco about the upcoming phases of their big data journey
Workshop with a customer who has been using Hadoop for a very long time, since Hadoop 0.2! Finally looking to migrate into the future
Multi-vendor workshop on fraud analytics
Object recognition and detection in images.
11:30 Our very own "New and Noteworthy"
Dave
http://blogs.teradata.com/international/streaming-analytics-story-many-tales/
http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A453888
http://research.ibm.com/cognitive-computing/ostp/rfi-response.shtml
http://dataconomy.com/10-online-big-data-courses-2016/
Jhon
Apache Spark 2.0 (July 28, 2016)
http://spark.apache.org/releases/spark-release-2-0-0.html
Unifying DataFrame and Dataset (RDD): In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row.
SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs.
MLlib: The DataFrame-based API is now the primary API. The RDD-based API is entering maintenance mode.
Spark 2.0 substantially improved SQL functionalities with SQL2003 support. Spark SQL can now run all 99 TPC-DS queries
Ships the initial experimental release for Structured Streaming, a high level streaming API built on top of Spark SQL
Databricks article:
https://databricks.com/blog/2016/07/28/continuous-applications-evolving-streaming-in-apache-spark-2-0.html
Apache Mesos 1.0 released
https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces97
http://techblog.netflix.com/2016/07/distributed-resource-scheduling-with.html
Apache Twill becomes top level project
http://twill.apache.org/
https://blogs.apache.org/foundation/entry/apache_software_foundation_announces_apache1
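The Spark 2.0 notes above (SparkSession as the single entry point, DataFrame as a type alias for Dataset of Row) can be illustrated with a minimal Scala sketch. It assumes Spark 2.x is on the classpath and that a local master is acceptable; the object name, app name, column names and sample rows are made up for illustration and do not come from the episode.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical example object; not taken from the show notes.
object Spark2Sketch {
  def main(args: Array[String]): Unit = {
    // SparkSession is the Spark 2.0 entry point that replaces
    // SQLContext and HiveContext for the DataFrame and Dataset APIs.
    val spark = SparkSession.builder()
      .appName("spark2-sketch") // assumed app name
      .master("local[*]")       // assumption: run locally on all cores
      .getOrCreate()

    // Brings toDF and friends into scope for local collections.
    import spark.implicits._

    // Made-up sample rows: episode title and duration in seconds.
    val episodes: DataFrame =
      Seq(("Episode 19", 2904), ("Episode 22", 5555)).toDF("title", "seconds")

    // DataFrame is a type alias for Dataset[Row] in Scala,
    // so this assignment type-checks without any conversion.
    val asDataset: Dataset[Row] = episodes

    // Register a temporary view and query it through the same session.
    asDataset.createOrReplaceTempView("episodes")
    spark.sql("SELECT title FROM episodes WHERE seconds > 3600").show()

    spark.stop()
  }
}
```

Code written against Spark 1.x would typically have created a SQLContext or HiveContext from a SparkContext instead; in this sketch the single session covers both DataFrame creation and the SQL query.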
44:40 Big Data for Small Business
Define "small business"
How can big data help small businesses
What ways can small business use big data
The problems a small business could face
http://www.columnfivemedia.com/100-best-free-data-sources-infographic
Our answers to those problems
Some conclusions
01:32:35 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
06 Mar 2018 | Episode 77 – Roaring News | 00:48:10 | |
Another Roaring News episode where we cover recent Big Data News items we found interesting.
This time we talk about Open Source turning 20 years old, the annoyances that come with Smart Homes and a big data divide in Germany. Additionally, we talk about some introductory guides to AI.
Breaking News
20 years of open source + who contributes
http://www.zdnet.com/article/open-source-turns-20/
https://www.infoworld.com/article/3253948/open-source-tools/who-really-contributes-to-open-source.html
Smart home living is annoying as hell
https://gizmodo.com/the-house-that-spied-on-me-1822429852
Big Data Divide
https://www.politico.eu/article/to-protect-or-collect-germanys-big-data-divide/
The Art of Learning Data Science
https://medium.com/@aparnack/the-art-of-learning-data-science-65b9f703f932
The Long Road To Become a Big Data Scientist - Infographic
https://medium.com/@aparnack/sequel-to-the-art-of-learning-data-science-cb2e1f078e5a
An executive’s guide to AI
https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/an-executives-guide-to-ai?cid=other-soc-twi-mip-mck-oth-1802&kui=udT5IIoYx3yxUmZYJz7_2A
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
20 Mar 2018 | Episode 79 – Roaring News | 00:37:19 | |
Another Big Data news episode! This time we consider the big-or-small-nodes conundrum, based on an article that, on close scrutiny, doesn't really seem to test the real issue. Other things that get covered are LinkedIn's Dynamometer, Cloudera's full production architecture advice for a recommendation service, and a really interesting visualization technique based on blobs.
Breaking News
Big Data, Small Nodes
https://insidebigdata.com/2018/02/22/make-sense-big-data-small-nodes/
Dynamometer Release
https://github.com/linkedin/dynamometer
https://venturebeat.com/2018/02/08/linkedin-open-sources-dynamometer-for-hadoop-performance-testing-at-scale/
Cisco IoT predictions
Aka someone somewhere trots out the old “data is the new oil” trope for one more circuit, please please please stop?
https://www.networkworld.com/article/3257769/internet-of-things/7-transportation-iot-predictions-from-cisco.html
Production Recommendation Systems with Cloudera
http://blog.cloudera.com/blog/2018/02/production-recommendation-systems-with-cloudera/
A Day in the Life of Americans
http://flowingdata.com/2015/12/15/a-day-in-the-life-of-americans/
Intercontinental Ballistic Microfinance (2006)
https://vimeo.com/28413747
Understanding AI, Machine Learning & Predictive Analytics
https://www.forcecast.com/blog/understanding-ai-machine-learning-predictive-analytics/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
27 Mar 2018 | Episode 80 – Big Data Tracking | 00:51:25 | |
Last June, Wolfie Christl published a 93-page report, Corporate Surveillance in Everyday Life, on big data tracking. Apart from the massive PDF that can be downloaded from the net, an extensive summary can be found on the Cracked Labs website.
In this episode we go over the content and give our views on the subject.
If you want to follow along with us while we are discussing the different points in the online article, here is the link: http://crackedlabs.org/en/corporate-surveillance
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
03 Apr 2018 | Episode 81 – Roaring News | 00:26:19 | |
In this installment of Big Data News, we talk about the recent Facebook leak, how everybody is still doing it wrong (according to some at least) and installing Hadoop "the old-fashioned way". Also briefly covered is Elastic's X-Pack, now even more "open" than before, but still rather closed it would seem.
Breaking News
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
10 Apr 2018 | Episode 82 – DataWorks Summit Berlin 2018 Preview | 00:47:38 | |
Next week is DataWorks Summit Berlin week! Your two hosts will be in attendance and in this episode we go over the agenda and plan which sessions we want to attend and why. Peppered throughout we add further insights and experiences from previous years.
Unfortunately, Dave's network was a little unstable and there are a couple of audio glitches in this episode.
For some session statistics or if you can use some help deciding what sessions you want to attend, you can use the dashboard we created:
Click the screenshot above or go to http://aka.ms/DWS2018 to access the dashboard. It is a dynamic report: clicking on graph elements (bars or pie slices) will apply filters on all the visualizations and the session list. Use control-click to combine filters.
At some point the dashboard will disappear because it is no longer relevant. For future reference, here is a large version of the screenshot.
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
19 Jun 2018 | Episode 93 – Apache Kylin: Extreme OLAP Engine for Big Data | 00:46:14 | |
In this episode Apache PMC member Dong Li joins us to explain how Apache Kylin can deploy analytical OLAP cubes in your Big Data environment.
http://kylin.apache.org/
Dong Li
Technical Partner & Senior Architect of Kyligence (linkedin)
PMC Member of Apache Kylin
http://en.kyligence.io/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
01 May 2018 | Episode 86 – Druid: a high-performance, column-oriented, distributed data store – part 1 | 00:31:57 | |
This is the first part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store that has entered the Hadoop environment through its recent integration with Apache. Since Druid has been around for a while, we are grateful to FJ for spending some time with our listeners.
Fangjin Yang
Cofounder and CEO at Imply (linkedin)
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
08 May 2018 | Episode 87 – Druid: a high-performance, column-oriented, distributed data store – part 2 | 00:31:53 | |
This is the second part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store that has entered the Hadoop environment through its recent integration with Apache. Since Druid has been around for a while, we are grateful to FJ for spending some time with our listeners.
Fangjin Yang
Cofounder and CEO at Imply (linkedin)
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
18 Apr 2018 | Episode 83 – DataWorks Summit Berlin – Day 1 Recap | 01:23:45 | |
Another year, another European Dataworks Summit, and yes, another daily recap show from Jhon and Dave. We walk through the keynotes and sessions we attended and give our thoughts and views. This should be useful for anyone who wasn't able to attend or those seeking to peek into sessions they couldn't make.
No real editing on this one, and we recorded in a hotel room, so audio quality may not be up to our usual standards. We hope you'll forgive us!
Enjoy!
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
19 Apr 2018 | Episode 84 – DataWorks Summit Berlin – Day 2 Recap | 01:30:26 | |
And with the end of day two of the 2018 DataWorks Summit in Berlin comes the end of this year's European Summit. But never fear, we have an extra 90 minutes of DataWorks goodness for you to consume on your way home.
No real editing on this one, and we recorded in a hotel room, so audio quality may not be up to our usual standards. We hope you'll forgive us!
Enjoy!
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
24 Apr 2018 | Episode 85 – DataWorks Summit Community Showcase Exhibitor Soundbites | 00:30:34 | |
This is the final part of our coverage of the DataWorks Summit Berlin 2018. Normally we would not have had an episode this week, since we were in Berlin last week, but we had lightning interviews with the vendors in the Community Expo Area and used that coverage to make this episode.
So less of "Dave & Jhon" and more "ecosystem tech" snippets this time. Even though this does stray a bit from our usual content, we still hope it is useful.
This was recorded in a hotel room and on the expo floor so the audio quality is not up to our usual standards, we hope you’ll forgive us!
Here is a timestamped list of the lightning interviews:
02:41 Hortonworks https://hortonworks.com/
06:28 Alation https://alation.com/
08:45 Arcadia Data https://www.arcadiadata.com/
11:12 Attunity https://www.attunity.com/
13:10 BlueMetrix https://www.bluemetrix.com/
15:27 BMW https://www.bmw.com
18:04 IBM https://www.ibm.com
19:54 Microsoft https://www.microsoft.com
22:15 Nutanix https://www.nutanix.com/
23:26 Syncsort https://www.syncsort.com
24:54 Synerscope http://www.synerscope.com/
27:05 Talend https://www.talend.com
27:59 Teradata https://www.teradata.com/
29:02 -Interview End-
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
15 May 2018 | Episode 88 – Roaring News | 00:35:07 | |
Returning to our more regular schedule, we have a Roaring News episode today. Dave has articles on multi-cloud readiness, Big Data being a pariah, and Google Duplex, while Jhon came up with synthetic data, data engineers versus data scientists, and a neural network sharing cake recipes.
Breaking News
Dave
Less than 10% ready for multi cloud
http://www.cloudpro.co.uk/cloud-essentials/hybrid-cloud/7451/idc-less-than-10-of-organisations-are-ready-for-multi-cloud
Tech companies distancing themselves from Big Data
https://qz.com/1262102/tech-companies-are-distancing-themselves-from-big-data/
Google Duplex
https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
Jhon
The Rise of Synthetic Data to Help Developers Create and Train AI Algorithms Quickly and Affordably
https://insidebigdata.com/2018/05/08/rise-synthetic-data-help-developers-create-train-ai-algorithms-quickly-affordably/
Data engineers vs. data scientists
https://www.oreilly.com/ideas/data-engineers-vs-data-scientists?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content+datascience
We asked a neural network to bake us a cake. The results were...interesting.
https://www.popsci.com/neural-network-bakes-a-cake
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
22 May 2018 | Episode 89 – DataWorks Summit San Jose Agenda Review | 01:12:20 | |
With the San Jose edition of the DataWorks Summit only a month away, we go over the sessions that are available in the agenda today and offer our top picks. If you're going, or if you will be watching the replays online, we hope to guide you on your selection of sessions.
DataWorks Summit San Jose 2018
And here is the dashboard we created with statistics on the San Jose sessions, for your enjoyment: https://aka.ms/DWS2018SJ
The agenda is still in flux so we will be updating the dashboard regularly.
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
29 May 2018 | Episode 90 – Roaring news | 00:38:09 | |
In this week's Roaring News episode, Dave brings up the resilience of Apache community open source projects and plays some Doom. Jhon has some practical Apache NiFi guides and the emergence of multi-model NoSQL databases.
Breaking News
DataWorks Summit Berlin video recordings are up:
https://www.youtube.com/user/HadoopSummit/playlists
Find Dave on his Australian road-trip:
http://bit.ly/aus-nz-ibm-hwx-tour
Dave
DataTorrent, Stream Processing Startup, Folds (Apache Apex)
https://www.datanami.com/2018/05/08/datatorrent-stream-processing-startup-folds/
DOOM!
https://arxiv.org/abs/1804.09154
https://www.technologyreview.com/s/611072/ai-generates-new-doom-levels-for-humans-to-play/
https://www.youtube.com/watch?v=K32FZ-tjQP4
Bonus doom news:
https://www.rockpapershotgun.com/2018/03/28/dodge-fireballs-forever-in-a-neural-nets-doom-nightmare/
https://worldmodels.github.io/
Jhon
Accessing Feeds from EtherDelta on Trades, Funds, Buys and Sells (Apache NiFi)
https://community.hortonworks.com/articles/191146/accessing-feeds-from-etherdelta-on-trades-funds-bu.html?es_p=6741162
NiFi Processing and Flow with Couchbase Server
https://blog.couchbase.com/nifi-processing-flow-couchbase-server/
The new era of the Multi-Model Database
https://www.zdnet.com/article/the-new-era-of-the-multi-model-database/
Seven Databases in Seven Weeks, Second Edition - A Guide to Modern Databases and the NoSQL Movement
https://pragprog.com/book/pwrdata/seven-databases-in-seven-weeks-second-edition
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
05 Jun 2018 | Episode 91 – ODPi is back and better than ever! | 01:08:00 | |
In this episode, we welcome back John Mertic, director of Program Management for ODPi, R Consortium, and the Open Mainframe Project. It's been almost two years since we checked in with John and the ODPi initiative and as John mentions in the interview, a lot has changed in Hadoop...
ODPi logo
John Mertic
Director of Program Management for ODPi, R Consortium, and Open Mainframe Project
https://www.linkedin.com/in/jmertic/
ODPi website links:
https://www.odpi.org/
https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later
https://www.odpi.org/projects/data-governance-pmc
https://www.odpi.org/events
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
12 Jun 2018 | Episode 92 – Roaring news | 00:46:08 | |
Another week, another edition of Roaring Big Data News. This time, Dave talks about driving teens and Jhon takes a detailed look at an Eventbrite data pipeline article.
Breaking News
Dave
Driver monitoring isn't just for teens; adults can benefit, too
https://arstechnica.com/cars/2018/05/buicks-smart-driver-explains-why-my-gas-mileage-sucks-and-my-editors-doesnt/
Jhon
Looking under the hood of the Eventbrite data pipeline!
https://www.eventbrite.com/engineering/looking-under-the-hood-of-the-eventbrite-data-pipeline/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
26 Jun 2018 | Episode 94 – Roaring news | 00:37:39 | |
In this week's edition of Roaring Big Data News, Dave talks about modernizing Hadoop and a billion Java errors. Jhon has an article on improving your training data sets. We finish with a discussion about the newly released HDP 2.6.5 with an emphasis on the deprecation notices and YARN containers.
Breaking News
Dave
Modernizing Hadoop: Reaching the plateau of productivity
https://www.zdnet.com/article/modernizing-hadoop-reaching-the-plateau-of-productivity/
1 billion Java errors, here’s what causes 97% of them
https://blog.takipi.com/we-crunched-1-billion-java-logged-errors-heres-what-causes-97-of-them/
https://blog.takipi.com/the-top-10-exceptions-types-in-production-java-applications-based-on-1b-events/
Jhon
Why you need to improve your training data, and how to do it
https://petewarden.com/2018/05/28/why-you-need-to-improve-your-training-data-and-how-to-do-it/amp/
Announcing the General Availability of Hortonworks Data Platform (HDP) 2.6.5, Apache Ambari 2.6.2 and SmartSense 1.4.5
https://hortonworks.com/blog/announcing-general-availability-hortonworks-data-platform-hdp-2-6-5-apache-ambari-2-6-2-smartsense-1-4-5/
Component Versions
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/comp_versions.html
Deprecation Notices
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/deprecated_items.html
YARN Containers
Trying out Containerized Applications on Apache Hadoop YARN 3.1
https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
Containerized Apache Spark on YARN in Apache Hadoop 3.1
https://hortonworks.com/blog/containerized-apache-spark-yarn-apache-hadoop-3-1/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
17 Jul 2018 | Episode 97 – ODPi: A new world for data governance | 01:07:57 | |
In this episode, we welcome back John Mertic one more time. It was quite obvious that John had lots more to talk about at the end of our last interview with him. ODPi has recently reinvented itself, moving away from a strict distribution standards body towards data governance and reference specifications.
ODPi logo
John Mertic
Director of Program Management for ODPi, R Consortium, and Open Mainframe Project
https://www.linkedin.com/in/jmertic/
ODPi website links:
https://www.odpi.org/
https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later
https://www.odpi.org/projects/data-governance-pmc
https://www.odpi.org/events
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
10 Jul 2018 | Episode 96 – Roaring news | 00:46:05 | |
In this edition of Roaring news, Ward Bekker returns to discuss what is happening in the world of Big Data. Ward brings news on GPUs in supercomputers and how Big Data could be wrong about you. Dave and Jhon found articles on Big Data growth visualizations and GDPR.
Breaking News
10 Charts that will change your perspective of Big Data’s Growth
https://www.forbes.com/sites/louiscolumbus/2018/05/23/10-charts-that-will-change-your-perspective-of-big-datas-growth/#1ea595702926
New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500
https://www.top500.org/news/new-gpu-accelerated-supercomputers-change-the-balance-of-power-on-the-top500/
GDPR: A Call to Remove Technical Debt from Data Science
https://medium.com/@kjarmul/gdpr-a-call-to-remove-technical-debt-from-data-science-c103a01c3102
Everything big data claims to know about you could be wrong
http://news.berkeley.edu/2018/06/18/big-data-flaws/
Our thanks to Ward for adding some variety to this News episode.
Ward Bekker (Linkedin)
Pre-Sales Solutions Engineer II @ Hortonworks
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
31 Jul 2018 | Episode 99 – The State of Big Data at Codemotion Amsterdam | 00:45:28 | |
The Roaring Elephant podcast was a guest at the Codemotion conference in Amsterdam a little while ago. This episode contains the audio of the talk we did on the State of Big Data.
Our talk was definitely light on slideware, but if you want to see the video of our presentation, you can find it on the Codemotion YouTube channel: Codemotion Amsterdam 2018: The State of Big Data by Roaring Elephant podcast
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
24 Jul 2018 | Episode 98 – Roaring news | 00:22:16 | |
In this episode of Big Data Roaring News, Dave laments another announcement of Hadoop's demise and exposes A.I. impostors. Jhon has articles comparing Ranger with Sentry and on Apache NiFi reaching the ripe age of 1.7, with a MiNiFi-charged practical demo to prove the point.
Breaking News
Hadoop’s star dims in the era of cloud object data storage and stream computing
https://siliconangle.com/blog/2018/07/09/hadoops-star-dims-era-cloud-object-data-storage-stream-computing/
The rise of “pseudo-AI”: how tech firms quietly use humans to do bots’ work
https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-tech-companies
Apache Ranger Vs Sentry
https://www.linkedin.com/pulse/apache-ranger-vs-sentry-mythily-rajavelu/
How to build an IIoT system using Apache NiFi, MiNiFi, C2 Server, MQTT and Raspberry Pi
https://medium.freecodecamp.org/building-an-iiot-system-using-apache-nifi-mqtt-and-raspberry-pi-ce1d6ed565bc
Apache Nifi Version 1.7.0 released: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
07 Aug 2018 | Episode 100 – Celebrating our Centennial with the history of Hadoop | 01:07:19 | |
100 Big Data episodes! We made it, in no small part thanks to our audience: you are the ones who keep us going! In this episode we celebrate our centennial by going over the history of Hadoop releases, highlighting the most noteworthy events along the way. Join us down the twisty paths of our memory lanes!
The blockchain-related LinkedIn post Jhon liked
The sources for this episode:
http://hadoop.apache.org/releases.html
https://en.wikipedia.org/wiki/Apache_Hadoop
Debate over which company had contributed more to Hadoop:
http://hortonworks.com/blog/reality-check-contributions-to-apache-hadoop/
Thank you for being part of the ride and now on to episode 200!
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
03 Jul 2018 | Episode 95 – DataWorks Summit in San Jose with Ward Bekker | 01:52:50 | |
Since neither Dave nor Jhon was able to attend the DataWorks Summit in San Jose a couple of weeks ago, we have a guest, Ward Bekker, who was happy to join and educate us on the subject.
DataWorks Summit San Jose 2018
In this episode we discuss the daily keynotes and Ward's selection of sessions at the Summit, ranging from what's new in YARN 3.0 to materialized views in Hive and much more.
Ward Bekker (Linkedin)
Pre-Sales Solutions Engineer II @ Hortonworks
Some of the sessions and topics discussed are:
Apache Hadoop YARN: State of the Union
https://dataworkssummit.com/san-jose-2018/session/apache-hadoop-yarn-state-of-the-union-2/
What is new in Apache Hive
https://dataworkssummit.com/san-jose-2018/session/what-is-new-in-apache-hive/
Running distributed TensorFlow in production
https://dataworkssummit.com/san-jose-2018/session/running-distributed-tensorflow-in-production-challenges-and-solutions-on-yarn-3-0-2/
Just the sketch: advanced streaming analytics in Apache Metron
https://dataworkssummit.com/san-jose-2018/session/just-the-sketch-advanced-streaming-analytics-in-apache-metron/
Containers and Big Data
https://dataworkssummit.com/san-jose-2018/session/containers-and-big-data/
Catch a hacker in realtime: Live visuals of bots and bad guys
https://dataworkssummit.com/san-jose-2018/session/catch-a-hacker-in-realtime-live-visuals-of-bots-and-bad-guys/
HDFS tiered storage
https://dataworkssummit.com/san-jose-2018/session/hdfs-tiered-storage/
Geospatial data platform at Uber
https://dataworkssummit.com/san-jose-2018/session/geospatial-data-platform-at-uber/
What's the Hadoop-la about Kubernetes?
https://dataworkssummit.com/san-jose-2018/session/whats-the-hadoop-la-about-kubernetes/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
14 Aug 2018 | Episode 101 – Apache Pulsar update with Matteo and Sijie from Streamlio | 01:05:48 | |
Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview into two parts. Here is the first part, in which they introduce Apache Pulsar, go in depth on the correct deployment scaling of a stable Pulsar cluster and clarify Pulsar's "at least once vs exactly once" strategy. Part two will go into more depth on what's new. Stay tuned!
Apache Pulsar logo
Matteo Merli (https://www.linkedin.com/in/matteomerli/)
Co-Founder - Software Engineer
Sijie Guo (https://www.linkedin.com/in/samuelguo/)
Co-Founder
Apache Pulsar (incubating)
https://pulsar.apache.org/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
28 Aug 2018 | Episode 103 – Apache Pulsar version 2.0 with Matteo and Sijie from Streamlio | 00:43:31 | |
Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview into two parts, the first of which was published in episode 101. Here is the second part, with information on version 2.0 and the future of the Apache Pulsar project.
Apache Pulsar logo
The first subject taken on by Sijie is Pulsar Functions, followed by Matteo talking about the new schema registry and Topic Compaction. With a new major version being released, users will probably want to upgrade, so we asked the guys about the upgrade path. For the rest of the episode, Matteo and Sijie share what they can regarding the future Pulsar roadmap.
Matteo Merli (https://www.linkedin.com/in/matteomerli/)
Co-Founder - Software Engineer
Sijie Guo (https://www.linkedin.com/in/samuelguo/)
Co-Founder
Apache Pulsar (incubating)
https://pulsar.apache.org/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
21 Aug 2018 | Episode 102 – Roaring News | 00:22:07 | |
Big Data News at the end of the summer is not easy to find, but we did end up with three topics to discuss: from isolating GPUs in Hadoop 3.x to replicating big data (to the cloud) and quick tips from Adam's blog.
Breaking News
First Class GPUs support in Apache Hadoop 3.1, YARN & HDP 3.0
https://hortonworks.com/blog/gpus-support-in-apache-hadoop-3-1-yarn-hdp-3/
Replicating big datasets in the cloud
https://medium.com/hotels-com-technology/replicating-big-datasets-in-the-cloud-c0db388f6ba2
https://dataworkssummit.com/berlin-2018/session/tools-and-approaches-for-migrating-big-datasets-to-the-cloud/
https://www.slideshare.net/Hadoop_Summit/tools-and-approaches-for-migrating-big-datasets-to-the-cloud
Quick Tip: The easiest way to grab data out of a web page in Python
https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
04 Sep 2018 | Episode 104 – Roaring News | 00:36:55 | |
In this Big Data News episode, we discuss an article with guidelines on how you should arrange your data gathering projects with the customer in mind. Dave brings a matrix of visualization products.
Breaking News
The five Cs: Five framing guidelines to help you think about building data products.
https://www.oreilly.com/ideas/the-five-cs?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content
The Chartmaker Directory
http://chartmaker.visualisingdata.com/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
18 Sep 2018 | Episode 106 – Roaring News | 00:39:15 | |
In this edition of Big Data News, we take the pulse of Machine Learning adoption and talk about Big Data online learning by IBM on Coursera and by Columbia University on edX. We round the episode off with a look at MR3 and the evil that is benchmarks.
Breaking News
Data Science Professional Certificate
https://cognitiveclass.ai/blog/data-science-professional-certificate/
Taking the pulse of machine learning adoption
https://www.zdnet.com/article/taking-the-pulse-of-machine-learning-adoption/
Performance Comparison of HDP LLAP, Presto, SparkSQL, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark
https://mr3.postech.ac.kr/blog/2018/08/15/comparison-llap-presto-spark-mr3/
Join Jhon on Artificial Intelligence (AI) & Robotics by ColumbiaX on edX
https://www.edx.org/micromasters/columbiax-artificial-intelligence
https://www.edx.org/course/robotics-columbiax-csmm-103x-4
https://www.edx.org/course/artificial-intelligence-ai-columbiax-csmm-101x-4
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
11 Sep 2018 | Episode 105 – Big Data at British Telecom with Phillip Radley | 01:06:32 | |
In this episode we welcome Phil Radley, Chief Data Architect at BT to talk about the Big Data deployment at BT.
Phillip Radley (Linkedin)
Chief Data Architect @ BT
https://home.bt.com/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
25 Sep 2018 | Episode 107 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 1 | 00:41:50 | |
In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us.
In this first part, the focus is more on Mandy herself and we lay the groundwork for the second part that will go live in episode 109.
Mandy Chessell
Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering
https://www.linkedin.com/in/mandy-chessell-a4989722/
ODPi Blog post on Egeria: First Release of ODPi Egeria is Here
ODPi github projects:
Egeria - Open Metadata and Governance
https://github.com/odpi/egeria
Data-governance companion project
https://github.com/odpi/data-governance
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
09 Oct 2018 | Episode 109 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 2 | 00:52:10 | |
In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us.
In this second part, we discuss the ins and outs of good data stewardship and how companies can adopt, implement and contribute.
Mandy Chessell
Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering
https://www.linkedin.com/in/mandy-chessell-a4989722/
ODPi Blog post on Egeria: First Release of ODPi Egeria is Here
ODPi github projects:
Egeria - Open Metadata and Governance
https://github.com/odpi/egeria
Data-governance companion project
https://github.com/odpi/data-governance
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
02 Oct 2018 | Episode 108 – Roaring News | 00:55:57 | |
Another episode of Big Data News, and not just any episode, but one packed to the brim with items. Before we do our regular article reviews, we are doing raffles for not one, not two, but three different events! And as if that were not enough, our friends from Pulsar dropped in with their big Apache top-level project announcement.
So not very bite-sized this time, but chock-full of delicious Big Data news!
Breaking News
Our thanks to our guests:
Solix Empower
Sai Gundavelli
Founder/CEO, Solix Technologies
Streamlio
Sanjeev Kulkarni
Co-Founder at Streamlio
Sijie Guo
Co-Founder at Streamlio
Free Big Data Event ticket giveaways:
DataWorks Summit Asia Pacific
Singapore Oct 11, 2018 - Tokyo Oct 16, 2018 - Melbourne Feb 06, 2019
To enter the raffle, send email to dws18apac@roaringelephant.org
Tell us what event you want to attend! (Singapore, Tokyo, Melbourne)
Solix Empower New York 2018
New York November 01, 2018
To enter the raffle, send email to SolixEmpower18@roaringelephant.org
H2O AI World London
London October 29-30, 2018
To enter the raffle, send email to h2oLondon18@roaringelephant.org
Please note that we are giving away discount codes that will give you access to the events for free. You still need to arrange your own travel and lodging!
News articles:
The Apache Software Foundation Announces Apache® Pulsar™ as a Top-Level Project
https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces39
https://github.com/apache/pulsar
Who wrote that anonymous NYT op-ed? Text similarity analyses with R
http://blog.revolutionanalytics.com/2018/09/anonymous-nyt-op-ed.html
Beyond Interactive: Notebook Innovation at Netflix
https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
16 Oct 2018 | Episode 110 – Roaring News | 00:38:23 | |
Another week, another Big Data News episode. After going over all the event ticket giveaways that are currently going on, we have an article that covers the basics of ETL vs ELT and have some fun with R graphs inspired by the XKCD web comic. We finish with an in-depth article on columnar data stores and a quick shout-out to Apache NiFi.
Breaking News
Our thanks to our guest from H2O.ai:
John Spooner
Director of Solution Engineering, h2o.ai
Dave:
XKCD Curve Fitting in R
http://blog.revolutionanalytics.com/2018/09/curve-fitting.html
Artificial intelligence, data will be the differentiator in the marketplace
https://www.information-age.com/artificial-intelligence-data-123475102/
Jhon:
Scaling ETL: How data pipelines evolve as your business grows
https://bytes.grubhub.com/scaling-etl-how-data-pipelines-evolve-as-your-business-grows-72ff6c744e6e
The design and implementation of modern column-oriented database systems
https://blog.acolyer.org/2018/09/26/the-design-and-implementation-of-modern-column-oriented-database-systems/
Apache NiFi In Depth
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html?es_p=7695258
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
23 Oct 2018 | Episode 111 – How Public Cloud changed Big Data | 00:51:08 | |
No interview this time, just Dave and Jhon talking about how public cloud changed Big Data. Current news has brought this topic back to the foreground and we thought it was a good idea to give our views on this subject.
Along the way, we go over the different deployment strategies for Hadoop across on premise, private and public cloud and of course, hybrid environments.
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
30 Oct 2018 | Episode 112 – Roaring News | 00:26:37 | |
In this last Big Data news episode for the month of October, we look forward to the H2O World event next week in London and we have articles on BI Maturity and the upcoming Apache Ozone project that will supplant HDFS in future Hadoop clusters soon(TM).
BI Maturity: You can’t get there from here!
http://makingdatameaningful.com/bi-maturity/
Introducing Apache Hadoop Ozone: An Object Store for Apache Hadoop
https://hortonworks.com/blog/introducing-apache-hadoop-ozone-object-store-apache-hadoop/
Katacoda example down on this page
https://hadoop.apache.org/ozone
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
06 Nov 2018 | Episode 113 – H2OAIWorld London 2018 Roaring Report | 01:02:13 | |
Here is our H2O.ai World conference London Roaring Report. We had a blast and we hope that this episode can give you a good taste of what was going on.
The sessions are now available online: https://www.youtube.com/playlist?list=PLNtMya54qvOHh9LaA08hkusynWVStNEhm
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
13 Nov 2018 | Episode 114 – Roaring News | 00:26:48 | |
In this serving of bite-sized Big Data News we talk about the IBM takeover of Red Hat, a new Botnet going for unprotected Hadoop nodes and a somewhat disappointing Cloudera blog post.
IBM To Acquire Red Hat
https://investors.redhat.com/news-and-events/press-releases/2018/10-28-2018-184027500
https://newsroom.ibm.com/2018-10-28-IBM-To-Acquire-Red-Hat-Completely-Changing-The-Cloud-Landscape-And-Becoming-Worlds-1-Hybrid-Cloud-Provider
New DDoS botnet goes after Hadoop enterprise servers
https://www.zdnet.com/article/new-ddos-botnet-goes-after-hadoop-enterprise-servers/
(Remember Dr. Who? https://medium.com/@neerajsabharwal/hadoop-yarn-hack-9a72cc1328b6)
New in Cloudera Enterprise 6: Apache Hive 2.1 (By the Cloudera Hive Team)
http://blog.cloudera.com/blog/2018/10/new-in-cloudera-enterprise-6-apache-hive-2-1/
https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_601_unsupported_features.html#hive_c6_unsupported_features
https://hive.apache.org/downloads.html
https://issues.apache.org/jira/browse/HIVE-17129
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
20 Nov 2018 | Episode 115 – Anniversary three: I guess we’re in it for the long run now! | 00:59:40 | |
It's been three years since we started this podcast and, as we've done in previous years, we invited the wonderful people who were guests on our show in the past twelve months and who made our little podcast so much better for our listeners!
Our thanks to our guests that celebrated our three year anniversary with us:
Ward Bekker (Linkedin)
Pre-Sales Solutions Engineer II at Hortonworks
Talking about Apache Metron
Rohit Jain (linkedin)
Chief Technology Officer at Esgyn
Talking about Esgyn, Trafodion and cloud vs on-premise vs hybrid.
Sanjeev Kulkarni (Linkedin)
Co-Founder at Streamlio
Talking about Apache Pulsar
Phillip Radley (Linkedin)
Chief Data Architect at BT
Talking about future predictions made years ago
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
27 Nov 2018 | Episode 116 – Roaring News | 00:27:09 | |
This Machine Learning-heavy edition of Big Data News covers Boston school bus schedules and model interpretation using LIME. As a bonus, we have a great source of NiFi knowledge for you!
What the Boston School Bus Schedule Can Teach Us About AI
https://www.wired.com/story/joi-ito-ai-and-bus-routes/
Understanding model predictions with LIME
https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b
Introduction to Local Interpretable Model-Agnostic Explanations (LIME)
https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
Locally Interpretable Models and Effects based on Supervised Partitioning (LIME-SUP)
https://arxiv.org/abs/1806.00663
Best of NiFi
https://pierrevillard.com/best-of-nifi/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
04 Dec 2018 | Episode 117 – Big Data Disaster Recovery | 00:53:21 | |
When Big data projects mature from R&D projects to business critical components, it becomes important to look at how your environment can survive and recover from catastrophic failures.
Considering the significant cost of a good Disaster Recovery plan, it pays to take a close look at your deployment and carefully weigh the pros and cons at a granular level.
Here is the link to the SlideShare presentation: Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
11 Dec 2018 | Episode 118 – Roaring News | 00:32:31 | |
In this Big Data News episode, we use an article on how some disgruntled open source projects tried to force the "net giants" to give back as an excuse to talk about open source ethics. The second article for today, by Noel Sharkey, is about possible deception in modern robotics.
Time for Net Giants to Pay Fairly for the Open Source on Which They Depend
https://www.linuxjournal.com/content/time-net-giants-pay-fairly-open-source-which-they-depend
Mama Mia It's Sophia: A Show Robot Or Dangerous Platform To Mislead?
https://www.forbes.com/sites/noelsharkey/2018/11/17/mama-mia-its-sophia-a-show-robot-or-dangerous-platform-to-mislead
Artificial Intelligence: A Modern Approach (Third edition) by Stuart Russell and Peter Norvig
http://aima.cs.berkeley.edu/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
18 Dec 2018 | Episode 119 – Knowage: The Open Source Business Analytics Suite | 00:48:43 | |
This time we are joined by Paolo from Knowage who gives us a high level overview of Knowage: a totally open source suite for Business Analytics.
The Knowage suite is composed of several modules, each one conceived for a specific analytical domain. They can be used individually or combined with one another to ensure full coverage of users' requirements, making it possible to build a tailored product.
Thank you to our guest:
Paolo Raineri
Business Developer (linkedin)
https://www.knowage-suite.com
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
25 Dec 2018 | Episode 120 – Roaring News | 00:39:36 | |
Merry Big Data News Christmas!
Since it's the 25th of December, we're investigating how Big Data is changing the operations at the North Pole using a couple of blog posts from Splunk.
Christmas 2020. Will big data and IOT change things for Father Christmas? Part I
https://www.splunk.com/blog/2014/12/17/christmas-2020-part1.html
Christmas 2020. Will big data and IOT change things for Father Christmas? Part II
https://www.splunk.com/blog/2014/12/18/christmas-2020-part2.html
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
01 Jan 2019 | Episode 121 – Infrastructure and Data Lifecycle (part 1) | 00:42:53 | |
Does the standard Dev-Test-Prod cycle make sense in a Big Data environment or should you approach this subject a little differently?
In this episode, we sum up our experiences and best-practice tips regarding the infrastructure part; the Data Lifecycle will be featured in the next topic episode.
Planning on attending the Melbourne @DataWorksSummit? Send an email to DWS18APAC@roaringelephant.org for a free ticket to the Melbourne event in February! Big thanks to @DataWorksSummit & @hortonworks for sponsoring this giveaway!
DataWorks Summit Barcelona is also rapidly approaching. You can find my dynamic sessions statistics dashboard here: https://aka.ms/DWS2019BA
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
08 Jan 2019 | Episode 122 – Roaring news | 00:32:38 | |
In this first Big Data News episode of 2019, we cover how A.I. will nudge you to a happier (work)life and the new Hive Warehouse Connector. We end the episode with unstable artificial intelligence and how you can have a chance at a one-million-euro prize!
Can an AI keep you happy at work? Ex-Google team reveal software that 'nudges' workers with messages throughout the day
https://www.dailymail.co.uk/sciencetech/article-6545051/The-AI-happy-work-Ex-Google-team-reveal-software-nudges-workers.html
https://humu.com/
Apache Hive Warehouse Connector Use-Cases
https://hortonworks.com/blog/hive-warehouse-connector-use-cases/
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/
In January, the EU starts running Bug Bounties on Free and Open Source Software
https://juliareda.eu/2018/12/eu-fossa-bug-bounties/
AI has a probability problem
https://go.forrester.com/blogs/artificial-intelligence-has-a-probability-problem/
Apache Kafka bug bounty: 58.000,00 €, running 07/01/2019 to 15/08/2019 on HackerOne
https://www.zdnet.com/article/eu-to-fund-bug-bounty-programs-for-14-open-source-projects-starting-january-2019/
https://juliareda.eu/2016/07/eu-audits-keepass-apache/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
15 Jan 2019 | Episode 123 – Infrastructure and Data Lifecycle (part 2) | 00:57:24 | |
In episode 121 we discussed the first part of this story and now we conclude with a discussion of the data life-cycle considerations that apply to a Big Data and Advanced Analytics environment.
The primary inspiration for this episode:
The Big Data Lifecycle explained
https://www.pinkelephantasia.com/big-data-lifecycle/
Additional Inspiration:
7 phases of a data life cycle
https://www.bloomberg.com/professional/blog/7-phases-of-a-data-life-cycle/
Thinking Beyond Traditional Data Life Cycle Management
https://hortonworks.com/article/thinking-beyond-traditional-data-life-cycle-management/
Understanding the Big Data Life-Cycle
https://www.linkedin.com/pulse/four-keys-big-data-life-cycle-kurt-cagle/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
22 Jan 2019 | Episode 124 – Roaring News | 00:38:12 | |
The Hortonworks-Cloudera merger has been finalized and the new CDP (Cloudera Data Platform) has been announced. We also talk about data mining bias, the good and bad of hackathons, and end on a rant about data sizes.
Cloudera Unveils CDP, Talks Up ‘Enterprise Data Cloud’
https://www.datanami.com/2019/01/10/cloudera-unveils-cdp-talks-up-enterprise-data-cloud/?_lrsc=718d30ff-51ed-40c5-bba9-750a82009aaf
Cloudera and Hortonworks' merger closes; quo vadis Big Data?
https://www.zdnet.com/article/cloudera-and-hortonworks-merger-closes-quo-vadis-big-data/
Welcome to a brand-new Cloudera
https://hortonworks.com/blog/welcome-brand-new-cloudera/
The Exaggerated Promise of So-Called Unbiased Data Mining
https://www.wired.com/story/the-exaggerated-promise-of-data-mining/
On Hackathons : Lessons Learned, Experience, Advice
https://www.knoyd.com/blog/2019/1/10/on-hackathons-lessons-learned-experience-advice
Big Insights Not Big Data: Why We Should Stop Talking About File Size
https://www.forbes.com/sites/kalevleetaru/2019/01/09/big-insights-not-big-data-why-we-should-stop-talking-about-file-size
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
29 Jan 2019 | Episode 125 – Sparkling Water with H2O.AI (Part 1) | 00:51:21 | |
We recently sat down with Kuba and Pavel from H2O to discuss how you can easily lift your Spark notebooks to the next level by adding some H2O to them using their open source Sparkling Water project.
In this first part of the interview, we cover the conceptual principles behind Sparkling Water and discuss some existing use case implementations.
Jakub "Kuba" Hava
Senior Software Engineer at H2O.ai
Pavel Pscheidl
Machine learning engineer at H2O.ai, Software engineer, Writer
H2O World San Francisco
Find out more at the upcoming H2O World conference in San Francisco on February 4-5, 2019
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
12 Feb 2019 | Episode 127 – Sparkling Water with H2O.AI (part 2) | 00:40:10 | |
We recently sat down with Kuba and Pavel from H2O to discuss how you can easily lift your Spark notebooks to the next level by adding some H2O to them using their open source Sparkling Water project.
In this second part of the interview, we go deeper into the technical details of Sparkling Water and how you can deploy and use it in your environment. We end the conversation with a look at the roadmap and anything else the future may bring.
Jakub "Kuba" Hava
Senior Software Engineer at H2O.ai
Pavel Pscheidl
Machine learning engineer at H2O.ai, Software engineer, Writer
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
05 Feb 2019 | Episode 126 – Roaring News | 00:26:26 | |
The second news episode for 2019 is almost entirely devoted to practical AI with some tutorial notebooks and finding a parking space. We end this show with dire warnings of the impending Big Data induced Apocalypse!
Practical AI Workshop
https://blog.revolutionanalytics.com/2019/01/notebooks-from-the-practical-ai-workshop.html
Snagging Parking Spaces with Mask R-CNN and Python
https://medium.com/@ageitgey/snagging-parking-spaces-with-mask-r-cnn-and-python-955f2231c400
Head of Russian Orthodox Church Warns Big Data Will Usher in the Antichrist
https://gizmodo.com/head-of-russian-orthodox-church-warns-big-data-will-ush-1831598967
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
19 Feb 2019 | Episode 128 – Roaring News | 00:59:15 | |
In this Deep learning heavy edition of Big Data News, we have articles about how to get into the Data Scientist life, how and where to get the skills and how you eventually may end up beating pro-gamers at their thing.
The DataWorks Summit Barcelona is coming up soon and we have a free entry ticket to raffle off to a lucky Big Data Winner!
Send an email to DWS19BARCELONA at roaringelephant.org to enter the raffle!
What’s Driving Data Science Hiring in 2019
https://www.datanami.com/2019/01/30/whats-driving-data-science-hiring-in-2019/
Practical Deep Learning for Coders 2019
https://www.fast.ai/2019/01/24/course-v3/
https://course.fast.ai/
Deep Learning vs Classical Machine Learning
https://towardsdatascience.com/deep-learning-vs-classical-machine-learning-9a42c6d48aa
Top Machine Learning Algorithms for Predictions. A Short Overview.
https://www.aisoma.de/top-machine-learning-algorithms-for-predictions-a-short-overview/
An AI crushed two human pros at Starcraft but it wasn’t a fair fight
https://arstechnica.com/gaming/2019/01/an-ai-crushed-two-human-pros-at-starcraft-but-it-wasnt-a-fair-fight/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
26 Feb 2019 | Episode 129 – DataWorks Summit Barcelona Track Chair Interviews | 00:43:33 | |
In this episode we have interviews with Niels Basjes and Aljoscha Krettek, respectively track chairs for Big Compute & Storage and Internet of Things. We talk with them about what being a track lead means, the sessions in their tracks and of course about what they are doing themselves with Big Data and Advanced Analytics.
Niels Basjes
Lead IT-Architect Scalable Solutions at Bol.com
Bol.com Techlabs:
https://techlab.bol.com/
https://techlab.bol.com/author/nbasjes/
Bol.com on Youtube:
https://www.youtube.com/results?search_query=bol.com+berlinbuzzwords
Bol.com is looking for you!
https://careers.bol.com/
Aljoscha Krettek
Co-Founder, Software Engineer at Data Artisans
Data Artisans / Ververica Blogs:
https://www.ververica.com/blog
Join a world-class team at Ververica:
https://www.ververica.com/careers
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
05 Mar 2019 | Episode 130 – Roaring News | 00:35:06 | |
In this episode of Bite Sized Big Data news, we cover the merging of Data Artisans and Alibaba forming the new Ververica entity, AI related challenges and a BBC cook book for visualizations in R.
Dave had some issues recording his side, our apologies for the rather bad quality of Dave's audio track on this episode.
Data Artisans, which was recently purchased by Alibaba, has been renamed Ververica.
https://www.ververica.com/blog/introducing-our-new-name
https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions
The challenges to tackle before you start with AI
http://www.ronaldvanloon.com/the-challenges-to-tackle-before-you-start-with-ai/
Create data visualisations like BBC news with the BBC R Cook Book
https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
12 Mar 2019 | Episode 131 – Dataworks Summit 2019 Barcelona Session Preview | 00:45:55 | |
With the Dataworks Summit in Barcelona coming up next week, we take a look at the agenda with the available sessions and take you through our best picks and honorable mentions.
Session statistics dashboards:
Dataworks Summit 2019 in Barcelona: https://aka.ms/DWS2019BA
Dataworks Summit 2018 Berlin: https://aka.ms/DWS2018
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
21 Mar 2019 | Episode 132 – Roaring DataWorks Summit Barcelona, ft. John Mertic | 01:18:55 | |
Dataworks Summit 2019 Barcelona has come and gone... Recording live from my hotel room, we give our view on the highs and lows of the event and talk about the things we learned.
This episode also includes a short interview with John Mertic from the Linux Foundation, who talked to us about Data Governance and ODPi Egeria.
John Mertic
Director of Program Management for ODPi, R Consortium, and Open Mainframe Project
https://www.linkedin.com/in/jmertic/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
26 Mar 2019 | Episode 133 – Big Data in Cybersecurity with Saad Ayad, featuring Apache Metron (Part 1) | 00:32:35 | |
Data leaks and the resulting attacks on our privacy have been a major news item in recent months. Big data tools like Apache Metron, built on top of Hadoop, can be instrumental in detecting and preventing intrusions.
In this episode, we are joined by Saad Ayad who was General Manager Security Operations at Telstra and currently is a Director at Digital Fortress Services in Melbourne Australia. Saad has been active in the cybersecurity world for a long time and we are grateful he was willing to spend some time with us and share his knowledge and experience.
Saad Ayad (@saadayad_)
Cyber Security, Big Data Analytics & Operations
http://www.digitalfortress.services
@DigFortServ
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
09 Apr 2019 | Episode 135 – Big Data in Cybersecurity with Saad Ayad, featuring Apache Metron (Part 2) | 00:30:05 | |
Data leaks and the resulting attacks on our privacy have been a major news item in recent months. Big data tools like Apache Metron, built on top of Hadoop, can be instrumental in detecting and preventing intrusions.
In this episode, we are joined by Saad Ayad who was General Manager Security Operations at Telstra and currently is a Director at Digital Fortress Services in Melbourne Australia. Saad has been active in the cybersecurity world for a long time and we are grateful he was willing to spend some time with us and share his knowledge and experience.
Saad Ayad (@saadayad_)
Cyber Security, Big Data Analytics & Operations
http://www.digitalfortress.services
@DigFortServ
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
02 Apr 2019 | Episode 134 – Roaring News: Dataworks Summit Lightning Interviews | 00:37:19 | |
A special edition of Big Data News featuring a number of quick interviews at the booths in the community expo hall.
A big thank you to the brave people there who were willing to face the Roving Roaring Mike at the Barcelona Dataworks Summit a couple of weeks ago.
03:04 Attunity
https://www.attunity.com/
07:41 Cloudera Fast Forward Labs
https://www.cloudera.com/products/fast-forward-labs-research.html
11:09 DataVard
https://www.datavard.com
17:19 Cazena
https://www.cazena.com/
22:39 Syncsort
https://www.syncsort.com
26:22 Accenture
https://www.accenture.com
30:44 Unravel Data
https://unraveldata.com
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
16 Apr 2019 | Episode 136 – Temet Nosce | 00:31:04 | |
Breaking with tradition, this News Episode does not have any Big data related articles. Instead, this episode is all about our plans for the future of this podcast...
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
23 Apr 2019 | Episode 137 – Interview on DataOps with Chris Bergh of DataKitchen.io (Part 1) | 00:45:06 | |
DataKitchen.io's Chris Bergh takes us down the path towards successful DataOps implementation.
If you have not heard of the DataOps concept yet and data is a big part of your environment (and really, it should be) we're sure you will find more than a couple takeaways here!
Christopher Bergh (@ChrisBergh)
CEO & Head Chef, DataKitchen
The DataOps Cookbook
DataOps is NOT Just DevOps for Data
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
07 May 2019 | Episode 139 – Interview on DataOps with Chris Bergh of DataKitchen.io (Part 2) | 00:33:41 | |
DataKitchen.io's Chris Bergh takes us down the path towards successful DataOps implementation.
If you have not heard of the DataOps concept yet and data is a big part of your environment (and really, it should be) we're sure you will find more than a couple takeaways here!
Christopher Bergh (@ChrisBergh)
CEO & Head Chef, DataKitchen
The DataOps Cookbook
DataOps is NOT Just DevOps for Data
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
30 Apr 2019 | Episode 138 – Roaring News | 00:27:50 | |
The biggest news is of course the launch of our Patreon! Hop over to https://www.patreon.com/roaringelephant and see if you want to help us thrive and grow! On the technical front, we have a blog post on Machine Learning Model Management, Apache turning 20 and Google breeding aggressive A.I.! And we also have a side-conversation on NGINX...
Apache Software Foundation Continues to Grow Open Source Software
https://www.eweek.com/development/the-apache-software-foundation-continues-to-grow-open-source-software
Frameworks for Machine Learning Model Management
https://www.inovex.de/blog/machine-learning-model-management/
Google's AI Has Learned to Become "Highly Aggressive" in Stressful Situations
https://www.sciencealert.com/google-deep-mind-has-learned-to-become-highly-aggressive-in-stressful-situations
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
14 May 2019 | Episode 140 – Roaring News | 00:36:58 | |
Another week, another feed of roaring news articles, starting with apparent changes at MapR and the release of Red Hat Enterprise Linux 8. We go in depth on the open sourcing of the Databricks-developed Delta Lake and finish with some SQL-generated fractals.
Big thanks to our Roaring Patreons making this podcast possible!
DataWorks Summit free ticket raffle.
Final week for our DataWorksSummit Washington DC free ticket giveaway!
Get your free ticket now!
The Roaring Elephant on YouTube.
The Roaring Elephant YouTube channel has launched!
Will you help us reach 100 subscribers (modest goals are a good start!) so we can claim our personalized URL on YouTube?
Every time a new episode is published, you will find a video uploaded to the channel as well. There won't be any real video yet though, only a still image as you can see in the thumbnails.
But as soon as we reach the related goal on our Patreon, this is where our video content will appear.
In case you are wondering, when we start recording actual videos, the regular MP3s on the podcast feed will remain exactly as they are now. So if you prefer not to look at our mugs while enjoying the podcast, that should remain possible.
Interactive DWS-DC session dashboard
https://aka.ms/DWS2019DC
As I've been doing for a while now, I've again launched a session statistics dashboard for this event. It can be found at https://aka.ms/DWS2019DC and as usual, this Power BI dashboard is interactive. Simply click on the different elements to filter or drill down.
There are only 58 sessions listed at the moment. I will be updating it from time to time, so keep an eye out for some tweets from @jhonmasschelein if you want to get notified!
R.I.P. MapR?
https://www.linkedin.com/feed/update/urn:li:activity:6532418505361416192
https://www.linkedin.com/feed/update/urn:li:activity:6532352941800595456
Our first bit of news is more of a rumor for now: we were pointed towards some messages on LinkedIn (linked above) that seem to indicate some reorganising is happening at MapR.
We will be following how this develops in the next few weeks.
Best of luck to anyone who is affected!
RHEL version 8 is out!
Red Hat Opens the Linux Experience to Every Enterprise, Every Cloud and Every Workload with Red Hat Enterprise Linux 8
It's been a while coming, but even though RHEL 7 will still be around for a few years, Red Hat has released the next version of their popular Linux distro. Notwithstanding Dave's horror at the new logo, we're very excited about this and personally, I am eagerly awaiting the CentOS 8 release that should appear in a couple of months.
Delta Lake Open-Sourced.
Open Sourcing Delta Lake
Databricks claims its new product Delta is the missing link to enterprise AI
A press release from the good folks at Databricks informs the world that their proprietary data lake storage layer called "Delta Lake" has now been open sourced.
Delta Lake was released by Databricks at the end of 2017 and was only available on their managed service offerings in the public clouds, but now anyone can download and deploy it.
However, all is not well: we're having some serious issues with the content of the press release and quite frankly, we're scratching our heads to find exactly what problem Delta Lake is trying to solve and if it actually does that...
Fractals, SQL-Style!
Generating Fractals with Postgres: Escape-Time Fractals
Just to make Dave happy, we finish this episode off with some great fractal visualizations made with SQL.
Euch...
What?
Yes, SQL. That's right!
Click the link to see how the apparently Turing-complete SQL is able to do that.
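For readers who have not met escape-time fractals before, here is a minimal sketch of the idea in Python rather than SQL. It is not the code from the linked Postgres article; the function names, the grid size, and the character-based rendering are all assumptions made for illustration. Each point c of the complex plane is iterated through z = z*z + c, and the number of steps it takes to leave a circle of radius 2 decides how the point is drawn.

```python
def escape_time(c: complex, max_iter: int = 50) -> int:
    """Count iterations of z = z*z + c until |z| exceeds 2 (hypothetical helper)."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i          # the point escaped after i iterations
        z = z * z + c
    return max_iter           # never escaped: treated as inside the set


def render(width: int = 60, height: int = 20, max_iter: int = 50) -> str:
    """Draw the Mandelbrot set as text: '#' for points that never escape."""
    rows = []
    for y in range(height):
        im = -1.2 + 2.4 * y / (height - 1)      # imaginary axis from -1.2 to 1.2
        row = []
        for x in range(width):
            re = -2.0 + 3.0 * x / (width - 1)   # real axis from -2.0 to 1.0
            inside = escape_time(complex(re, im), max_iter) == max_iter
            row.append("#" if inside else ".")
        rows.append("".join(row))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

A SQL version has to express the same loop without a `for` statement, typically through recursion over rows, which is where the Turing-completeness remark comes from.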
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
28 May 2019 | Episode 142 – Roaring News – KubeCon 2019 Report | 00:47:07 | |
A little over a week ago, KubeCon and CloudNativeCon happened and our independent Roaring Roving Reporter Rubik Dave came back from Barcelona with a comprehensive report.
Kubernetes
As the kubernetes.io webpage tells us: "Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications."
As we discuss in the episode, Kubernetes forms a kind of middleware layer that performs orchestration of lightweight Docker containers. To be sure, you can use other container technologies, but Docker (and its companion project Moby) is what is most often used with Kubernetes.
The biggest advantage of Kubernetes, I believe, is how it has standardized the way a microservices framework based on Docker container instances can be deployed and managed. There have been a myriad of other approaches that tried to solve that problem (and Dave gives a rather exhaustive list in the episode), but Kubernetes has emerged as the one best supported by the community.
KubeCon
And that is where KubeCon comes in: there are other, more developer-oriented conferences, but KubeCon is perhaps the largest event for Kubernetes consumers. Details on this year's event are available at the KubeCon | CloudNativeCon Europe 2019 website.
If you missed this year's installment, take note that next year's Europe event will be in Amsterdam, March 30th to April 2nd. And if the American continent is more practical, you can join the community at the San Diego venue, November 18th to 21st.
CloudNativeCon
KubeCon has run together with CloudNativeCon for as far back as I can tell, and since Kubernetes is one of the larger "CNCF graduated" projects, that is not surprising.
It also makes sense since microservices architectures are an excellent fit for cloud-based deployments, so a lot of the Kubernetes community is likely to also be a member of the "cloud crowd".
Now, reading the CloudNative website, their charter in particular, it does seem to see its purpose in a similar vein as the Apache Foundation does. However, the CloudNative folk recommend that the projects under its wing use the Apache 2.0 license, so they certainly don't appear to be in any kind of direct competition here... I think I feel a future podcast episode announcing itself! :D
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
21 May 2019 | Episode 141 – Spark in Action with author Jean-Georges Perrin (Part 1) | 00:48:02 | |
And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more...
Free eBook raffle
Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action".
As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;)
After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon.
A book review on Spark in Action, second edition with author Jean-Georges Perrin
In this first part of the interview, we meet the author and talk about Apache Spark and Open Source in general. We also cover the MEAP system used by Manning Publications to get books like these into the hands of readers as soon as possible while allowing early readers to help shape the book.
Our thanks to Jean-Georges for spending quite a bit of time with us talking about Apache Spark and to Manning Publications for the free eBook codes!
Find out more about Jean-Georges at his blog: https://jgp.net/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. | |||
04 Jun 2019 | Episode 143 – Spark in Action with author Jean-Georges Perrin (Part 2) | 00:58:56 | |
And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more...
Free eBook raffle
Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action".
As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;)
After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon.
A book review on Spark in Action, second edition with author Jean-Georges Perrin
In the second part we go deeper into the book, going over the available chapters and appendices. We cover a number of topics and concepts like the layout of a typical data lake, the four pillars of Apache Spark and more. We end the interview with a discussion on what it's like to write a technical book like Spark in Action.
Our thanks to Jean-Georges for spending quite a bit of time with us talking about Apache Spark and to Manning Publications for the free eBook codes!
Find out more about Jean-Georges at his blog: https://jgp.net/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover. |
© My Podcast Data