Explore all Roaring Elephant episodes

Dive into the complete list of Roaring Elephant episodes. Each episode is catalogued with a detailed description, which makes it easy to search for and explore specific topics. Follow every episode of your favourite podcast and never miss relevant content.

Date | Title | Duration
18 Nov 2015 | Episode 1 – A new beginning: Getting started in Hadoop | 00:36:06
With all the buzz around big data generally, and Hadoop specifically, there's never been a better time for getting started in Hadoop. This episode covers how your two hosts got involved in Hadoop, and also discusses some of the other popular paths into the world of BigData/Hadoop. 00:00 Recent events How did your hosts get into Hadoop 04:30 Main Topic Driven by individuals vs organisations Online education options Formal training 19:20 Questions from our Listeners: Isn’t it really difficult? Do you need to know Java? Do you need to know SQL? Will I need to throw everything else in my datacentre out? Can I replace my EDW (Enterprise Data Warehouse)? Do I have to re-write all my ETL (Extract-Transform-Load)? 36:05 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
01 Dec 2015 | Episode 2 – How to avoid disaster | 00:43:37
When you are starting your journey with Hadoop, how do you avoid a Hadoop disaster? We have seen many people go through this journey, and both of us have seen things people do that make a project successful, and things people do that make projects more difficult than they should be. 00:00 Recent events Customer pilot completion SQL on Hadoop Masterclasses Multi-tenant Spark notebook issues Spark recommendation engine webinar 11:00 Main Topic Starting too small Baseline and benchmark Config management Backup and/or disaster recovery Leaving security too late 36:00 Questions from our Listeners: Where do I find data scientists? Storage options? Install everything? 43:37 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
15 Dec 2015 | Episode 3 – High level Hadoop architectures | 00:37:54
What are the hardware and implementation options we see? A discussion ranging from direct attached storage versus network attached storage/storage area networks, to on-premise hardware versus cloud options. 00:00 Recent events Organisations starting their Big Data Journey A lessons learned workshop for a customer after their successful pilot Planning Masterclasses for 2016 Migration customer workshop Big Data and the Connected Car webinar (registration required) 07:30 Main Topic Direct attached storage (DAS) or “traditional” Hadoop Network attached storage (NAS) / Storage Area Networks (SAN) Cloud / Azure / AWS / Google Cloud / Openstack etc... SaaS/PaaS/HaaS/HDInsight Ceph & Gluster ObjectStore (S3) and Other cloud storages 25:30 Questions from our Listeners: Doesn’t having a SAN/NAS system break data locality? Can I mix drive sizes and types within a cluster or even within the same node? Hybrid cluster environments, how to mix cloud and on premise deployment? Can I dedicate certain nodes to certain workloads? 37:54 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
29 Dec 2015 | Episode 4 – Hadoop: Year in review | 00:38:35
A bit of Hadoop history: what we have seen happening over the last 12 months, some trends and interesting technologies. Some ups, some downs and possibly even some round and rounds, capped off with some Bold Predictions for 2016. 00:00 Recent events A number of engagements Apache NiFi Why some Hadoop users decide to go for separate clusters per use case or (internal) client 06:00 Main Topic A broad acceptance of Hadoop in Europe A shift from batch workload to multi-tenant, secure platform including IoT and real-time, in-memory analytics. Apache Ambari making our life easier all the time Data Governance Initiative Open Data Initiative (http://odpi.org) Public clouds offer Big Data specific environments Tech advances in Hive (CBO/ORC/Zlib) and Transparent Encryption in HDFS Apache NiFi The year of Apache "open community" open source Bold Predictions! 31:00 Questions from our Listeners: What new (incubating) projects should I invest time in today, knowing that they may never be included in any distribution? I’ve been looking into Apache NiFi and am curious whether or not I can use it to replace Apache Flume? Should I go for a Hadoop appliance or not? 38:35 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
30 Aug 2016 | Episode 23 – Security in Hadoop – Authentication | 01:07:49
In this episode, we discuss this fortnight's interesting big data news that caught our eye and then go on to discuss the basics around authentication in Hadoop for what is the first in a series of episodes that we'll be doing over the next few months on the broad topic of security. 00:00 Recent events Dave: The new science behind customer loyalty http://insights.principa.co.za/the-new-science-behind-customer-loyalty http://insights.principa.co.za/infographic-creating-a-data-driven-customer-loyalty-strategy 5 great charts in 5 lines of R code http://blog.revolutionanalytics.com/2016/08/five-great-charts-in-5-lines-of-r-code-each.html Using big data to create value for customers, not just target them https://hbr.org/2016/08/use-big-data-to-create-value-for-customers-not-just-target-them Jhon: Linux turns 25 (25 August 1991) https://www.linux.com/news/linus-torvalds-reflects-25-years-linux http://web.archive.org/web/20100104211620/http://www.linux.org/people/linus_post.html Hadoop 2.7.3 a minor release in the 2.x.y release line, building upon the previous stable release 2.7.2 http://hadoop.apache.org/docs/r2.7.3/ Specification work related to the Hadoop Compatible Filesystem (HCFS) effort. Hadoop in the cloud/as a service getting a lot of attention lately http://hortonworks.com/blog/making-elephant-fly-cloud/ http://blog.cloudera.com/blog/2016/08/analytics-and-bi-on-amazon-s3-with-apache-impala-incubating/ https://vision.cloudera.com/analytic_database_in_cloud/ http://venturebeat.com/2016/08/25/sap-altiscale/ Facebook open sources image-recognition AI with live video in mind https://research.facebook.com/blog/learning-to-segment/ NoSQL Databases: a Survey and Decision Guidance https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d#.c037d5jbj Committer criteria from Apache https://hadoop.apache.org/committer_criteria.html Maybe they should just have referred to our podcast! :) Episode 11 - Interview with Community Award Winner Venkatesh Sellappa 40:20 Security in Hadoop - Authentication What is Authentication? Why is it important? When should I do it? Hadoop is insecure by default without strong Authentication Kerberos Active Directory, MIT Kerberos and FreeIPA 01:07:49 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
13 Sep 2016 | Episode 24 – Hadoop Summit Melbourne 2016 Preview | 01:07:33
With Hadoop Summit Melbourne 2016 starting the day after we are recording this episode, we go over the published agenda and discuss the current state of the Big Data Technology ecosystem while we pick our favorite sessions. Wish we were there! 00:00 Recent events Dave Cloud Security Alliance releases cloud and big data security guidelines http://siliconangle.com/blog/2016/08/28/the-cloud-security-alliance-publishes-its-best-practices-for-big-data-security/ https://cloudsecurityalliance.org/download/big-data-security-and-privacy-handbook/ Common Big Data Backup and Recovery myths http://www.networkworld.com/article/3113036/big-data-business-intelligence/debunking-the-most-common-big-data-backup-and-recovery-myths.html Big Data, Google, and the end of free will http://www.ft.com/cms/s/2/50bb4830-6a4c-11e6-ae5b-a7cc5dd5a28c.html Jhon SuperComputing now going to Hadoop-style systems https://techcrunch.com/2016/05/24/crays-latest-supercomputer-runs-openstack-and-open-source-big-data-tools/ The Home for Data Science https://www.kaggle.com/ 36:10 Hadoop Summit Melbourne 2016 Preview 01:07:33 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
27 Sep 2016 | Episode 25 – The pro’s and con’s of crafting your own distribution | 01:34:59
When we talk about Big Data and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes, a pre-built distro just does not meet the needs. In this episode, we have a guest on the show who explains why they made the choice to forgo the available distributions in favour of building their own. http://lod-cloud.net/ 00:00 Recent events Dave: Which tool should I use? http://brohrer.github.io/which_tool_should_i_use.html YaRrr! - The Pirate’s guide to R Blog: http://nathanieldphillips.com/thepiratesguidetor/ YaRrr! - Download the book: https://drive.google.com/file/d/0B4udF24Yxab0S1hnZlBBTmgzM3M/view Video tutorials to go with the above: https://www.youtube.com/playlist?list=PL9tt3I41HFS9gmeZFEuNrnu_7V_NFngfJ Listener Question from Sampath from Baltimore: When moving into a career in Big Data, is it better to pick a technology like Spark and try to build expertise on it versus having a broader knowledge on many tools. I registered for Edx courses and working towards getting Cloudera Certification. Please provide me any advice. Jhon: More accountability for big-data algorithms http://www.nature.com/news/more-accountability-for-big-data-algorithms-1.20653 The "doomsday" version: http://time.com/4471451/cathy-oneil-math-destruction/ 6 Illusions Execs Have About Big Data https://www.entrepreneur.com/article/281809 Michele: Hadoop release 3.0.0-alpha1 available http://hadoop.apache.org/releases.html#03+September%2C+2016%3A+Release+3.0.0-alpha1+available Running Spark on Alluxio with S3 https://www.oreilly.com/learning/running-spark-on-alluxio-with-s3 47:00 The pro's and con's of crafting your own distribution With our special guest Michele Lamarca (@nonfacciocip). Many thanks to Michele for being on the podcast with us and sharing his experiences!
01:34:59 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
11 Oct 2016 | Episode 26 – Security 2: Authorisation and audit | 01:10:32
In this episode, we continue our coverage of Hadoop security. Where episode 23 dealt with the subject of authentication, we now delve deeper into the why and how of authorization and audit, and cover the major players in the arena. 00:00 Recent events Dave Beyond Privacy and Security in a Connected World http://www.svds.com/beyond-privacy-security-connected-world/ The broken promise of open-source Big Data software – and what might fix it http://siliconangle.com/blog/2016/09/27/the-broken-promise-of-open-source-big-data-software-and-what-might-fix-it-2/ Meet Apache Spot, a new open source project for cybersecurity http://www.csoonline.com/article/3124497/big-data/meet-apache-spot-a-new-open-source-project-for-cybersecurity.html SMEs advised to capitalise on ‘big data’ http://www.farminglife.com/news/farming-news/smes-advised-to-capitalise-on-big-data-1-7606523 Jhon What is hardcore data science—in practice? https://www.oreilly.com/ideas/what-is-hardcore-data-science-in-practice Hortonworks, IBM Collaborate to Offer Open Source Distribution on Power Systems http://www.prnewswire.com/news-releases/hortonworks-ibm-collaborate-to-offer-open-source-distribution-on-power-systems-300330299.html https://www-03.ibm.com/press/us/en/pressrelease/50553.wss Inside 'The Next Rembrandt': How JWT Got a Computer to Paint Like the Old Master The project leaders explain their brilliant, troubling masterpiece http://www.adweek.com/news/advertising-branding/inside-next-rembrandt-how-jwt-got-computer-paint-old-master-172257 https://www.nextrembrandt.com/ Strata+Hadoop World New York http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/grid/public/2016-09-28 http://hortonworks.com/blog/ http://community.cloudera.com/t5/News/ct-p/Welcome Cloudera Kudu 1.0.0 released http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Apache-Kudu-1-0-0-released/m-p/45332 Audience Questions from Sampath @ Baltimore: http://www.infoignite.com/sentiment.html Azure HDInsight 3.5: https://azure.microsoft.com/en-gb/blog/new-security-performance-and-isv-solutions-build-on-azure-hdinsight-s-leadership-to-make-hadoop-enterprise-ready-for-the-cloud/ Azure Search: https://azure.microsoft.com/en-us/services/search/ 42:15 Security 2: Authorisation and audit The principles of auth reflected by the underlying organisation of your data Sync with AD/LDAP groups, don’t go user specific wherever possible. Use whatever tools are in your platform: Cloudera - Sentry https://sentry.apache.org/ Hortonworks - Ranger http://ranger.apache.org/ MapR - ??? https://www.mapr.com/hadoop-security-and-big-data-governance-mapr 01:10:32 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
25 Oct 2016 | Episode 27 – Security 3: Encryption at rest and in motion | 00:57:53
Rounding out our series on security in Hadoop, we finish with Encryption at rest and in motion. We go over the different approaches, do's and don'ts and mention some higher-level applications in this space. 00:00 News for the week! Dave: Executives Still Relying on Gut, Not Gigabytes in Planning for Future http://www.datadigestonline.com/2016/10/executives-still-relying-on-gut.html Rewriting SAS Programs for Financial Data Manipulation in R http://blog.revolutionanalytics.com/2016/09/rewriting-sas-in-r-for-finance.html Chris Surdak - Why so many Big Data projects fail http://surdak.com/innovation-vs-improvement/ Jhon: Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs (14-Sep-2016) http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs SQL on Hadoop benchmarks get serious (14-Oct-2016) http://www.zdnet.com/article/sql-on-hadoop-benchmarks-get-serious/ WHERE IS APACHE HIVE GOING? TO IN-MEMORY COMPUTING. (06-Oct-2016) http://hortonworks.com/blog/apache-hive-going-memory-computing/ APACHE HIVE VS APACHE IMPALA QUERY PERFORMANCE COMPARISON (11-Oct-2016) http://hortonworks.com/blog/apache-hive-vs-apache-impala-query-performance-comparison/ Cloudera wants extra money from Intel to become a cloud provider?
http://venturebeat.com/2016/08/30/cloudera-cloud-intel/ Four interesting things about IBM, Hadoop and open source (2 years old) http://www.ibmbigdatahub.com/infographic/four-interesting-things-about-ibm-hadoop-and-open-source Recovering from a database disk failure in Big SQL (20-oct-2016) https://developer.ibm.com/hadoop/2016/10/20/recovering-from-a-database-disk-failure-in-big-sql-worker-node-4-1fp2-and-4-2/ 37:20 Security 3: Encryption at rest and in motion Nice intro in the apache docs:  https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html RPC Encryption: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Security_Guide/content/ch_wire-rpc.html 57:53 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
08 Nov 2016 | Episode 28 – Talking Datameer with Erik Stalpers | 00:59:39
In this episode, Dave is stuck in a hotel basement in the middle of internet nowhere and Erik Stalpers from Datameer joins us to talk about the Datameer exploration and visualization tool. 00:00 Recent events Dave Machine learning vs AI http://www.wired.co.uk/article/machine-learning-ai-explained Machine Learning Data Cleansing https://gcn.com/articles/2016/10/19/activeclean-big-data.aspx https://activeclean.github.io/ Battle of the Data Science Venn Diagrams http://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html http://www.prooffreader.com/2016/09/battle-of-data-science-venn-diagrams.html (original doc 21 September 2016) Jhon How Vector Space Mathematics Helps Machines Spot Sarcasm https://www.technologyreview.com/s/602639/how-vector-space-mathematics-helps-machines-spot-sarcasm/ Straight talk about big data http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/straight-talk-about-big-data 25:10 Talking Datameer with Erik Stalpers Erik Stalpers, Solution Engineer at Datameer https://nl.linkedin.com/in/erikstalpers https://www.datameer.com/ 59:39 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
22 Nov 2016 | Episode 29 – 1 Year anniversary | 01:04:23
One year of elephants roaring has come and gone, so we reminisce a little bit about what happened over the last year. And since we could not have done this podcast nearly as well without them, we asked the special guests we have had on the podcast over the previous year to call in on the Skype call and talk about what they have been up to. 00:00 One year of podcasting... Dave and Jhon reminiscing about how the Podcast got started. 06:55 Fireside chats with guests over the year 07:56 Joe Witt, Senior Director of Engineering at Hortonworks 22:40 Michele Lamarca, Team Lead Big Data at Bright Computing 43:00 John Mertic, Director of Program Management for ODPi 01:04:23 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
06 Dec 2016 | Episode 30 – Apache Software Foundation | 01:02:08
So many of the tools and projects we talk about and use every day are prefaced by 6 letters, A P A C H E... What does it mean to be an Apache project? What does the Apache Software Foundation (ASF) do for software? Are there other options? Let us tell you about the ASF! 00:00 Recent events Dave: How we caught the circle line rogue train with data https://blog.data.gov.sg/how-we-caught-the-circle-line-rogue-train-with-data-79405c86ab6a#.mhqs1mikx Black Friday 2016: Mobile vs Desktop User Behaviour http://appinstitute.com/black-friday-2016-mobile-vs-desktop-sales/ AI Machine Attempts to Understand Comic Books ... and Fails https://www.technologyreview.com/s/602973/ai-machine-attempts-to-understand-comic-books-and-fails/ https://arxiv.org/abs/1611.05118 https://arxiv.org/pdf/1611.05118v1.pdf Jhon: PayPal From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World Part 1 and Reactive programming manifesto http://www.reactivemanifesto.org/ https://www.paypal-engineering.com/2016/11/08/from-big-data-to-fast-data-in-four-weeks-or-how-reactive-programming-is-changing-the-world-part-1/ Part 2: How that change was followed by adding a Spark micro batch (streaming) to the workflow https://www.paypal-engineering.com/2016/11/18/from-big-data-to-fast-data-in-four-weeks-or-how-reactive-programming-is-changing-the-world-part-2/ PayPal And they are not only using Spark, here is one talking about how they use Storm for another real-time workflow.
https://www.paypal-engineering.com/2016/11/15/carrier-payments-big-data-pipeline-using-apache-storm/ Managing Spark Partitions with Coalesce and Repartition A short write up on how spark does partitioning internally and some ways of improving the partition scheme https://medium.com/@mrpowers/managing-spark-partitions-with-coalesce-and-repartition-4050c57ad5c4#.s2l3yxemt Principa The Top Predictive Analytics Pitfalls to avoid http://insights.principa.co.za/the-top-predictive-analytics-pitfalls-to-avoid?utm_content=buffera2780&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer ODPi Publishes First Operations Specification To Provide Developers Consistency Across Application Management Tools As John talked about in our anniversary episode, the ODPI 2.0 released https://www.odpi.org/announcements/2016/11/14/odpi-publishes-first-operations-specification-to-provide-developers-consistency-across-application-management-tools 25:30 Apache Software Foundation The ASF http://apache.org/ Overview http://apache.org/foundation/ Process http://apache.org/foundation/how-it-works.html The Project List http://apache.org/index.html#projects-list Other Open Source Licence Options http://choosealicense.com/ https://opensource.org/licenses 01:02:08 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
20 Dec 2016 | Episode 31 – Bold Predictions, Past and Future | 01:07:07
In this episode, we go over the bold predictions for 2016 we made just before the start of the year. Find out how right we were, or indeed how bad we are at predicting the future of Big Data. Undeterred, we then happily put on our Nostradamus hats and proceed to make even more new bold predictions for 2017. Have a listen and let us know if you agree or disagree with our view on the world. 00:03 Bold predictions - reviewing past predictions for 2016 Apache Atlas Apache NiFi Apache Spark SQL BigInsights 28:50 Bold predictions - future predictions for 2017 Fragmentation Data breaches Chat bots Self service Big Data Snake-Oil Alert Cyber security In-Memory & GPU Apache Atlas BigInsights 01:07:07 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
03 Jan 2017 | Episode 32 – The sense and non-sense of certifications | 00:50:59
In this episode, we talk about the use and abuse of certifications, both the certifications you can achieve by passing an exam and the Industry ISV certifications that should help you make purchasing decisions. 00:00 Recent events Dave 5 enterprise uses of blockchain today http://www.pcworld.com/article/3149504/cloud-computing/5-enterprise-related-things-you-can-do-with-blockchain-technology-today.html Top 7 big data trends for 2017 https://datafloq.com/read/the-top-7-big-data-trends-for-2017/2493 How to discover the hidden value in your customer journey https://www.linkedin.com/pulse/how-discover-hidden-value-your-customer-journey-ronald-van-loon Jhon Achieving a 300% speedup in ETL with Apache Spark http://blog.cloudera.com/blog/2016/12/achieving-a-300-speedup-in-etl-with-spark/ The Rhythm of Food http://rhythm-of-food.net/ http://www.thefunctionalart.com/ Information is beautiful awards http://www.informationisbeautifulawards.com/news/188-2016-the-winners Making data personal: Big data made small http://blogs.sas.com/content/sgf/2016/12/13/making-data-personal-big-data-made-small/ 27:50 The sense and non-sense of certifications Educational certifications ISV Certifications 50:59 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
17 Jan 2017 | Episode 33 – Roaring News | 00:50:24
This episode, we have an absolutely brilliant topic that we were going to cover after the news section... But the news section has us talking so much that it ran a bit long. Preferring not to give you a two hour episode, we're rescheduling the delivery of the intended topic to next episode and present you with our first (and probably last) "News only" episode. 00:00 Recent events Dave A pair of “trends to watch in 2017” http://www.techrepublic.com/article/6-big-data-trends-to-watch-in-2017/ http://www.datamation.com/applications/5-big-data-predictions-for-2017.html Learning from a Year of Security Breaches https://medium.com/starting-up-security/learning-from-a-year-of-security-breaches-ed036ea05d9b#.4r22rbfjh Failing to monetise your apps, big data can help  http://www.techrepublic.com/article/failing-to-monetize-your-apps-big-data-can-help/ A Perfect Illustration of the Big Data Value Chain http://www.techrepublic.com/article/a-perfect-illustration-of-how-the-big-data-value-chain-works/ Jhon 24/7 Spark Streaming on YARN in Production https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/ SparkSQL, Ranger,and LLAP via Spark thrift server for BI scenarios to provide row, column level security, and masking http://hortonworks.com/blog/sparksql-ranger-llap-via-spark-thrift-server-bi-scenarios-provide-row-column-level-security-masking/ The Data Dichotomy: Rethinking the Way We Treat Data and Services https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/ 50:24 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
31 Jan 2017 | Episode 34 – What do people get wrong when deploying Hadoop? – Part 1 | 01:00:45
Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this first part of a two part episode where they share their experience with what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput for your cluster. 00:00 Recent events Dave Apache Beam becomes a top level project! https://beam.apache.org/ https://beam.apache.org/get-started/beam-overview/ https://github.com/eljefe6a/beamexample/blob/master/BeamTutorial/slides.pdf https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective Four Types of Data Analytics http://insights.principa.co.za/4-types-of-data-analytics-descriptive-diagnostic-predictive-prescriptive MapR claims open source victory with patent http://www.cbronline.com/news/verticals/cio-agenda/mapr-claims-open-source-big-data-victory-patent-award/ Jhon Ransomware attacks on insecure Hadoop systems may be next, say security researchers http://www.itworldcanada.com/article/ransomware-attacks-on-insecure-hadoop-systems-may-be-next-say-security-researchers/389944 http://www.gdi.foundation/ Revenge of the DevOps Gangster: Open Hadoop Installs Wiped Worldwide http://www.threatgeek.com/2017/01/open-hadoop-installs-wiped-worldwide.html Making Big Data User Friendly For Small Businesses https://smallbiztrends.com/2017/01/big-data-and-small-business.html 30:15 What do people get wrong when deploying Hadoop? - Part 1 An interview with two guests from Hortonworks:   Paul Codding Product Management Director at Hortonworks Sheetal Dolas Engineering Leader, Architect And Big Data Champion at Hortonworks 01:00:45 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
02 Aug 2016 | Episode 21 – The Open Data Platform Initiative | 00:59:22
This episode we have an interview with John Mertic about ODPi. There has been plenty of mystery and even some controversy about ODPi which we attempt to resolve for you. Big thanks to John for giving us some of his time for this interview! Sadly, this time the Skype Gods were not with us and we experienced some drops and hitches. We tried to smooth things over as much as possible, but we were not able to achieve our usual level of quality this time. 00:00 Recent events Vacation for Dave Study for Jhon 10:40 Interview with John Mertic @ ODPi https://www.odpi.org/ John Mertic, Director of Program Management for ODPi and Open Mainframe Project Find John on twitter: @jmertic If you're not familiar with the ODPi, here are a few good links to get you started and interested in the area: Links to the ODPi Specifications: https://www.odpi.org/specifications Watch an interview with Alan Gates who discusses what the ODPi is trying to do to simplify the big data world: https://www.youtube.com/watch?v=Vogw33pbNOE Watch an interview with John Mertic who discusses how the ODPi compliance affects upstream Hadoop components: https://www.youtube.com/watch?v=siEkCutk_f8 56:30 Questions from our Listeners No questions this episode... ask us more questions and we'll answer them! 59:22 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
19 Jul 2016 | Episode 20 – Dave’s Hadoop Summit San Jose 2016 Retrospective – Part 2 | 01:06:28
In this second part, we discuss the sessions that Dave attended at the San Jose Hadoop Summit and we go in depth on some related topics. Since we ran over an hour with the main topic, and we did not want to make this a three-parter, we decided to forgo the questions from the audience just this one time... 00:00 Recent events Vacation time! Edx.Org Big Data Courses 04:00 Dave's Hadoop Summit San Jose 2016 Retrospective - Part 2 Session 1: End-to-End Processing of 3.7 Million Telemetry Events per Second Using Lambda Architecture, by Saurabh Mishra @ Hortonworks and Raghavendra Nandagopal @ Symantec Talking point: Hero-culture or why nobody wants to talk about failure anymore Session 2: Top Three - Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise, by Andrew Ahn @ Hortonworks Talking point: Guaranteed Governance, who certifies the certificate? Session 3: IoT, Streaming Analytics and Machine Learning: Delivering Real-Time Intelligence With Apache NiFi, by Paul Kent @ SAS and Dan Zaratsian @ SAS Talking point: Commercial solutions versus build your own in open source Session 4: Productionizing Spark on YARN for ETL at Petabyte Scale, by Ashwin Shankar and Nezih Yigitbasi @ Netflix Talking point: Is Hadoop still a low-cost commodity affair? Session 5: Analyzing Telecom Fraud at Hadoop Scale, by Sanjay Vyas @ Diyotta Talking point: Do commercial, proprietary products have a place at Hadoop Summit or are they just marketing fluff? 01:06:28 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
05 Jul 2016 | Episode 19 – Dave’s Hadoop Summit San Jose 2016 Retrospective | 00:48:24
Dave went to the Hadoop Summit 2016 in San Jose last week and came back with a riveting tale to tell. In this first part of the Summit coverage, join me when I ask Dave all about the keynotes and the general event. Join us next episode where Dave will talk about some of the sessions he attended! 00:00 Recent events Lift and shift to IaaS Hybrid Disaster Recovery Spark & ML goodness MOOCs San Jose Hadoop Summit 09:25 Dave went to the Hadoop Summit in San Jose! Record attendance, maybe a venue change in future Sponsor exhibition area including "interesting" story The Community Corner The keynotes Hadoop is 10 years old Microsoft on Machine Learning Hadoop Assemblies Hadoop fragmentation Cyber security Car insurance premiums "to measure" Ethics session 40:55 Questions from our Listeners Beefy feedback from Kris A listener wants to know if it is worth the trip to go to the US Summit or to just go to the "local" Summit, wherever that is. Nishant would like an episode about the entire ecosystem. What do you think? 48:24 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
21 Jun 2016 | Episode 18 – MLeap interview: Productionising Data Science – Part 2 | 00:43:18
In this episode, we have the second part of the interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project, where they go into more technical details and give tips on deploying MLeap in your environment. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Yet more telco security, again. RFI for European energy company followed by "the RFI rant" Metronnnnnnnnnnn Big Data Hackathon for an airline company predicting delays Preparing an IoT hackathon on predictive maintenance Spreading the word on MLeap at a couple of customers! 11:22 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Part 2 http://combust.ml/ http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html https://github.com/TrueCar/mleap https://github.com/TrueCar/mleap-demo 35:25 Questions from our Listeners Are there other technologies that allow machine learning models to be exposed as "web" APIs? Zeppelin multi tenant right now? 43:17 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
07 Jun 2016 | Episode 17 – MLeap interview: Productionising Data Science | 00:54:02
In this episode, we have an interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Machine Learning Hackathon on Azure Strata Europe Fighting with Kafka 09:30 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Meet Hollin and Mikhail today (7-Jun-2016) at Spark Summit 2016 in San Francisco! https://spark-summit.org/2016/events/mleap-productionize-data-science-workflows-using-spark/ http://combust.ml/ http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html https://github.com/TrueCar/mleap https://github.com/TrueCar/mleap-demo 40:50 Questions from our Listeners The Episode 12 mystery unraveled Nifi works well for prototyping, but what's your view on using Nifi in production in a normal DTAP (Development, testing, acceptance and production) environment? 54:00 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
24 May 2016 | Episode 16 – Interview part two with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo! | 00:46:35
Hopefully you enjoyed the first part of our interview with Sumeet. Here is part two, where we go into more detail about Yahoo's use of Hadoop, with lots of interesting topics coming up including the splintering of the ecosystem, governance and much much more. 00:00 Recent events Customer and partner adventures with Apache Nifi Jhon is settling in at Microsoft but is unfortunately quite jet-lagged. 08:15 Part two of our interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo! 39:05 Questions from our Listeners Is Apache Atlas ready for production today? 46:35 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
10 May 2016 | Episode 15 – Interview with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo! | 01:00:56
Having met Sumeet at the Hadoop Summit we thought he'd make a great guest for the podcast, so here he is for your listening pleasure!   00:00 Recent events Louder! iTunes and the missing episode 12 Jhon's new role at Microsoft Hadoop as a Service A fortnight of SAS + Hadoop Metron teething troubles https://issues.apache.org/jira/browse/METRON-136 17:50 Interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo!   42:50 Questions from our Listeners One data-lake for all workloads? Or separate clusters for each set of workloads? How large a team do I need to manage a Hadoop cluster?   1:00:56 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
26 Apr 2016 | Episode 14 – Hadoop Summit – Retrospective | 00:51:47
After the last two special edition episodes where we quickly covered each Summit day in a "same-day" episode, we go over the full event in this episode, highlighting the sessions we enjoyed the most and sharing our general feelings about the 2016 Hadoop Summit in Dublin. 00:00 Recent events Summit! Sessions on YouTube Meetings and planning, Apache Metron https://cwiki.apache.org/confluence/display/METRON/Metron+Wiki https://community.hortonworks.com/articles/26047/apche-metron-tp1-blog-series.html Setting up a new podcast recording "studio" 09:00 Hadoop Summit - Retrospective Summit Schedule App Hortonworks emphasising streaming ingest using NiFi, but the other talks did not so much Summit video sessions are starting to appear online https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos Next year: Munich Day one sessions: It's not the size of your cluster, It's how you use it Big Fish - David Darden & Don Smith Unified stream and batch processing with Apache Flink Artisans Gmbh - Ufuk Celebi Taming the Elephant Hortonworks - Paul Codding How To: A beginners guide to becoming an Apache contributor Teradata - Venkatesh On-Demand HDP Clusters using Cloudbreak and Ambari Symantec - Karthik Karuppaiya & Narendra Bidari Machine Learning in Big Data - Look Forward or be left behind Redpoint Global Inc - Bill Porto Past, Present, Future of Hadoop at LinkedIn LinkedIn - Carl Steinbach Migrating Hundreds of Pipelines in Docker Containers Spotify - Noa Resare Day two sessions: MLeap: Or how to Productionize Data science workflows using Spark Shift Technologies - Mikhail Semeniuk & TrueCar - Hollin Wilkins Scaling out to 10 Clusters, 1000 Users, and 10,000 Flows: The Dali Experience at LinkedIn Carl Steinbach, LinkedIn Hadoop Platform at Yahoo: A Year in Review Sumeet Singh, Yahoo!, Inc. 
Apache Hive 2.0 SQL Speed Scale Hortonworks - Alan Gates Telematics with Hadoop and Nifi Adam Morton, Admiral Insurance - Simon Elliston Ball, Hortonworks Apache Eagle - Monitor Hadoop in Real-Time Ebay - Young Zang & Arun Manoharan 43:18 Questions from our Listeners Great question in from Rene about small businesses and Big Data which we’ll cover on a future episode! Also Rene's feedback has helped us tweak the feedback form so it’s easier to use. Is this a vendor podcast? No, we’re all community! :o) How do you record the podcast, what is your equipment? Skype-saurus: the original, expensive hardware solution. http://www.leoville.com/the-skypesaurus-story (Sadly, this no longer seems to be available.) Skype-o-saurus: a cheaper solution using an OS-X aggregate sound device. https://drupalize.me/blog/201504/recording-podcasts-creating-skype-o-saurus 51:48 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
14 Apr 2016 | Episode 13 – Hadoop Summit Dublin 2016 – Day 2 | 00:37:47
Welcome to our second special edition podcast, brought to you from day 2 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the second day of keynotes and yet more sessions that we enjoyed. 00:00 Recent events Introduction to the Hadoop Summit Dublin 2016 from day 2 01:45 Hadoop Summit 2016 Dublin Day 2 Review Keynote/Session - Yahoo! - Sumeet Singh Keynote - Information is Beautiful - David McCandless http://www.informationisbeautiful.net/ MLeap - Mikhail Semeniuk (Shift Technologies) Hollin Wilkins (TrueCar) Admiral - Adam Morton (Admiral) and Simon Ball (Hortonworks) Hive - Alan Gates (Hortonworks) 37:47 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
13 Apr 2016 | Episode 12 – Hadoop Summit Dublin 2016 – Day 1 | 00:29:38
Welcome to our special edition podcast, brought to you from day 1 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the keynotes and some of the sessions we enjoyed during day 1. 00:00 Recent events Introduction to the Hadoop Summit episode for day 1 01:40 Main Topic Some comments from attendees as to what they're looking forward to at the event Conversation about the keynotes and the sessions we enjoyed 29:38 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
05 Apr 2016 | Episode 11 – Interview with Community Award Winner Venkatesh Sellappa | 00:37:18
Venkatesh is a new contributor to Apache NiFi and during his talk at the Hadoop Summit next week, he takes a light-hearted look at his journey to becoming a contributor to an Apache project. Venkatesh is one of the Community Choice winners, so congratulations are in order and we are certain you will like this interview! Enjoy, and we are looking forward to seeing you at the Hadoop Summit in Dublin next week! 00:00 Recent events Easter Break Big Data Analytics Big Telco workshops/meetings and sessions stuff Domain Knowledge is important 05:40 Main Topic Interview with Venkatesh Sellappa 33:50 Questions from our Listeners: No questions this time but information on our activities during the upcoming Hadoop Summit. 37:18 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
22 Mar 2016 | Episode 10 – Preparing for the 2016 Hadoop Summit in Dublin | 01:03:50
Next month, the European Hadoop Summit will take place in Dublin. Now that the agenda for the event has been nearly finalised, we take it upon ourselves to provide a virtual guide to the event. There are a lot of good things happening during the event, so we share with you which sessions we think we'll be attending and why. Enjoy, and we are looking forward to seeing you there! This is another long episode, going over an hour for the first time. We are really curious to know if you like these longer episodes, or if you would prefer it if we kept it under the original 30 to 35 minutes. 00:00 Recent events Hands on upgrading, express vs rolling upgrade Workshop at telecom company in Russia Nifi workshops Securing a Hadoop cluster 08:00 Main Topic Dave has assembled some statistics on the type of sessions available. What sessions we would attend and why. http://hadoopsummit.org/dublin/agenda/ General advice to visitors mixed in... 54:30 Questions from our Listeners: What else is going on during the summit dates? Should I visit the Hadoop Summit and if so, go to Europe, the US or Australia? How do I get a speaking slot at summit? https://hadoopsummit.uservoice.com/ What other events are comparable/useful to visit? 01:03:50 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
08 Mar 2016 | Episode 9 – SQL in Hadoop | 00:53:38
SQL was one of the first data access methods added to vanilla Hadoop. Considering that many of the people working with Hadoop in the early days came from a database background, this is not surprising. Since then, the SQL ecosystem in Hadoop has grown considerably and in this episode we give a general overview of many of the available choices. This episode runs a bit longer than normal but we hope you'll find it worthwhile! 00:00 Recent events Spark masterclasses NiFi on trains Mifid II and the active archive World Mobile Congress 08:30 Main Topic SQL solutions: Apache Hive https://hive.apache.org/ Apache Spark SQL http://spark.apache.org/sql/ Apache Phoenix https://phoenix.apache.org/ Apache Impala (incubating) https://www.cloudera.com/products/apache-hadoop/impala.html Apache Hawq (incubating) http://hawq.incubator.apache.org/ Apache Drill https://drill.apache.org/ Presto https://prestodb.io/ Oracle Big Data Sql http://www.oracle.com/us/products/database/big-data-sql/overview/index.html IBM BigSql http://www-01.ibm.com/software/data/infosphere/hadoop/big-sql.html Technology topics: JDBC/ODBC SQL syntax compliance Multi-user concurrency Benchmarks 46:40 Questions from our Listeners: How much storage overhead should I count on if I add SQL in my Hadoop workflow? How do I make my SQL faster? 53:38 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
23 Feb 2016 | Episode 8 – NiFi Deeper Dive | 00:47:18
In this episode we'll go into more depth on NiFi complete with our second interview with Joe Witt, Senior Director of Engineering at Hortonworks who dives into how NiFi works under the covers and some considerations to think about when using it for real. 00:00 Recent events New logo for the podcast Hadoop use in telecom Spark masterclass details Apache Nifi "Hype Train" concerns 09:14 Main Topic Second interview with Joe Witt: a deeper dive on Apache NiFi 35:30 Questions from our Listeners: I have already implemented some of my ingest in flume/kafka/storm, do I need to replace that with NiFi? Is it true there is no chance of data loss with NiFi? Can I aggregate or combine data as part of the flow process? Do I need a hadoop cluster to use NiFi? 47:18 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
09 Feb 2016 | Episode 7 – An introduction to Data Ingest | 00:37:15
In this episode we'll cover some of the most common options for ingesting data into Hadoop including technologies like Flume, Sqoop, Kafka, NiFi and more. 00:00 Recent events Upcoming masterclasses on NiFi and Spark NiFi deployment on trains Podcast publicizing Global Systems Integrator training day 06:40 Main Topic Apache Sqoop Apache Flume Apache Kafka Apache NiFi Other Low level ingest methods 28:00 Questions from our Listeners: I want to transform the data to its final form before it lands in the Hadoop cluster. Which ingest tool should I use? What about XYZ vendor's “hadoop loader/ingest” tool? Do all these tools run on my hadoop nodes? How does lambda architecture fit with data ingest? 37:15 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
26 Jan 2016 | Episode 6 – An introduction to NiFi | 00:30:45
In this episode we'll cover an introduction to NiFi, complete with an interview with Joe Witt, Senior Director of Engineering at Hortonworks, who explains exactly where NiFi came from and how it fits into your Big Data plans. 00:00 Recent events The usual "Start of the Year" meetings and events Using Apache NiFi as a self documenting deployment system We are now available on iTunes 04:50 Main Topic Interview with Joe Witt, one of the creators of Apache NiFi and currently Director of Engineering for HDF at Hortonworks. 22:40 Questions from our Listeners: Is NiFi really as easy to use as it looks? Is NiFi a part of Hadoop now? How do I get started with NiFi? Is NiFi an ETL tool? 30:45 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
12 Jan 2016 | Episode 5 – An introduction to Spark | 00:37:50
In this episode we'll cover the basics of Apache Spark, including typical deployment situations, architecture and usage. 00:00 Recent events Season's Greetings! Jhon shamelessly plugs his mini cluster build Apache Mesos Amazon IoT solution 05:28 Main Topic Who would use Apache Spark, why would you use it, where would you use it Apache Spark Architecture Apache Spark Components Apache Spark MLlib Apache Spark gotchas Typical use cases for Apache Spark 28:20 Questions from our Listeners: What happens if all my data does not fit in memory? What is the security like for Spark? Why Spark on Hadoop instead of standalone Python, Scala, Java or something else for Spark? Can I access data on HDFS or local disk from my Spark script? 37:50 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
16 Aug 2016 | Episode 22 – Big Data in Small Business | 01:32:35
The main subject in this episode features an answer to a listener question we received a couple of months ago: How can big data help small businesses? What ways can small business use big data? At the moment all the talk is about big data helping enterprise firms. And we are introducing a new section which we hope you will enjoy! 00:00 Recent events Working with a new team in sunny Cork, getting them up to speed Workshop with a global SI and a European telco about the upcoming phases of their big data journey Workshop with a customer who has been using Hadoop for a very long time, since Hadoop 0.2! Finally looking to migrate into the future Multi vendor workshop fraud analytics Object recognition and detection in images. 11:30 Our very own "New and Noteworthy" Dave http://blogs.teradata.com/international/streaming-analytics-story-many-tales/ http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A453888 http://research.ibm.com/cognitive-computing/ostp/rfi-response.shtml http://dataconomy.com/10-online-big-data-courses-2016/ Jhon Apache Spark 2.0 (July 28, 2016) http://spark.apache.org/releases/spark-release-2-0-0.html Unifying DataFrame and Dataset (RDD): In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. MLLib: The DataFrame-based API is now the primary API. The RDD-based API is entering maintenance mode. Spark 2.0 substantially improved SQL functionalities with SQL2003 support. 
Spark SQL can now run all 99 TPC-DS queries Ships the initial experimental release for Structured Streaming, a high level streaming API built on top of Spark SQL Databricks article: https://databricks.com/blog/2016/07/28/continuous-applications-evolving-streaming-in-apache-spark-2-0.html Apache Mesos 1.0 released https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces97 http://techblog.netflix.com/2016/07/distributed-resource-scheduling-with.html Apache Twill becomes top level project http://twill.apache.org/ https://blogs.apache.org/foundation/entry/apache_software_foundation_announces_apache1 44:40 Big Data for Small Business Define "small business" How can big data help small businesses What ways can small business use big data The problems a small business could face http://www.columnfivemedia.com/100-best-free-data-sources-infographic Our answers to those problems Some conclusions 01:32:35 End   Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
06 Mar 2018 | Episode 77 – Roaring News | 00:48:10
Another Roaring News episode where we cover recent Big Data News items we found interesting. This time we talk about Open Source turning 20 years old, the annoyances that come with Smart Homes and a big data divide in Germany. Additionally, we talk about some introductory guides to AI. Breaking News 20 years of open source + who contributes http://www.zdnet.com/article/open-source-turns-20/ https://www.infoworld.com/article/3253948/open-source-tools/who-really-contributes-to-open-source.html Smart home living is annoying as hell https://gizmodo.com/the-house-that-spied-on-me-1822429852 Big Data Divide https://www.politico.eu/article/to-protect-or-collect-germanys-big-data-divide/ The Art of Learning Data Science https://medium.com/@aparnack/the-art-of-learning-data-science-65b9f703f932 The Long Road To Become a Big Data Scientist - Infographic https://medium.com/@aparnack/sequel-to-the-art-of-learning-data-science-cb2e1f078e5a An executive’s guide to AI https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/an-executives-guide-to-ai?cid=other-soc-twi-mip-mck-oth-1802&kui=udT5IIoYx3yxUmZYJz7_2A Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
20 Mar 2018 | Episode 79 – Roaring News | 00:37:19
Another Big Data news episode! This time we consider the big or small nodes conundrum based on an article that, after close scrutiny, doesn't really seem to test the real issue. Other things that get covered are LinkedIn's Dynamometer, Cloudera's full production architecture advice for a recommendation service and a really interesting visualization technique based on blobs. Breaking News Big Data, Small Nodes https://insidebigdata.com/2018/02/22/make-sense-big-data-small-nodes/ Dynamometer Release https://github.com/linkedin/dynamometer https://venturebeat.com/2018/02/08/linkedin-open-sources-dynamometer-for-hadoop-performance-testing-at-scale/ Cisco IoT predictions Aka someone somewhere trots out the old “data is the new oil” trope for one more circuit, please please please stop? https://www.networkworld.com/article/3257769/internet-of-things/7-transportation-iot-predictions-from-cisco.html Production Recommendation Systems with Cloudera http://blog.cloudera.com/blog/2018/02/production-recommendation-systems-with-cloudera/ A Day in the Life of Americans http://flowingdata.com/2015/12/15/a-day-in-the-life-of-americans/ Intercontinental Ballistic Microfinance (2006) https://vimeo.com/28413747 Understanding AI, Machine Learning & Predictive Analytics https://www.forcecast.com/blog/understanding-ai-machine-learning-predictive-analytics/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
27 Mar 2018 | Episode 80 – Big Data Tracking | 00:51:25
Last June, Wolfie Christl published a 93-page report, Corporate Surveillance in Everyday Life, on big data tracking. Apart from the massive PDF that can be downloaded on the net, an extensive summary can be found on the Cracked Labs website. In this episode we go over the content and give our views on the subject. If you want to follow along with us while we are discussing the different points in the online article, here is the link: http://crackedlabs.org/en/corporate-surveillance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
03 Apr 2018 | Episode 81 – Roaring News | 00:26:19
In this installment of Big Data News, we talk about the recent Facebook leak, how everybody is still doing it wrong (according to some at least) and installing Hadoop "the old-fashioned way". Also briefly covered is Elastic's X-Pack, now even more "open" than before, but still rather closed it would seem. Breaking News Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
10 Apr 2018 | Episode 82 – DataWorks Summit Berlin 2018 Preview | 00:47:38
Next week is DataWorks Summit Berlin week! Your two hosts will be in attendance and in this episode we go over the agenda and plan which sessions we want to attend and why. Peppered throughout we add further insights and experiences from previous years. Unfortunately, Dave's network was a little unstable and there are a couple of audio glitches in this episode. For some session statistics, or if you could use some help deciding which sessions you want to attend, you can use the dashboard we created: Click the screenshot above or go to http://aka.ms/DWS2018 to access the dashboard. It is a dynamic report: clicking on graph elements (bars or pie slices) will apply filters on all the visualizations and the session list. Use control-click to combine filters. At some point the dashboard will disappear because it is no longer relevant. For future reference, here is a large version of the screenshot. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
19 Jun 2018 | Episode 93 – Apache Kylin: Extreme OLAP Engine for Big Data | 00:46:14
In this episode Apache PMC member Dong Li joins us to explain how Apache Kylin can deploy Analytical OLAP cubes in your Big Data environment. http://kylin.apache.org/ Dong Li Technical Partner & Senior Architect of Kyligence (linkedin) PMC Member of Apache Kylin http://en.kyligence.io/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
01 May 2018 | Episode 86 – Druid: a high-performance, column-oriented, distributed data store – part 1 | 00:31:57
This is the first part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store which has entered the Hadoop environment with the recent integration with Apache. Since Druid has been around for a while, we are grateful to FJ for spending some time with our listeners. Fangjin Yang Cofounder and CEO at Imply (linkedin) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
08 May 2018 | Episode 87 – Druid: a high-performance, column-oriented, distributed data store – part 2 | 00:31:53
This is the second part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store which has entered the Hadoop environment with the recent integration with Apache. Since Druid has been around for a while, we are grateful to FJ for spending some time with our listeners. Fangjin Yang Cofounder and CEO at Imply (linkedin) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
18 Apr 2018 | Episode 83 – DataWorks Summit Berlin – Day 1 Recap | 01:23:45
Another year, another European Dataworks Summit, and yes, another daily recap show from Jhon and Dave. We walk through the keynotes and sessions we attended and give our thoughts and views. This should be useful for anyone who wasn't able to attend or those seeking to peek into sessions they couldn't make. No real editing on this one, recording in a hotel room so audio quality may not be up to our usual standards, we hope you'll forgive us! Enjoy! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
19 Apr 2018 | Episode 84 – DataWorks Summit Berlin – Day 2 Recap | 01:30:26
And with the end of day two of the 2018 DataWorks Summit in Berlin comes the end of this year's Europe Summit. But never fear, we have an extra 90 minutes of DataWorks goodness for you to consume on your way home. No real editing on this one, and we recorded in a hotel room, so audio quality may not be up to our usual standards. We hope you'll forgive us! Enjoy! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
24 Apr 2018 | Episode 85 – DataWorks Summit Community Showcase Exhibitor Soundbites | 00:30:34
This is the final part of our coverage of the DataWorks Summit Berlin 2018. Normally we would not have had an episode this week, since we were in Berlin last week, but we had lightning interviews with the vendors in the Community Expo Area and used that coverage to make this episode. So less of "Dave & Jhon" and more "ecosystem tech" snippets this time. Even though this does stray a bit from our usual content, we still hope it is useful. This was recorded in a hotel room and on the expo floor, so the audio quality is not up to our usual standards. We hope you’ll forgive us! Here is a timestamped list of the lightning interviews: 02:41 Hortonworks https://hortonworks.com/ 06:28 Alation https://alation.com/ 08:45 Arcadia Data https://www.arcadiadata.com/ 11:12 Attunity https://www.attunity.com/ 13:10 BlueMetrix https://www.bluemetrix.com/ 15:27 BMW https://www.bmw.com 18:04 IBM https://www.ibm.com 19:54 Microsoft https://www.microsoft.com 22:15 Nutanix https://www.nutanix.com/ 23:26 Syncsort https://www.syncsort.com 24:54 Synerscope http://www.synerscope.com/ 27:05 Talend https://www.talend.com 27:59 Teradata https://www.teradata.com/ 29:02 -Interview End- Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
15 May 2018 | Episode 88 – Roaring News | 00:35:07
Returning to our more regular schedule, we have a Roaring News episode today. Dave has articles on multi-cloud readiness, Big Data being a pariah, and Google Duplex and Jhon came up with Synthetic data, data engineers and scientists and a Neural Network sharing cake recipes. Breaking News Dave Less than 10% ready for multi cloud http://www.cloudpro.co.uk/cloud-essentials/hybrid-cloud/7451/idc-less-than-10-of-organisations-are-ready-for-multi-cloud Tech companies distancing themselves from Big Data https://qz.com/1262102/tech-companies-are-distancing-themselves-from-big-data/ Google Duplex https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html Jhon The Rise of Synthetic Data to Help Developers Create and Train AI Algorithms Quickly and Affordably https://insidebigdata.com/2018/05/08/rise-synthetic-data-help-developers-create-train-ai-algorithms-quickly-affordably/ Data engineers vs. data scientists https://www.oreilly.com/ideas/data-engineers-vs-data-scientists?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content+datascience We asked a neural network to bake us a cake. The results were...interesting. https://www.popsci.com/neural-network-bakes-a-cake Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
22 May 2018 | Episode 89 – DataWorks Summit San Jose Agenda Review | 01:12:20
With the San Jose edition of the DataWorks Summit only a month away, we go over the sessions that are available in the agenda today and offer our top picks. If you're going, or if you will be watching the replays online, we hope to guide you on your selection of sessions. DataWorks Summit San Jose 2018 And here is the dashboard we created with statistics on the San Jose sessions, for your enjoyment: https://aka.ms/DWS2018SJ The agenda is still in flux so we will be updating the dashboard regularly. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
29 May 2018 | Episode 90 – Roaring news | 00:38:09
In this week's Roaring News episode, Dave brings up the resilience of Apache Community open source projects and plays some Doom. Jhon has some practical Apache NiFi guides and the emergence of multi-model NoSQL databases. Breaking News DataWorks Summit Berlin video recordings are up: https://www.youtube.com/user/HadoopSummit/playlists Find Dave on his Australian road-trip: http://bit.ly/aus-nz-ibm-hwx-tour Dave DataTorrent, Stream Processing Startup, Folds (Apache Apex) https://www.datanami.com/2018/05/08/datatorrent-stream-processing-startup-folds/ DOOM! https://arxiv.org/abs/1804.09154 https://www.technologyreview.com/s/611072/ai-generates-new-doom-levels-for-humans-to-play/ https://www.youtube.com/watch?v=K32FZ-tjQP4 Bonus doom news: https://www.rockpapershotgun.com/2018/03/28/dodge-fireballs-forever-in-a-neural-nets-doom-nightmare/ https://worldmodels.github.io/ Jhon Accessing Feeds from EtherDelta on Trades, Funds, Buys and Sells (Apache NiFi) https://community.hortonworks.com/articles/191146/accessing-feeds-from-etherdelta-on-trades-funds-bu.html?es_p=6741162 NiFi Processing and Flow with Couchbase Server https://blog.couchbase.com/nifi-processing-flow-couchbase-server/ The new era of the Multi-Model Database https://www.zdnet.com/article/the-new-era-of-the-multi-model-database/ Seven Databases in Seven Weeks, Second Edition - A Guide to Modern Databases and the NoSQL Movement https://pragprog.com/book/pwrdata/seven-databases-in-seven-weeks-second-edition Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
05 Jun 2018 | Episode 91 – ODPi is back and better than ever! | 01:08:00
In this episode, we welcome back John Mertic, director of Program Management for ODPi, R Consortium, and the Open Mainframe Project. It's been almost two years since we checked in with John and the ODPi initiative and as John mentions in the interview, a lot has changed in Hadoop... ODPi logo John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ ODPi website links: https://www.odpi.org/ https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later https://www.odpi.org/projects/data-governance-pmc https://www.odpi.org/events Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
12 Jun 2018Episode 92 – Roaring news00:46:08
Another week, another edition of Roaring Big Data News. This time, Dave talks about driving teens and Jhon takes a detailed look at an Eventbrite data pipeline article. Breaking News Dave Driver monitoring isn't just for teens; adults can benefit, too https://arstechnica.com/cars/2018/05/buicks-smart-driver-explains-why-my-gas-mileage-sucks-and-my-editors-doesnt/  Jhon Looking under the hood of the Eventbrite data pipeline! https://www.eventbrite.com/engineering/looking-under-the-hood-of-the-eventbrite-data-pipeline/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
26 Jun 2018Episode 94 – Roaring news00:37:39
In this week's edition of Roaring Big Data News, Dave talks about modernizing Hadoop and a billion Java errors. Jhon has an article on improving your training data sets. We finish with a discussion about the newly released HDP 2.6.5 with an emphasis on the deprecation notices and YARN containers. Breaking News Dave Modernizing Hadoop: Reaching the plateau of productivity https://www.zdnet.com/article/modernizing-hadoop-reaching-the-plateau-of-productivity/ 1 billion Java errors, here’s what causes 97% of them https://blog.takipi.com/we-crunched-1-billion-java-logged-errors-heres-what-causes-97-of-them/ https://blog.takipi.com/the-top-10-exceptions-types-in-production-java-applications-based-on-1b-events/ Jhon Why you need to improve your training data, and how to do it https://petewarden.com/2018/05/28/why-you-need-to-improve-your-training-data-and-how-to-do-it/amp/ Announcing the General Availability of Hortonworks Data Platform (HDP) 2.6.5, Apache Ambari 2.6.2 and SmartSense 1.4.5 https://hortonworks.com/blog/announcing-general-availability-hortonworks-data-platform-hdp-2-6-5-apache-ambari-2-6-2-smartsense-1-4-5/ Component Versions https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/comp_versions.html Deprecation Notices https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/deprecated_items.html YARN Containers Trying out Containerized Applications on Apache Hadoop YARN 3.1 https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/ Containerized Apache Spark on YARN in Apache Hadoop 3.1 https://hortonworks.com/blog/containerized-apache-spark-yarn-apache-hadoop-3-1/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
17 Jul 2018Episode 97 – ODPi: A new world for data governance01:07:57
In this episode, we welcome back John Mertic one more time. It was quite obvious that John had lots more to talk about at the end of our last interview with him. ODPi has recently reinvented itself, moving away from a strict distribution standards body towards data governance and reference specifications. John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ ODPi website links: https://www.odpi.org/ https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later https://www.odpi.org/projects/data-governance-pmc https://www.odpi.org/events Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
10 Jul 2018Episode 96 – Roaring news00:46:05
In this edition of Roaring news, Ward Bekker returns to discuss what is happening in the world of Big Data. Ward brings news on GPUs in supercomputers and how Big Data could be wrong about you. Dave and Jhon found articles on Big data growth visualizations and GDPR. Breaking News 10 Charts that will change your perspective of Big Data’s Growth https://www.forbes.com/sites/louiscolumbus/2018/05/23/10-charts-that-will-change-your-perspective-of-big-datas-growth/#1ea595702926 New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500 https://www.top500.org/news/new-gpu-accelerated-supercomputers-change-the-balance-of-power-on-the-top500/ GDPR: A Call to Remove Technical Debt from Data Science https://medium.com/@kjarmul/gdpr-a-call-to-remove-technical-debt-from-data-science-c103a01c3102 Everything big data claims to know about you could be wrong http://news.berkeley.edu/2018/06/18/big-data-flaws/ Our thanks to Ward for adding some variety to this News episode.   Ward Bekker (Linkedin) Pre-Sales Solutions Engineer II @ Hortonworks   Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
31 Jul 2018Episode 99 – The State of Big Data at Codemotion Amsterdam00:45:28
The Roaring Elephant podcast was a guest at the Codemotion conference in Amsterdam a little while ago. This episode contains the audio of the talk we gave on the State of Big Data. Our talk was definitely light on slideware, but if you want to see the video of our presentation, you can find it on the Codemotion YouTube channel: Codemotion Amsterdam 2018: The State of Big Data by Roaring Elephant podcast Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
24 Jul 2018Episode 98 – Roaring news00:22:16
In this episode of Big Data Roaring News, Dave laments another announcement of Hadoop's demise and exposes A.I. imposters. Jhon has articles comparing Ranger with Sentry and covering Apache NiFi reaching the ripe age of 1.7, with a MiNiFi-charged practical demo to prove the point. Breaking News Hadoop’s star dims in the era of cloud object data storage and stream computing https://siliconangle.com/blog/2018/07/09/hadoops-star-dims-era-cloud-object-data-storage-stream-computing/ The rise of “pseudo-ai” how tech firms quietly use humans to do bots work https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-tech-companies Apache Ranger Vs Sentry https://www.linkedin.com/pulse/apache-ranger-vs-sentry-mythily-rajavelu/ How to build an IIoT system using Apache NiFi, MiNiFi, C2 Server, MQTT and Raspberry Pi https://medium.freecodecamp.org/building-an-iiot-system-using-apache-nifi-mqtt-and-raspberry-pi-ce1d6ed565bc Apache NiFi Version 1.7.0 released: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
07 Aug 2018Episode 100 – Celebrating our Centennial with the history of Hadoop01:07:19
100 Big Data episodes! We made it, in no small part thanks to our audience: you are what keeps us going! In this episode we celebrate our centennial by going over the history of Hadoop releases, highlighting the most noteworthy events along the way. Join us down the twisty paths of our memory lanes! The blockchain-related LinkedIn post Jhon liked The sources for this episode: http://hadoop.apache.org/releases.html https://en.wikipedia.org/wiki/Apache_Hadoop Debate over which company had contributed more to Hadoop: http://hortonworks.com/blog/reality-check-contributions-to-apache-hadoop/ Thank you for being part of the ride and now on to episode 200! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
03 Jul 2018Episode 95 – DataWorks Summit in San Jose with Ward Bekker01:52:50
Since neither Dave nor Jhon was able to attend the DataWorks Summit in San Jose a couple of weeks ago, we have a guest, Ward Bekker, who was happy to join and educate us on the subject. DataWorks Summit San Jose 2018 In this episode we discuss the daily keynotes and Ward's selection of sessions at the Summit, ranging from what's new in YARN 3.0 to materialized views in Hive and much more. Ward Bekker (Linkedin) Pre-Sales Solutions Engineer II @ Hortonworks Some of the sessions and topics discussed are: Apache Hadoop State of the union https://dataworkssummit.com/san-jose-2018/session/apache-hadoop-yarn-state-of-the-union-2/ What is new in Apache Hive https://dataworkssummit.com/san-jose-2018/session/what-is-new-in-apache-hive/ Running distributed TensorFlow in production https://dataworkssummit.com/san-jose-2018/session/running-distributed-tensorflow-in-production-challenges-and-solutions-on-yarn-3-0-2/ Just the sketch: advanced streaming analytics in Apache Metron https://dataworkssummit.com/san-jose-2018/session/just-the-sketch-advanced-streaming-analytics-in-apache-metron/ Containers and Big Data https://dataworkssummit.com/san-jose-2018/session/containers-and-big-data/ Catch a hacker in realtime: Live visuals of bots and bad guys https://dataworkssummit.com/san-jose-2018/session/catch-a-hacker-in-realtime-live-visuals-of-bots-and-bad-guys/ HDFS tiered storage https://dataworkssummit.com/san-jose-2018/session/hdfs-tiered-storage/ Geospatial data platform at Uber https://dataworkssummit.com/san-jose-2018/session/geospatial-data-platform-at-uber/ What's the Hadoop-la about Kubernetes? https://dataworkssummit.com/san-jose-2018/session/whats-the-hadoop-la-about-kubernetes/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
14 Aug 2018Episode 101 – Apache Pulsar update with Matteo and Sijie from Streamlio01:05:48
Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview in two parts. Here is the first part, where they introduce Apache Pulsar, go in depth on the correct deployment scaling of a stable Pulsar cluster and clarify Pulsar's "at least once vs exactly once" strategy. Part two will go in more depth on what's new. Stay tuned! Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
28 Aug 2018Episode 103 – Apache Pulsar version 2.0 with Matteo and Sijie from Streamlio00:43:31
Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview in two parts, the first of which was published in episode 101. Here is the second part, with information on version 2.0 and the future of the Apache Pulsar project. The first subject taken on by Sijie is Pulsar Functions, followed by Matteo talking about the new schema registry and Topic Compaction. With a new major version being released, users will probably want to upgrade, so we asked the guys about the upgrade path. For the rest of the episode, Matteo and Sijie share what they can regarding the future Pulsar roadmap. Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
21 Aug 2018Episode 102 – Roaring News00:22:07
Big Data News at the end of the summer is not easy to find, but we did end up with three topics to discuss: from isolating GPUs in Hadoop 3.x to replicating big data (to the cloud) and quick tips from Adam's blog. Breaking News First Class GPUs support in Apache Hadoop 3.1, YARN & HDP 3.0 https://hortonworks.com/blog/gpus-support-in-apache-hadoop-3-1-yarn-hdp-3/ Replicating big datasets in the cloud https://medium.com/hotels-com-technology/replicating-big-datasets-in-the-cloud-c0db388f6ba2 https://dataworkssummit.com/berlin-2018/session/tools-and-approaches-for-migrating-big-datasets-to-the-cloud/ https://www.slideshare.net/Hadoop_Summit/tools-and-approaches-for-migrating-big-datasets-to-the-cloud Quick Tip: The easiest way to grab data out of a web page in Python https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
04 Sep 2018Episode 104 – Roaring News00:36:55
In this Big Data News episode, we discuss an article with guidelines on how you should arrange your data gathering projects with the customer in mind. Dave brings a matrix of visualization products. Breaking News The five Cs: Five framing guidelines to help you think about building data products. https://www.oreilly.com/ideas/the-five-cs?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content The Chartmaker Directory http://chartmaker.visualisingdata.com/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
18 Sep 2018Episode 106 – Roaring News00:39:15
In this edition of Big Data News, we take the pulse of machine learning adoption and talk about Big Data online learning by IBM on Coursera and by Columbia University on edX. We round off the episode with a look at MR3 and the evil that is benchmarks. Breaking News Data Science Professional Certificate https://cognitiveclass.ai/blog/data-science-professional-certificate/ Taking the pulse of machine learning adoption https://www.zdnet.com/article/taking-the-pulse-of-machine-learning-adoption/ Performance Comparison of HDP LLAP, Presto, SparkSQL, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark https://mr3.postech.ac.kr/blog/2018/08/15/comparison-llap-presto-spark-mr3/ Join Jhon on Artificial Intelligence (AI) & Robotics by ColumbiaX on edX https://www.edx.org/micromasters/columbiax-artificial-intelligence https://www.edx.org/course/robotics-columbiax-csmm-103x-4 https://www.edx.org/course/artificial-intelligence-ai-columbiax-csmm-101x-4 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
11 Sep 2018Episode 105 – Big Data at British Telecom with Phillip Radley01:06:32
In this episode we welcome Phil Radley, Chief Data Architect at BT to talk about the Big Data deployment at BT.   Phillip Radley (Linkedin) Chief Data Architect @ BT https://home.bt.com/     Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
25 Sep 2018 Episode 107 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 1 00:41:50
In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this first part, the focus is more on Mandy herself and we lay the groundwork for the second part that will go live in episode 109. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering https://www.linkedin.com/in/mandy-chessell-a4989722/ ODPi Blog post on Egeria: First Release of ODPi Egeria is Here ODPi github projects: Egeria - Open Metadata and Governance https://github.com/odpi/egeria Data-governance companion project https://github.com/odpi/data-governance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
09 Oct 2018 Episode 109 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 2 00:52:10
In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this second part, we discuss the ins and outs of good data stewardship and how companies can adopt, implement and contribute. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering https://www.linkedin.com/in/mandy-chessell-a4989722/ ODPi Blog post on Egeria: First Release of ODPi Egeria is Here ODPi github projects: Egeria - Open Metadata and Governance https://github.com/odpi/egeria Data-governance companion project https://github.com/odpi/data-governance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
02 Oct 2018Episode 108 – Roaring News00:55:57
Another episode of Big Data News, and not just any episode, but one packed with items. Before we do our regular article reviews, we are holding raffles for not one, not two, but three different events! And as if that was not enough, our friends from Pulsar dropped in with their big Apache top-level project announcement. So not very bite-sized this time, but chock-full of delicious Big Data news! Breaking News Our thanks to our guests: Solix Empower Sai Gundavelli Founder/CEO, Solix Technologies Streamlio Sanjeev Kulkarni Co-Founder at Streamlio Sijie Guo Co-Founder at Streamlio Free Big Data Event ticket giveaways: DataWorks Summit Asia Pacific Singapore Oct 11, 2018 - Tokyo Oct 16, 2018 - Melbourne Feb 06, 2019 To enter the raffle, send email to dws18apac@roaringelephant.org Tell us what event you want to attend! (Singapore, Tokyo, Melbourne) Solix Empower New York 2018 New York November 01, 2018 To enter the raffle, send email to SolixEmpower18@roaringelephant.org H2O AI World London London October 29-30, 2018 To enter the raffle, send email to h2oLondon18@roaringelephant.org Please note that we are giving away discount codes that will give you access to the events for free. You still need to arrange your own travel and lodging! News articles: The Apache Software Foundation Announces Apache® Pulsar™ as a Top-Level Project https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces39 https://github.com/apache/pulsar Who wrote that anonymous NYT op-ed? Text similarity analyses with R http://blog.revolutionanalytics.com/2018/09/anonymous-nyt-op-ed.html Beyond Interactive: Notebook Innovation at Netflix https://medium.com/netflix-techblog/notebook-innovation-591ee3221233 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
16 Oct 2018Episode 110 – Roaring News00:38:23
Another week, another Big Data News episode. After going over all the event ticket giveaways that are currently running, we have an article that covers the basics of ETL vs ELT and have some fun with R graphs in the style of the XKCD web comic. We finish with an in-depth article on columnar data stores and a quick shout-out to Apache NiFi. Breaking News Our thanks to our guest from H2O.ai: John Spooner Director of Solution Engineering, h2o.ai Dave: XKCD Curve Fitting in R http://blog.revolutionanalytics.com/2018/09/curve-fitting.html Artificial intelligence, data will be the differentiator in the marketplace https://www.information-age.com/artificial-intelligence-data-123475102/ Jhon: Scaling ETL: How data pipelines evolve as your business grows https://bytes.grubhub.com/scaling-etl-how-data-pipelines-evolve-as-your-business-grows-72ff6c744e6e The design and implementation of modern column-oriented database systems https://blog.acolyer.org/2018/09/26/the-design-and-implementation-of-modern-column-oriented-database-systems/ Apache NiFi In Depth https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html?es_p=7695258 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
23 Oct 2018Episode 111 – How Public Cloud changed Big Data00:51:08
No interview this time, just Dave and Jhon talking about how public cloud changed Big Data. Current news has brought this topic back to the foreground and we thought it was a good idea to give our views on the subject. Along the way, we go over the different deployment strategies for Hadoop across on-premises, private and public cloud and, of course, hybrid environments. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
30 Oct 2018Episode 112 – Roaring News00:26:37
In this last Big Data News episode for the month of October, we look forward to the H2O World event next week in London and we have articles on BI Maturity and the upcoming Apache Ozone project that will supplant HDFS in future Hadoop clusters soon(TM). BI Maturity: You can’t get there from here! http://makingdatameaningful.com/bi-maturity/ Introducing Apache Hadoop Ozone: An Object Store for Apache Hadoop https://hortonworks.com/blog/introducing-apache-hadoop-ozone-object-store-apache-hadoop/ Katacoda example down on this page https://hadoop.apache.org/ozone Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
06 Nov 2018Episode 113 – H2OAIWorld London 2018 Roaring Report01:02:13
Here is our H2O.ai World conference London Roaring Report. We had a blast and we hope that this episode can give you a good taste of what was going on. The sessions are now available online: https://www.youtube.com/playlist?list=PLNtMya54qvOHh9LaA08hkusynWVStNEhm Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
13 Nov 2018Episode 114 – Roaring News00:26:48
In this serving of bite-sized Big Data News we talk about the IBM takeover of Red Hat, a new Botnet going for unprotected Hadoop nodes and a somewhat disappointing Cloudera blog post. IBM To Acquire Red Hat https://investors.redhat.com/news-and-events/press-releases/2018/10-28-2018-184027500 https://newsroom.ibm.com/2018-10-28-IBM-To-Acquire-Red-Hat-Completely-Changing-The-Cloud-Landscape-And-Becoming-Worlds-1-Hybrid-Cloud-Provider New DDoS botnet goes after Hadoop enterprise servers https://www.zdnet.com/article/new-ddos-botnet-goes-after-hadoop-enterprise-servers/ (remember Dr.Who ? https://medium.com/@neerajsabharwal/hadoop-yarn-hack-9a72cc1328b6 ) New in Cloudera Enterprise 6: Apache Hive 2.1 (By the Cloudera Hive Team) http://blog.cloudera.com/blog/2018/10/new-in-cloudera-enterprise-6-apache-hive-2-1/ https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_601_unsupported_features.html#hive_c6_unsupported_features https://hive.apache.org/downloads.html https://issues.apache.org/jira/browse/HIVE-17129 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
20 Nov 2018Episode 115 – Anniversary three: I guess we’re in it for the long run now!00:59:40
It's been three years since we started this podcast and, as we've done in previous years, we invited the wonderful people who were guests on our show in the past twelve months and made our little podcast so much better for our listeners! Our thanks to the guests who celebrated our three-year anniversary with us: Ward Bekker (Linkedin) Pre-Sales Solutions Engineer II at Hortonworks Talking about Apache Metron Rohit Jain (Linkedin) Chief Technology Officer at Esgyn Talking about Esgyn, Trafodion and cloud vs on-premise vs hybrid. Sanjeev Kulkarni (Linkedin) Co-Founder at Streamlio Talking about Apache Pulsar Phillip Radley (Linkedin) Chief Data Architect at BT Talking about future predictions made years ago Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
27 Nov 2018Episode 116 – Roaring News00:27:09
This Machine Learning-heavy edition of Big Data News covers Boston School Bus schedules and model interpretation using LIME. As a bonus, we have a great source of NiFi knowledge for you! What the Boston School Bus Schedule can Teach US About AI https://www.wired.com/story/joi-ito-ai-and-bus-routes/ Understanding model predictions with LIME https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b Introduction to Local Interpretable Model-Agnostic Explanations (LIME) https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime Locally Interpretable Models and Effects based on Supervised Partitioning (LIME-SUP) https://arxiv.org/abs/1806.00663 Best of NiFi https://pierrevillard.com/best-of-nifi/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
04 Dec 2018Episode 117 – Big Data Disaster Recovery00:53:21
When Big Data projects mature from R&D projects to business-critical components, it becomes important to look at how your environment can survive and recover from catastrophic failures. Considering the significant cost of a good Disaster Recovery plan, it pays to take a close look at your deployment and carefully weigh the pros and cons at a granular level. Here is the link to the SlideShare presentation: Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
11 Dec 2018Episode 118 – Roaring News00:32:31
In this Big Data News episode, we use an article on how some disgruntled open source projects tried to force the "net giants" to give back as an excuse to talk about open source ethics. The second article for today comes from the hand of Noel Sharkey about possible deception in modern robotics. Time for Net Giants to Pay Fairly for the Open Source on Which They Depend https://www.linuxjournal.com/content/time-net-giants-pay-fairly-open-source-which-they-depend Mama Mia It's Sophia: A Show Robot Or Dangerous Platform To Mislead? https://www.forbes.com/sites/noelsharkey/2018/11/17/mama-mia-its-sophia-a-show-robot-or-dangerous-platform-to-mislead Artificial Intelligence: A Modern Approach (Third edition) by Stuart Russell and Peter Norvig http://aima.cs.berkeley.edu/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
18 Dec 2018Episode 119 – Knowage: The Open Source Business Analytics Suite00:48:43
This time we are joined by Paolo from Knowage, who gives us a high-level overview of Knowage: a totally open source suite for Business Analytics. The Knowage suite is composed of several modules, each one conceived for a specific analytical domain. They can be used individually or combined with one another to ensure full coverage of users' requirements, allowing you to build a tailored product. Thank you to our guest: Paolo Raineri Business Developer (Linkedin) https://www.knowage-suite.com Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
25 Dec 2018Episode 120 – Roaring News00:39:36
Merry Big Data News Christmas! Since it's the 25th of December, we're investigating how Big Data is changing the operations at the North Pole using a couple of blog posts from Splunk. Christmas 2020. Will big data and IOT change things for Father Christmas? Part I https://www.splunk.com/blog/2014/12/17/christmas-2020-part1.html   Christmas 2020. Will big data and IOT change things for Father Christmas? Part II https://www.splunk.com/blog/2014/12/18/christmas-2020-part2.html Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
01 Jan 2019Episode 121 – Infrastructure and Data Lifecycle (part 1)00:42:53
Does the standard Dev-Test-Prod cycle make sense in a Big Data environment or should you approach this subject a little differently? In this episode, we sum up our experiences and best practice tips regarding the infrastructure part; Data Lifecycle will be featured in the next topic episode. Planning on attending the Melbourne @DataWorksSummit? Send email to DWS18APAC@roaringelephant.org for a free ticket to the Melbourne event in February! Big thanks to @DataWorksSummit & @hortonworks for sponsoring this giveaway! DataWorks Summit Barcelona is also rapidly approaching. You can find my dynamic sessions statistics dashboard here: https://aka.ms/DWS2019BA Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
08 Jan 2019Episode 122 – Roaring news00:32:38
In this first Big Data News episode of 2019, we cover how A.I. will nudge you to a happier (work)life and the new Hive Warehouse Connector. We end the episode with unstable artificial intelligence and how you can have a chance at a one million euro prize! Can an AI keep you happy at work? Ex-Google team reveal software that 'nudges' workers with messages throughout the day https://www.dailymail.co.uk/sciencetech/article-6545051/The-AI-happy-work-Ex-Google-team-reveal-software-nudges-workers.html https://humu.com/ Apache Hive Warehouse Connector Use-Cases https://hortonworks.com/blog/hive-warehouse-connector-use-cases/ https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/ In January, the EU starts running Bug Bounties on Free and Open Source Software https://juliareda.eu/2018/12/eu-fossa-bug-bounties/ AI has a probability problem https://go.forrester.com/blogs/artificial-intelligence-has-a-probability-problem/ Apache Kafka: 58.000,00 €, 07/01/2019 to 15/08/2019, HackerOne https://www.zdnet.com/article/eu-to-fund-bug-bounty-programs-for-14-open-source-projects-starting-january-2019/ https://juliareda.eu/2016/07/eu-audits-keepass-apache/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
15 Jan 2019Episode 123 – Infrastructure and Data Lifecycle (part 2)00:57:24
In episode 121 we discussed the first part of this story and now we conclude with a discussion of the data life-cycle considerations that apply to a Big Data and Advanced Analytics environment. The primary inspiration for this episode: The Big Data Lifecycle explained https://www.pinkelephantasia.com/big-data-lifecycle/ Additional Inspiration: 7 phases of a data life cycle https://www.bloomberg.com/professional/blog/7-phases-of-a-data-life-cycle/ Thinking Beyond Traditional Data Life Cycle Management https://hortonworks.com/article/thinking-beyond-traditional-data-life-cycle-management/ Understanding the Big Data Life-Cycle https://www.linkedin.com/pulse/four-keys-big-data-life-cycle-kurt-cagle/   Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
22 Jan 2019Episode 124 – Roaring News00:38:12
The Hortonworks-Cloudera merger has been finalized and the new CDP (Cloudera Data Platform) has been announced. We also talk about data mining bias, the good and bad of hackathons, and end on a rant about data sizes. Cloudera Unveils CDP, Talks Up ‘Enterprise Data Cloud’ https://www.datanami.com/2019/01/10/cloudera-unveils-cdp-talks-up-enterprise-data-cloud/?_lrsc=718d30ff-51ed-40c5-bba9-750a82009aaf Cloudera and Hortonworks' merger closes; quo vadis Big Data? https://www.zdnet.com/article/cloudera-and-hortonworks-merger-closes-quo-vadis-big-data/ Welcome to a brand-new Cloudera https://hortonworks.com/blog/welcome-brand-new-cloudera/ The Exaggerated Promise of So-Called Unbiased Data Mining https://www.wired.com/story/the-exaggerated-promise-of-data-mining/ On Hackathons : Lessons Learned, Experience, Advice https://www.knoyd.com/blog/2019/1/10/on-hackathons-lessons-learned-experience-advice Big Insights Not Big Data: Why We Should Stop Talking About File Size https://www.forbes.com/sites/kalevleetaru/2019/01/09/big-insights-not-big-data-why-we-should-stop-talking-about-file-size Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
29 Jan 2019Episode 125 – Sparkling Water with H2O.AI (Part 1)00:51:21
We recently sat down with Kuba and Pavel from H2O to discuss how you can easily lift your Spark notebooks to the next level by adding some H2O to them using their open source Sparkling Water project. In this first part of the interview, we cover the conceptual principles behind Sparkling Water and discuss some existing use case implementations. Jakub "Kuba" Hava Senior Software Engineer at H2O.ai Pavel Pscheidl Machine learning engineer at H2O.ai, Software engineer, Writer H2O World San Francisco: Find out more at the upcoming H2O World conference in San Francisco on February 4-5, 2019. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
12 Feb 2019Episode 127 – Sparkling Water with H2O.AI (part 2)00:40:10
We recently sat down with Kuba and Pavel from H2O to discuss how you can easily lift your Spark notebooks to the next level by adding some H2O to them using their open source Sparkling Water project. In this second part of the interview, we go deeper into the technical details of Sparkling Water and how you can deploy and use it in your environment. We end the conversation with a look at the roadmap and anything else the future may bring. Jakub "Kuba" Hava Senior Software Engineer at H2O.ai Pavel Pscheidl Machine learning engineer at H2O.ai, Software engineer, Writer Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
05 Feb 2019Episode 126 – Roaring News00:26:26
The second news episode for 2019 is almost entirely devoted to practical AI with some tutorial notebooks and finding a parking space. We end this show with dire warnings of the impending Big Data induced Apocalypse! Practical AI Workshop https://blog.revolutionanalytics.com/2019/01/notebooks-from-the-practical-ai-workshop.html Snagging Parking Spaces with Mask R-CNN and Python https://medium.com/@ageitgey/snagging-parking-spaces-with-mask-r-cnn-and-python-955f2231c400 Head of Russian Orthodox Church Warns Big Data Will Usher in the Antichrist https://gizmodo.com/head-of-russian-orthodox-church-warns-big-data-will-ush-1831598967   Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
19 Feb 2019Episode 128 – Roaring News00:59:15
In this deep-learning-heavy edition of Big Data News, we have articles about how to get into the Data Scientist life, how and where to get the skills, and how you may eventually end up beating pro gamers at their thing. The DataWorks Summit Barcelona is coming up soon and we have a free entry ticket to raffle off to a lucky Big Data Winner! Send an email to DWS19BARCELONA at roaringelephant.org to enter the raffle! What’s Driving Data Science Hiring in 2019 https://www.datanami.com/2019/01/30/whats-driving-data-science-hiring-in-2019/ Practical Deep Learning for Coders 2019 https://www.fast.ai/2019/01/24/course-v3/ https://course.fast.ai/ Deep Learning vs Classical Machine Learning https://towardsdatascience.com/deep-learning-vs-classical-machine-learning-9a42c6d48aa Top Machine Learning Algorithms for Predictions. A Short Overview. https://www.aisoma.de/top-machine-learning-algorithms-for-predictions-a-short-overview/ An AI crushed two human pros at Starcraft but it wasn’t a fair fight https://arstechnica.com/gaming/2019/01/an-ai-crushed-two-human-pros-at-starcraft-but-it-wasnt-a-fair-fight/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
26 Feb 2019Episode 129 – DataWorks Summit Barcelona Track Chair Interviews00:43:33
In this episode we have interviews with Niels Basjes and Aljoscha Krettek, respectively track chairs for Big Compute & Storage and Internet of Things. We talk with them about what being a track lead means, the sessions in their tracks and of course about what they are doing themselves with Big Data and Advanced Analytics. Niels Basjes Lead IT-Architect Scalable Solutions at Bol.com     Bol.com Techlabs: https://techlab.bol.com/  https://techlab.bol.com/author/nbasjes/ Bol.com on Youtube: https://www.youtube.com/results?search_query=bol.com+berlinbuzzwords Bol.com is looking for you! https://careers.bol.com/   Aljoscha Krettek Co-Founder, Software Engineer at Data Artisans     Data Artisans / Ververica Blogs: https://www.ververica.com/blog Join a world-class team at Ververica: https://www.ververica.com/careers   Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
05 Mar 2019Episode 130 – Roaring News00:35:06
In this episode of Bite Sized Big Data news, we cover the merging of Data Artisans and Alibaba forming the new Ververica entity, AI-related challenges, and a BBC cook book for visualizations in R. Dave had some issues recording his side; our apologies for the rather bad quality of Dave's audio track on this episode. Data Artisans, which was recently purchased by Alibaba, has been renamed Ververica. https://www.ververica.com/blog/introducing-our-new-name https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions The challenges to tackle before you start with AI http://www.ronaldvanloon.com/the-challenges-to-tackle-before-you-start-with-ai/ Create data visualisations like BBC news with the BBC R Cook Book https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
12 Mar 2019Episode 131 – Dataworks Summit 2019 Barcelona Session Preview00:45:55
With the DataWorks Summit in Barcelona coming up next week, we take a look at the agenda with the available sessions and take you through our best picks and honorable mentions. Session statistics dashboards: DataWorks Summit 2019 in Barcelona: https://aka.ms/DWS2019BA DataWorks Summit 2018 Berlin: https://aka.ms/DWS2018 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
21 Mar 2019Episode 132 – Roaring DataWorks Summit Barcelona, ft. John Mertic01:18:55
DataWorks Summit 2019 Barcelona has come and gone... Recording live from my hotel room, we give our view on the highs and lows of the event and talk about the things we learned. This episode also includes a short interview with John Mertic from the Linux Foundation, who talked to us about Data Governance and ODPi Egeria. John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
26 Mar 2019Episode 133 – Big Data in Cybersecurity with Saad Ayad, featuring Apache Metron (Part 1)00:32:35
Data leaks and the resulting attacks on our privacy have been a major news item in recent months. Big data tools like Apache Metron, built on top of Hadoop, can be instrumental in detecting and preventing intrusions. In this episode, we are joined by Saad Ayad, who was General Manager Security Operations at Telstra and currently is a Director at Digital Fortress Services in Melbourne, Australia. Saad has been active in the cybersecurity world for a long time and we are grateful he was willing to spend some time with us and share his knowledge and experience. Saad Ayad (@saadayad_) Cyber Security, Big Data Analytics & Operations http://www.digitalfortress.services @DigFortServ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
09 Apr 2019Episode 135 – Big Data in Cybersecurity with Saad Ayad, featuring Apache Metron (Part 2)00:30:05
Data leaks and the resulting attacks on our privacy have been a major news item in recent months. Big data tools like Apache Metron, built on top of Hadoop, can be instrumental in detecting and preventing intrusions. In this episode, we are joined by Saad Ayad, who was General Manager Security Operations at Telstra and currently is a Director at Digital Fortress Services in Melbourne, Australia. Saad has been active in the cybersecurity world for a long time and we are grateful he was willing to spend some time with us and share his knowledge and experience. Saad Ayad (@saadayad_) Cyber Security, Big Data Analytics & Operations http://www.digitalfortress.services @DigFortServ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
02 Apr 2019Episode 134 – Roaring News: Dataworks Summit Lightning Interviews00:37:19
A special edition of Big Data News featuring a number of quick interviews at the booths in the community expo hall. A big thank you to the brave people there who were willing to face the Roving Roaring Mike at the Barcelona DataWorks Summit a couple of weeks ago. 03:04 Attunity https://www.attunity.com/ 07:41 Cloudera Fast Forward Labs https://www.cloudera.com/products/fast-forward-labs-research.html 11:09 DataVard https://www.datavard.com 17:19 Cazena https://www.cazena.com/ 22:39 Syncsort https://www.syncsort.com 26:22 Accenture https://www.accenture.com 30:44 Unravel Data https://unraveldata.com Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
16 Apr 2019Episode 136 – Temet Nosce00:31:04
Breaking with tradition, this News Episode does not have any Big Data-related articles. Instead, this episode is all about our plans for the future of this podcast... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
23 Apr 2019Episode 137 – Interview on DataOps with Chris Bergh of DataKitchen.io (Part 1)00:45:06
DataKitchen.io's Chris Bergh takes us down the path towards successful DataOps implementation. If you have not heard of the DataOps concept yet and data is a big part of your environment (and really, it should be) we're sure you will find more than a couple takeaways here!   Christopher Bergh (@ChrisBergh) CEO & Head Chef, DataKitchen     The DataOps Cookbook DataOps is NOT Just DevOps for Data Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
07 May 2019Episode 139 – Interview on DataOps with Chris Bergh of DataKitchen.io (Part 2)00:33:41
DataKitchen.io's Chris Bergh takes us down the path towards successful DataOps implementation. If you have not heard of the DataOps concept yet and data is a big part of your environment (and really, it should be) we're sure you will find more than a couple takeaways here!   Christopher Bergh (@ChrisBergh) CEO & Head Chef, DataKitchen       The DataOps Cookbook DataOps is NOT Just DevOps for Data Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
30 Apr 2019Episode 138 – Roaring News00:27:50
The biggest news is of course the launch of our Patreon! Hop over to https://www.patreon.com/roaringelephant and see if you want to help us thrive and grow! On the technical front, we have a Blog on Machine Learning Model Management, Apache turning 20 and Google breeding aggressive A.I.! And we also have a side-conversation on NginX... Apache Software Foundation Continues to Grow Open Source Software https://www.eweek.com/development/the-apache-software-foundation-continues-to-grow-open-source-software Frameworks for Machine Learning Model Management https://www.inovex.de/blog/machine-learning-model-management/ Google's AI Has Learned to Become "Highly Aggressive" in Stressful Situations https://www.sciencealert.com/google-deep-mind-has-learned-to-become-highly-aggressive-in-stressful-situations Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
14 May 2019Episode 140 – Roaring News00:36:58
Another week, another feed of roaring news articles, starting with apparent changes at MapR and the release of Red Hat Enterprise Linux 8. We go in depth on the open sourcing of the Databricks-developed Delta Lake and finish with some SQL-generated fractals. Big thanks to our Roaring Patreons for making this podcast possible! DataWorks Summit free ticket raffle. Final week for our DataWorks Summit Washington DC free ticket giveaway! Get your free ticket now! The Roaring Elephant on YouTube. The Roaring Elephant YouTube channel has launched! Will you help us reach 100 subscribers (modest goals are a good start!) so we can claim our personalized URL on YouTube? Every time a new episode is published, you will find a video uploaded to the channel as well. There won't be any real video yet though, only a still image as you can see in the thumbnails. But as soon as we reach the related goal on our Patreon, this is where our video content will appear. In case you are wondering, when we start recording actual videos, the regular MP3s on the podcast feed will remain exactly as they are now. So if you prefer not to look at our mugs while enjoying the podcast, that should remain possible. Interactive DWS-DC session dashboard https://aka.ms/DWS2019DC As I've been doing for a while now, I've again launched a session statistics dashboard for this event. It can be found at https://aka.ms/DWS2019DC and as usual, this Power BI dashboard is interactive. Simply click on the different elements to filter or drill down. There are only 58 sessions listed at the moment. I will be updating it from time to time so keep an eye out for some tweets from @jhonmasschelein if you want to get notified! R.I.P. MapR? 
https://www.linkedin.com/feed/update/urn:li:activity:6532418505361416192 https://www.linkedin.com/feed/update/urn:li:activity:6532352941800595456 Our first bit of news is more of a rumor for now: we were pointed towards some messages on LinkedIn that seem to indicate some reorganising is happening there. We will be following how this develops in the next few weeks. Best of luck to anyone who is affected! RHEL version 8 is out! Red Hat Opens the Linux Experience to Every Enterprise, Every Cloud and Every Workload with Red Hat Enterprise Linux 8 It's been a while coming, but even though RHEL 7 will still be around for a few years, Red Hat has released the next version of their popular Linux distro. Notwithstanding Dave's horror at the new logo, we're very excited about this and personally, I am eagerly awaiting the CentOS 8 release that should appear in a couple of months. Delta Lake Open-Sourced. Open Sourcing Delta Lake Databricks claims its new product Delta is the missing link to enterprise AI A press release from the good folks at Databricks informs the world that their proprietary data lake storage layer called "Delta Lake" has now been open sourced. Delta Lake was released by Databricks at the end of 2017 and was only available on their managed service offerings in the public clouds, but now anyone can download and deploy it. However, all is not well: we're having some serious issues with the content of the press release and quite frankly, we're scratching our heads to find exactly what problem Delta Lake is trying to solve and if it actually does that... Fractals, SQL-Style! Generating Fractals with Postgres: Escape-Time Fractals Just to make Dave happy, we finish this episode off with some great fractal visualizations made with SQL. Euch... What? Yes, SQL. That's right! Click the link to see how the apparently Turing-complete SQL is able to do that. 
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
28 May 2019Episode 142 – Roaring News – KubeCon 2019 Report00:47:07
A little over a week ago, KubeCon and CloudNativeCon happened and our independent Roaring Roving Reporter Rubik Dave came back from Barcelona with a comprehensive report. Kubernetes As the kubernetes.io webpage tells us: "Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications." As we discuss in the episode, Kubernetes forms a kind of middleware layer that performs orchestration of lightweight Docker containers. To be sure, you can use other container technologies, but Docker (and its companion project Moby) is what is most often used with Kubernetes. The biggest advantage of Kubernetes, I believe, is how it has standardized the way a microservices framework based on Docker container instances can be deployed and managed. There have been a myriad of other approaches that tried to solve that problem (and Dave gives a rather exhaustive list in the episode), but Kubernetes has emerged as the best supported by the community. KubeCon And that is where KubeCon comes in: there are other, more developer-oriented conferences, but KubeCon is perhaps the largest event for Kubernetes consumers. Details on this year's event are available at the KubeCon | CloudNativeCon Europe 2019 website. If you missed this year's installment, take note that next year's Europe event will be in Amsterdam, March 30th to April 2nd. And if the American continent is more practical, you can join the community at the San Diego venue, November 18th to 21st. CloudNativeCon KubeCon has run together with CloudNativeCon for as far back as I can figure out, and since Kubernetes is one of the larger "CNCF graduated" projects, that is not surprising. It also makes sense since microservices architectures are an excellent fit for cloud-based deployments, so much of the Kubernetes community is likely to also be part of the "cloud crowd". 
Now, reading the CloudNative website, their charter in particular, it does seem to see its purpose in a similar vein as the Apache Foundation does. However, the CloudNative folk recommend that the projects under its wing use the Apache 2.0 license, so they certainly don't appear to be in any kind of direct competition here... I think I feel a future podcast episode announcing itself! :D Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
21 May 2019Episode 141 – Spark in Action with author Jean-Georges Perrin (Part 1)00:48:02
And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more... Free eBook raffle Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;) After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon. A book review on Spark in Action, second edition with author Jean-Georges Perrin In this first part of the interview, we meet the author and talk about Apache Spark and Open Source in general. We also cover the MEAP system used by Manning Publications to get books like these in the hands of the readers as soon as possible while allowing early readers to help shape the book. Our thanks to Jean-Georges for spending quite a bit of time with us talking about Apache Spark and to Manning Publications for the free eBook codes! Find out more about Jean-Georges at his blog: https://jgp.net/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
04 Jun 2019Episode 143 – Spark in Action with author Jean-Georges Perrin (Part 2)00:58:56
And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more... Free eBook raffle Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;) After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon. A book review on Spark in Action, second edition with author Jean-Georges Perrin In the second part we go deeper into the book, going over the available chapters and appendices. We cover a number of topics and concepts like the layout of a typical data lake, the four pillars of Apache Spark and more. We end the interview with a discussion on what it's like to write a technical book like Spark in Action. Our thanks to Jean-Georges for spending quite a bit of time with us talking about Apache Spark and to Manning Publications for the free eBook codes! Find out more about Jean-Georges at his blog: https://jgp.net/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Improve your understanding of Roaring Elephant with My Podcast Data

At My Podcast Data, we strive to provide in-depth, data-driven analysis. Whether you are a passionate listener, a podcast creator, or an advertiser, the detailed statistics and analyses we offer can help you better understand the performance and trends of Roaring Elephant. From episode frequency to shared links to RSS feed health, our goal is to give you the knowledge you need to stay up to date. Explore more shows and discover the data that drives the podcast industry forward.
© My Podcast Data