Dell and Sputnik go to OSCON

July 18, 2013

Next week, Michael Cote, a whole bunch of other Dell folks, and I will be heading out to Portland for the 15th annual OSCON-ana-polooza.  We will have two talks that you might want to check out:

Cote and I will be giving the first, and the second will be led by Joseph George and James Urquhart.

Sputnik Shirt

And speaking of Project Sputnik, we will be giving away three of our XPS 13 developer editions:  one as a door prize at the OpenStack birthday party, one in a drawing at our booth and one to be given away at James and Joseph’s talk listed above.

We will also have a limited number of the shirts shown to the right, so stop by the booth.

But wait, there’s more….

To learn firsthand about Dell’s open source solutions be sure to swing by booth #719 where we will have experts on hand to talk to you about our wide array of solutions:

  • OpenStack cloud solutions
  • Hadoop big data solutions
  • Crowbar
  • Project Sputnik (the client to cloud developer platform)
  • Dell Multi-Cloud Manager (the platform formerly known as “Enstratius”)
  • Hyperscale computing systems

Hope to see you there.

Pau for now…


Time Lapse: Building Dell’s Big Data/OpenStack MDC — allowing customers to test at hyper scale

April 1, 2013

Back in September I posted an entry about the Modular Data Center that we set up in the Dell parking lot.  Here is a time-lapse video showing the MDC and the location being built out.

The MDC allows customers to test solutions at scale.  It is running OpenStack and various Big Data goodies such as Hadoop, HBase, Cassandra, MongoDB, Gluster, etc.

Customers can tap into the MDC from Dell’s solution centers around the world and do proofs of concept as well as competitive bake-offs between various big data technologies, so they can determine which might best suit their environment and use case.

Extra-credit reading


MDC in our parking lot, serving up OpenStack & Hadoop

September 11, 2012

Why use valuable internal real estate when you can just set up a Modular Data Center (MDC) in your parking lot?  The point wasn’t lost on the Dell Solution Center team which, with help from our partner Intel, is doing just that here in Round Rock.

The new MDC, which should be online in a few weeks, will host Dell’s OpenStack-Powered Cloud and Apache Hadoop solutions for customers to test drive and build POCs in Dell Solution Centers around the world.

Here’s the MDC being lowered into place yesterday.

Here are some pics I snapped this morning when I went down to get my coffee.  (Double-click on them to see them full-sized.)

Extra-credit reading

Pau for now…


Dell’s Big Data escalator pitch

February 24, 2012

At our sales kickoff in Vegas, Rob Hirschfeld chose a unique vehicle to succinctly convey our Big Data story here at Dell.  Check out the video below to hear one of our chief software architects for our Big Data and OpenStack solutions explain, in less than 90 seconds, what we are up to in the space and the value it brings customers.

Extra credit reading

Pau for now…


Web Glossary part two: Data tier

January 18, 2012

Here is part two of three of the Web glossary I compiled.  As I mentioned in my last two entries, in compiling this I pulled information from various and sundry sources across the Web, including Wikipedia, community and company web sites, and the brain of Cote.

Enjoy

General terms

  • Structured data: Data that can be organized in a structure, e.g. rows or columns, so that it is identifiable.  The most universal form of structured data is a database, such as a SQL database or Microsoft Access.
  • Unstructured data:  Data that has no identifiable structure. Unstructured data typically includes bitmap images/objects, text and other data types that are not part of a database. Most enterprise data today can actually be considered unstructured. An email is considered unstructured data.
  • Big Data: Data characterized by one or more of the following characteristics:  Volume – a large amount of data, growing at high rates; Velocity – the speed at which the data must be processed and a decision made; Variety – the range of data types and structures involved
  • Relational Database Management Systems (RDBMS): These databases are the incumbents in enterprises today and store data in rows and columns.  They are created and queried using a special computer language, Structured Query Language (SQL), which is the standard for database interoperability.  Examples:  IBM DB2, MySQL, Microsoft SQL Server, PostgreSQL, Oracle RDBMS, Informix, Oracle Rdb, etc.
  • NoSQL: refers to a class of databases that 1) are intended to perform at internet (Facebook, Twitter, LinkedIn) scale and 2) reject the relational model in favor of other (key-value, document, graph) models.  They often achieve performance by having far fewer features than SQL databases and focus on a subset of use cases.  Examples: Cassandra, Hadoop, MongoDB, Riak
  • Recommendation engine:  A recommendation engine takes a collection of frequent itemsets as input and generates a recommendation set for a user by matching the current user’s activity against the discovered patterns, e.g. people who bought X often also bought Y.  The recommendation engine is an online process, so its efficiency and scalability are key (see the sketch after this list).
  • Geo-spatial targeting: the practice of mapping advertising, offers and information based on geo location.
  • Behavioral targeting: a technique used by online publishers and advertisers to increase the effectiveness of their campaigns.  Behavioral targeting uses information collected on an individual’s web-browsing behavior, such as the pages they have visited or the searches they have made, to select which advertisements to display to that individual.
  • Clickstream analysis: On a Web site, clickstream analysis is the process of collecting, analyzing, and reporting aggregate data about which pages visitors visit and in what order – the result of the succession of mouse clicks each visitor makes (that is, the clickstream).  There are two levels of clickstream analysis: traffic analysis and e-commerce analysis.
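
To make the “people who bought X often also bought Y” idea concrete, here is a minimal co-occurrence sketch in Python.  The baskets and item names are invented for illustration; a real engine would mine frequent itemsets over far larger data (see Mahout below) rather than counting pairs in memory.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase baskets, invented for illustration.
baskets = [
    {"laptop", "mouse", "dock"},
    {"laptop", "mouse"},
    {"laptop", "dock"},
    {"phone", "case"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = defaultdict(int)
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1

def recommend(item, top_n=3):
    """Return the items most frequently bought together with `item`."""
    scores = defaultdict(int)
    for (a, b), count in pair_counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("laptop"))  # ['dock', 'mouse'] (tied counts, either order)
```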

Projects/Entities

  • Gluster: a software company acquired by Red Hat that provides an open source platform for scale-out Public and Private Cloud Storage.
  • Relational Databases
    • MySQL:  the most popular open source RDBMS.  It represents the “M” in the LAMP stack.  It is now owned by Oracle.
    • Drizzle:  A version of MySQL specifically targeted at the cloud.  It is currently an open source project without a commercial entity behind it.
    • Percona:  A MySQL support and consulting company that also supports Drizzle.
    • PostgreSQL: aka Postgres, an object-relational database management system (ORDBMS) available for many platforms including Linux, FreeBSD, Solaris, Windows and Mac OS X.
    • Oracle DB – not used so much in new WebTech companies, but still a major database in the development world.
    • SQL Server – Microsoft’s RDBMS

  • NoSQL Databases

    • MongoDB:  an open source, high-performance database written in C++.  Many Linux distros include a MongoDB package, including CentOS, Fedora, Debian, Ubuntu and Gentoo.  Prominent users include Disney Interactive Media Group, the New York Times, foursquare, bit.ly and Etsy.  10gen is the commercial backer of MongoDB.
    • Riak: a NoSQL database/datastore written in Erlang from the company Basho.  Originally used at the content delivery network Akamai.
    • Couchbase: formed from the merger of CouchOne and Membase.  It offers Couchbase server powered by Apache CouchDB and is available in both Enterprise and Community editions. The author of CouchDB was a prominent Lotus Notes architect.
    • Cassandra: A scalable key/value NoSQL database with no single point of failure, originating at Facebook to handle their message inboxes.  Backed by DataStax, which came out of Rackspace.
    • Mahout: A scalable machine learning and data mining library; an analytics engine for doing machine learning (e.g., recommendation engines and scenarios where you want to infer relationships).
  • Hadoop ecosystem
    • Hadoop: An open source platform, developed at Yahoo, that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.  It is particularly suited to large volumes of unstructured data such as Facebook comments and Twitter tweets, email and instant messages, and security and application logs.
    • MapReduce: a software framework for easily writing applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.  Hadoop acts as a platform for executing MapReduce.  MapReduce came out of Google (see the word-count sketch after this list).
    • HDFS: Hadoop’s Distributed File System allows large application workloads to be broken into smaller data blocks that are replicated and distributed across a cluster of commodity hardware for faster processing.
  • Major Hadoop utilities:
    • HBase: The Hadoop database that supports structured data storage for large tables.   It provides real time read/write access to your big data.
    • Hive:  A data warehousing solution built on top of Hadoop.  An Apache project
    • Pig: A platform for analyzing large data that leverages parallel computation.  An Apache project
    • ZooKeeper:  Allows Hadoop administrators to track and coordinate distributed applications.  An Apache project
    • Oozie: a workflow engine for Hadoop
    • Flume: a service designed to collect data and put it into your  Hadoop environment
    • Whirr: a set of libraries for running cloud services.  It’s ideal for running temporary Hadoop clusters to carry out a proof of concept, or to run a few one-time jobs.
    • Sqoop: a tool designed to transfer data between Hadoop and relational databases.  An Apache project
    • Hue: a browser-based desktop interface for interacting with Hadoop
  • Cloudera: a company that provides a Hadoop distribution similar to the way Red Hat provides a Linux distribution.  Dell is using Cloudera’s distribution of Hadoop for its Hadoop solution.
  • Solr: an open source enterprise search platform from the Apache Lucene project. Backed by the commercial company Lucid Imagination.
  • Elastic Search: an open source, distributed, search engine built on top of Lucene (raw search middleware).
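
To ground the MapReduce and HDFS definitions above, here is the classic word-count example written as a pair of Hadoop Streaming phases in Python.  It is a minimal sketch, not anything from our solutions: the file name and the streaming-jar path in the comments are illustrative only.

```python
#!/usr/bin/env python
# Word count, the "hello world" of MapReduce, written for Hadoop Streaming.
# The jar path below is illustrative; run the same file as both phases:
#   hadoop jar hadoop-streaming.jar -input /logs -output /counts \
#     -mapper "wordcount.py map" -reducer "wordcount.py reduce" -file wordcount.py
import sys
from itertools import groupby

def map_phase():
    # Emit "word<TAB>1" per word; Hadoop sorts by key between the phases.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reduce_phase():
    # Input arrives sorted by key, so identical words are adjacent.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    map_phase() if sys.argv[1] == "map" else reduce_phase()
```

Without a cluster you can dry-run the same pipeline shell-style: cat doc.txt | python wordcount.py map | sort | python wordcount.py reduce.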

Extra-credit reading

Pau for now…


Hadoop World: What Dell is up to with Big Data, Open Source and Developers

December 18, 2011

Besides interviewing a bunch of people at Hadoop World, I also got a chance to sit on the other side of the camera.  On the first day of the conference I got a slot on SiliconANGLE’s the Cube and was interviewed by Dave Vellante, co-founder of Wikibon and John Furrier, founder of SiliconANGLE.

-> Check out the video here.

Some of the ground we cover

  • How Dell got into the cloud/scale-out arena and how that led us to Big Data
  • (2:08) The details behind the Dell|Cloudera solution for Apache Hadoop and our “secret sauce,” Project Crowbar.
  • (4:00) Dell’s involvement in and affinity for open source software
  • (5:31) Dell’s interest in and strategy around courting developers
  • (7:35) Dell’s strategy of Make, Partner or Buy in the cloud space
  • (11:10) How real is OpenStack and how is it evolving?

Extra-credit reading

Pau for now…


How to create a Basic or Advanced Crowbar build for Hadoop

November 29, 2011

As I mentioned in my previous entry, the code for the Hadoop barclamps is now available at our github repo.

To help you through the process, Crowbar lead architect Rob Hirschfeld has put together the two videos below.  The first, Crowbar Build (on cloud server), shows you how to use a cloud server to create a Crowbar ISO using the standard build process.  The second, Advanced Crowbar Build (local), shows how to build a Crowbar v1.2 ISO using advanced techniques on a local desktop using a virtual machine.

Crowbar Build (on cloud server)

Advanced Crowbar Build (local)

Pau for now…


Open source Crowbar code now available for Hadoop

November 29, 2011

Earlier this month we announced that Dell would be open sourcing the Crowbar “barclamps” for Hadoop.  Well, today is the day, and the code is now available at our github repo.

What’s a Crowbar barclamp?

If you haven’t heard of Project Crowbar, it’s a software framework developed at Dell that started out as an installation tool for OpenStack.  As the project grew beyond installation to include monitoring capabilities, network discovery, performance data gathering, etc., the developers behind it, Rob Hirschfeld and Greg Althaus, decided to rewrite it to allow modules to plug into the basic Crowbar functionality.  These modules or “barclamps” allow the framework to be used by a variety of projects.  Besides the OpenStack and Hadoop barclamps written by Dell, VMware created a Cloud Foundry barclamp and DreamHost created a Ceph barclamp.
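
Crowbar itself is built in Ruby around Chef, so the following is only a shape sketch, in Python, of the plug-in idea the barclamp model embodies: a small core exposes a registry, and modules plug themselves in.  None of these class or function names come from the actual project.

```python
# Illustrative only: a toy plug-in framework in the spirit of barclamps.
# Crowbar's real implementation is Ruby/Chef-based; names here are invented.

REGISTRY = {}

def register(cls):
    """Class decorator that plugs a module into the core framework."""
    REGISTRY[cls.name] = cls()
    return cls

class Barclamp:
    """Base class each module ("barclamp") extends."""
    name = "base"

    def install(self, nodes):
        raise NotImplementedError

@register
class HadoopBarclamp(Barclamp):
    name = "hadoop"

    def install(self, nodes):
        print(f"deploying Hadoop to {len(nodes)} nodes: {nodes}")

@register
class CephBarclamp(Barclamp):
    name = "ceph"

    def install(self, nodes):
        print(f"deploying Ceph to {len(nodes)} nodes: {nodes}")

# The core only knows the registry, not the concrete modules.
REGISTRY["hadoop"].install(["node1", "node2", "node3"])
```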

To help you get your bearings

As I mentioned in the opening paragraph, the code for the Hadoop barclamps is now available.  To help you get started, below are a couple of videos that Rob put together.  The first walks you through how to install Crowbar and the second one explains how to use Crowbar to deploy Hadoop.

Extra-credit reading

Pau for now…


Hadoop World: Learning about NoSQL database Couchbase

November 10, 2011

The next in my series of video interviews from Hadoop World is with Mark Azad who covers technical solutions for Couchbase.  If you’re not familiar with Couchbase it’s a NoSQL database provider and the company was formed when, earlier this year, CouchOne and Membase merged.

Here’s what Mark had to say.

Some of the ground Mark covers

  • What is Couchbase and what is NoSQL
  • How Couchbase works with Hadoop
  • What its product lineup looks like and the new combined offering coming next year
  • Some of Couchbase’s customers and how Zynga uses them
  • What excites Mark the most about the upcoming year in Big Data

Extra-credit reading

Pau for now…


Hadoop World: O’Reilly Strata conference chair, Ed Dumbill

November 10, 2011

Yesterday, Hadoop World 2011 wrapped here in New York.  During the event I was able to catch up with a bunch of folks representing a wide variety of members of the ecosystem.  On the first day I caught up with Ed Dumbill of O’Reilly Media, who writes about big data for O’Reilly Radar and is also the GM for O’Reilly’s big data conference, Strata.

Here’s what Ed had to say.

Some of the ground Ed covers

  • What is Strata and what does it cover
  • How will this year’s conference differ from last year’s
  • Which customer types are making the best use of Hadoop, and whether Strata will verticalize going forward
  • What is Ed looking forward to most in the upcoming Strata.

Extra-credit reading

Pau for now…


Developers: How to get involved with Crowbar for Hadoop

November 8, 2011

In the previous entry I mentioned that we have developed and will be opensourcing “barclamps” (modules that sit on top of Crowbar) for: Cloudera CDH/Enterprise, ZooKeeper, Pig, HBase, Flume and Sqoop.  All these modules will speed and ease the deployment, configuration and operation of Hadoop clusters.

If you would like to get involved, check out this 1 min video from Rob Hirschfeld talking about how:

Look for the code on the Crowbar GitHub repo by the last week of November.

Extra-credit reading:

Pau for now…


Dell to opensource software to ease Hadoop install & management

November 8, 2011

It wouldn’t be surprising if you were surprised to learn that Dell is developing software.  To say that this is an area we haven’t been known for in the past would be an understatement.  While we may not pose a direct threat to Microsoft any time soon, we have been coding in a few focused areas.  One of those areas is cloud installation and management, represented by our Project Crowbar.  While Crowbar began life simply as a way to install OpenStack on Dell hardware, it has expanded from there.

Today’s news is that we have developed and will be opensourcing “barclamps” (modules that sit on top of Crowbar) for: Cloudera CDH/Enterprise, ZooKeeper, Pig, HBase, Flume and Sqoop.  All these modules will speed and ease the deployment, configuration and operation of Hadoop clusters.  But don’t take my word for it.  Take a listen to Crowbar’s architect Rob Hirschfeld as he explains Crowbar and today’s announcement:

Look for the code on the Crowbar GitHub repo by the last week of November.  If you want to get involved, learn how.

Extra-credit reading:

Pau for now…


Crowbar: Where it’s been and where it’s going

October 24, 2011

Rob Hirschfeld, aka “Commander Crowbar,” recently posted a blog entry looking back at how Crowbar came to be, how it’s grown and where he hopes it will go from here.

What’s a Crowbar?

If you’re not familiar with Crowbar, it’s an open source software framework that began life as a tool to speed the installation of OpenStack on Dell hardware.  The project incorporates the Opscode Chef Server tool and was originally created here at Dell by Rob and Greg Althaus.  Just four short months ago at OSCON 2011 the project took a big step forward when, along with the announcement of our OpenStack solution, we announced that we were opensourcing it.

DevOps-ilicous

As Rob points out in his blog, as we were delivering Crowbar as an installer a collective light bulb went off and we realized the role that Chef and tools like it play in a larger movement taking place in many Web shops today: DevOps.

The DevOps approach to deployment builds up systems in a layered model rather than using packaged images…Crowbar’s use of a DevOps layered deployment model provides flexibility for BOTH modularized and integrated cloud deployments.

On beyond installation and OpenStack

As the team began working more with Crowbar, it occurred to them that its use could be expanded in two ways: it could be used to do more than installation and it could be expanded to work with projects beyond OpenStack.

As for functionality, Crowbar now not only installs and configures but once the initial deployment is complete, Crowbar can be used to maintain, expand, and architect the instance, including BIOS configuration, network discovery, status monitoring, performance data gathering, and alerting.

The first project beyond OpenStack that we used Crowbar on was Hadoop.  In order to expand Crowbar’s usage we created the concept of  “barclamps” which are in essence modules that sit on top of the basic Crowbar functionality.  After we created the Hadoop barclamp, others picked up the charge and VMware created a Cloud Foundry barclamp and DreamHost created a Ceph barclamp.

It takes a community

Crowbar development has recently been moved out into the open.  As Rob explains,

This change was reflected in our work on OpenStack Diablo (+ Keystone and Dashboard) with contributions by Opscode and Rackspace Cloud Builders.  Rather than work internally and push updates at milestones, we are now coding directly from the Crowbar repositories on Github.

So what are you waiting for?  Join our mailing list, download the code or ISO, create a barclamp, make your voice heard.  Who’s next?

Extra-credit reading:

Pau for now…


Big Data is the new Cloud

October 12, 2011

Big Data represents the next not-completely-understood, got-to-have strategy.  This first dawned on me about a year ago and has continued to become clearer as the phenomenon has gained momentum.  Contributing to Big Data-mania is Hadoop, today’s weapon of choice for taming and harnessing mountains of unstructured data, a project with its own immense gravitational pull of celebrity.

So what

But what is the value of slogging through these mountains of data?  In a recent Forrester blog, Brian Hopkins lays it out very simply:

We estimate that firms effectively utilize less than 5% of available data. Why so little? The rest is simply too expensive to deal with. Big data is new because it lets firms affordably dip into that other 95%. If two companies use data with the same effectiveness but one can handle 15% of available data and one is stuck at 5%, who do you think will win?

The only problem is that while unstructured data (email, clickstream data, photos, web logs, etc.) makes up the vast majority of today’s data, the majority of the incumbent data solutions aren’t designed to handle it.  So what do you do?

Deal with it

Hadoop, which I mentioned above, is your first line of offense when attacking big data.  Hadoop is an open source, highly scalable compute and storage platform.  It can be used to collect, tidy up and store boatloads of structured and unstructured data.  In the case of enterprises it can be combined with a data warehouse and then linked to analytics (web companies typically forgo the warehouse).

And speaking of web companies, Hopkins explains:

Google, Yahoo, and Facebook used big data to deal with web scale search, content relevance, and social connections, and we see what happened to those markets. If you are not thinking about how to leverage big data to get the value from the other 95%, your competition is.

So will Big Data truly displace Cloud as the current must-have, buzz-tastic phenomenon in IT?  I’m thinking in many circles it will.  While less of a tectonic shift, Big Data’s more “modest” goals and concrete application make it easier to draw a direct line between effort and business return.  This in turn will drive greater interest, tire kicking and then implementation.  But I wouldn’t kick the tires for too long because, as the web players have learned, Big Data is a mountain of straw just waiting to be spun into gold.

Extra-credit reading:

Pau for now…


Props from GigaOm for Dell as Web outfitter

September 26, 2011

Dell has been working for the last four-plus years outfitting the biggest of the big web superstars like Facebook and Microsoft Azure with infrastructure.  More recently we have been layering software such as Hadoop, OpenStack and Crowbar on top of that infrastructure.  This has not gone unnoticed by web pub GigaOm:

Want to become the next Amazon Web Services or Facebook? Dell could have sold you the hardware all along, but now it has the software to make those servers and storage systems really hum.

They also made the following observation:

Because [Dell] doesn’t have a legacy [software] business to defend, it can blaze a completely new trail that has its trailhead where Oracle, IBM and HP leave off.

Letting customers focus on what matters most

It’s a pretty exciting time to be at Dell as we continue to move up the stack outfitting web players big and small.  The idea is to get these players established and growing in an agile and elastic way so they can concentrate on serving customers rather than building out their underpinning software and systems.

Stay tuned for more!

Extra-credit reading

Pau for now…


Now available: Dell | Cloudera solution for Apache Hadoop

September 12, 2011

A few weeks ago we announced that Dell, with a little help from Cloudera, was delivering a complete Apache Hadoop solution.  Well, as of last week it’s now officially available!

As a refresher:

The solution is comprised of Cloudera’s distribution of Hadoop, running on optimized Dell PowerEdge C2100 servers with the Dell PowerConnect 6248 switch, delivered with joint service and support from both companies.  You can buy it either pre-integrated and good-to-go, or you can take the DIY route and set it up yourself with the help of the reference architecture and deployment guide.

Learn more at the Dell | Cloudera page.

Extra-credit reading

Pau for now…


Does Hadoop compete with or complement the data warehouse?

August 12, 2011

Dell’s chief architect for big data, Aurelian Dumitru (aka A.D.) presented a talk at OSCON the week before last with the heady title, “Hadoop – Enterprise Data Warehouse Data Flow Analysis and Optimization.”  The session, which was well attended, explored the integration between Hadoop and the Enterprise Data Warehouse.  A.D. posted a fairly detailed overview of his session on his blog, but if you want a great high-level summary, check this out:

Some of the ground AD covers

  • Mapping out the data life cycle: Generate -> Capture -> Store -> Analyze -> Present
  • Where does Hadoop play and where does the data warehouse?  Where do they overlap?
  • Where do BI tools fit into the equation?
  • To learn more, check out dell.com/hadoop

Extra-credit reading


Introducing the Dell | Cloudera solution for Apache Hadoop — Harnessing the power of big data

August 4, 2011

Data continues to grow at an exponential rate and no place is this more obvious than in the Web space.  Not only is the amount exploding but so is the form the data takes, whether that’s transactional, documents, IT/OT, images, audio, text, video, etc.  Additionally, much of this new data is unstructured or semi-structured, which traditional relational databases were not built to deal with.

Enter Hadoop, an Apache open source project which, when combined with MapReduce, allows the analysis of entire data sets, rather than sample sizes, of structured and unstructured data types.  Hadoop lets you chomp through mountains of data faster and get to insights that drive business advantage quicker.  It can provide near “real-time” data analytics for clickstream data, location data, logs, rich data, marketing analytics, image processing, social media association, text processing, etc.  More specifically, Hadoop is particularly suited for applications such as the following (a small clickstream sketch follows the list):

  • Search Quality — search attempts vs. structured data analysis; pattern recognition
  • Recommendation engine — batch processing; filtering and prediction (i.e. use information to predict what similar users like)
  • Ad-targeting – batch processing; linear scalability
  • Thread analysis for spam fighting and detecting click fraud — batch processing of huge datasets; pattern recognition
  • Data “sandbox” – “dump” all data in Hadoop; batch processing (i.e. analysis, filtering, aggregations, etc.); pattern recognition
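
As a tiny taste of the clickstream-style batch processing above, here is a toy sessionization pass in Python.  The log rows and the 30-minute session gap are invented assumptions; at Hadoop scale this logic would run as a MapReduce job keyed by user rather than one in-memory loop.

```python
from datetime import datetime, timedelta

# Hypothetical raw clickstream rows: (user, page, timestamp) -- illustrative only.
clicks = [
    ("u1", "/home",    "2011-08-04 10:00:02"),
    ("u1", "/pricing", "2011-08-04 10:00:45"),
    ("u1", "/home",    "2011-08-04 11:30:00"),  # new session after a gap
    ("u2", "/docs",    "2011-08-04 10:05:10"),
]

SESSION_GAP = timedelta(minutes=30)

def sessionize(clicks):
    """Group each user's clicks into sessions split on 30-minute gaps."""
    sessions = {}
    for user, page, ts in sorted(clicks, key=lambda c: (c[0], c[2])):
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        user_sessions = sessions.setdefault(user, [])
        if user_sessions and t - user_sessions[-1]["end"] <= SESSION_GAP:
            user_sessions[-1]["pages"].append(page)
            user_sessions[-1]["end"] = t
        else:
            user_sessions.append({"pages": [page], "end": t})
    return sessions

for user, sess in sessionize(clicks).items():
    print(user, [s["pages"] for s in sess])
# u1 [['/home', '/pricing'], ['/home']]
# u2 [['/docs']]
```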

The Dell | Cloudera solution

Although Hadoop is a very powerful tool, it can be a bit daunting to implement and use.  This fact wasn’t lost on the founders of Cloudera, who set up the company to make Hadoop easier to use by packaging it and offering support.  Dell has joined with this Hadoop pioneer to provide the industry’s first complete Hadoop solution (aptly named “the Dell | Cloudera solution for Apache Hadoop”).

The solution is comprised of Cloudera’s distribution of Hadoop, running on optimized Dell PowerEdge C2100 servers with the Dell PowerConnect 6248 switch, delivered with joint service and support.  Dell offers two flavors of this big data solution: one based on Cloudera’s free-to-download distribution of Hadoop, and one based on Cloudera’s paid enterprise version.

It comes with its own “crowbar” and DIY option

The Dell | Cloudera solution for Apache Hadoop also comes with Crowbar, the recently open-sourced Dell-developed software, which provides the necessary tools and automation to manage the complete lifecycle of Hadoop environments.  Crowbar manages the Hadoop deployment from the initial server boot to the configuration of the main Hadoop components allowing users to complete bare metal deployment of multi-node Hadoop environments in a matter of hours, as opposed to days. Once the initial deployment is complete, Crowbar can be used to maintain, expand, and architect a complete data analytics solution, including BIOS configuration, network discovery, status monitoring, performance data gathering, and alerting.

The solution also comes with a reference architecture and deployment guide, so you can assemble it yourself, or Dell can build and deploy the solution for you, including rack and stack, delivery and implementation.

Some of the coverage (added Aug 12)

Extra-credit reading


Pau for now…


OSCON: How foursquare uses MongoDB to manage its data

July 27, 2011

I saw a great talk today here at OSCON Data up in Portland, Oregon.  The talk was Practical Data Storage: MongoDB @ foursquare and was given by foursquare‘s head of server engineering, Harry Heymann.  The talk was particularly impressive since, due to AV issues, Harry had to wing it and go slideless.  (He did post his slides to Twitter so folks with access could follow along.)

After the talk I grabbed a few minutes with Harry and did the following interview:

Some of the ground Harry covers

  • What is foursquare and how it feeds your data back to you
  • “Software is eating the world”
  • How foursquare got to MongoDB from MySQL (see the MongoDB sketch after this list)
  • Handling 3400% growth
  • How they use Hadoop for offline data
  • Running on Amazon EC2, and at what point it makes sense to move to their own servers
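
For readers new to MongoDB, here is a minimal sketch of the document model that makes the kind of MySQL-to-MongoDB move Harry describes attractive.  It assumes a MongoDB server on localhost and the pymongo driver; the collection name and fields are invented for illustration.

```python
from pymongo import MongoClient

# Minimal sketch: assumes a local MongoDB instance and pymongo installed.
client = MongoClient("mongodb://localhost:27017")
checkins = client["demo"]["checkins"]

# Documents are schemaless JSON-like dicts -- no ALTER TABLE needed
# when a new field (say, "badge") shows up.
checkins.insert_one({"user": "harry", "venue": "Powell's Books",
                     "loc": [-122.681, 45.523]})
checkins.insert_one({"user": "harry", "venue": "OSCON", "badge": "speaker"})

print(checkins.count_documents({"user": "harry"}))     # -> 2
print(checkins.find_one({"venue": "OSCON"})["badge"])  # -> 'speaker'
```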

Extra-credit reading

Pau for now…


Hadoop Summit: Looking at the evolving ecosystem with Ken Krugler

July 13, 2011

Here is the final entry in my interview series from the Hadoop Summit.

The night before the summit, I was impressed when I heard Ken Krugler speak at the BigDataCamp unconference.  Turns out Ken has been a part of the Hadoop scene since even before there was a Hadoop: his 2005 start-up Krugle utilized Nutch, the project out of which Hadoop split and evolved.  He now runs a Hadoop consulting practice, Bixo Labs, and offers training.

I ran into Ken the next day at the summit and sat down with him to get his thoughts on Hadoop and the ecosystem around it.

Some of the ground Ken covers

  • How he first began using Hadoop many moons ago
  • (0:53)  How Hadoop has crossed the chasm over the last half decade
  • (1:53)  The classes he teaches, one very technical and the other an intro class
  • (2:23)  What the heck is Hadoop anyway?
  • (3:30)  What trends Ken has seen recently in the Hadoop world (the rise of  the fat node)

Extra-credit reading

Pau for now…

