Hadoop World: Karmasphere and big data intelligence

November 14, 2011

One thing Hadoop isn’t great at right out of the box is data analytics, and that’s where a company like Karmasphere comes in.  Karmasphere provides business intelligence software that data analysts can use to mine the data that Hadoop sucks up.

Last week at Hadoop World I grabbed some time with Karmasphere’s Chairman and co-founder, Martin Hall, to learn more about where he and his company play in the wild world of big data.

Some of the ground Martin covers

  • Where does Karmasphere play in the big data stack, how is it used and by whom
  • (0:38) Where did the idea for developing Karmasphere come from
  • (1:58) What is the Karmasphere “secret sauce”
  • (2:18) What are the main industries and use cases where their offerings are used
  • (3:40) What can we look forward to in future releases

But wait, there’s more!

Stay tuned for more interviews from last week’s Hadoop World.  On tap are: Mark Mims of Canonical, Todd Papaioannou from Battery Ventures, John Gray of Facebook, Erik Swan of Splunk and Nosh Petigara of 10gen/MongoDB.

Extra-credit reading

Pau for now…

Does Hadoop compete with or complement the data warehouse?

August 12, 2011

Dell’s chief architect for big data, Aurelian Dumitru (a.k.a. AD), presented a talk at OSCON the week before last with the heady title, “Hadoop – Enterprise Data Warehouse Data Flow Analysis and Optimization.”  The session, which was well attended, explored the integration between Hadoop and the Enterprise Data Warehouse.  AD posted a fairly detailed overview of his session on his blog, but if you want a great high-level summary, check this out:

Some of the ground AD covers

  • Mapping out the data life cycle: Generate -> Capture -> Store -> Analyze ->Present
  • Where does Hadoop play and where does the data warehouse?  Where do they overlap?
  • Where do BI tools fit into the equation?
  • To learn more, check out dell.com/hadoop

Extra-credit reading

Hadoop Summit: Talking to the CEO of MapR

July 10, 2011

I’m now back from vacation and am continuing with my series of videos from the Hadoop Summit.  The one-day summit, which was very well attended, was held in Santa Clara the last week of June.  One of the two Platinum sponsors was MapR Technologies.  MapR are particularly interesting since they have taken a different approach to productizing Hadoop than the current leader, Cloudera.

I got some time with their CEO and co-founder John Schroeder to learn more about MapR:

Some of the ground John covers

  • The announcements they made at the event
  • (0:16) How John got the idea to start MapR: what tech trends he was seeing and what customer problems he was learning about.
  • (1:43) How MapR’s approach to Hadoop differs from Cloudera (and Hortonworks)
  • (3:49) How the Hadoop community is growing, with regard to both Apache and the commercial entities that are developing, and the importance of this growth.

Extra-credit reading

Pau for now…

DCS systems, solutions and MDC steal show at Dell sales kick-off

February 1, 2011

Every year at the end of January Dell holds a giant kick-off meeting for our enterprise and public sales forces.  The event, which has been held in Las Vegas the last two years, is a four-day happening consisting of keynotes, sessions and a full-scale expo where the sales team can touch and learn first-hand about the latest and greatest in Dell solutions and offerings.

Setting up the DCS Modular Data Center on the expo floor

At last year’s sales kick-off, the Data Center Solutions (DCS) group had our big coming out party, letting the sales force know that we would be expanding beyond our elite custom system business, with a specialized PowerEdge C line and a set of cloud solutions.

This year the systems and solutions have been out in the market for a little while and we were able to share actual case studies with the attendees showing how our systems and solutions have been able to solve real customer problems.  The big new addition to the DCS line up was our Modular Data Center (MDC) which, until just a few months ago, was reserved only for a very small group of select customers.

Gearing up for day two of booth duty at the DCS booth.

As you can tell from the picture above, the MDC took up a big part of our booth.  It served to house our PowerEdge C servers and host a selection of our cloud solutions:

Additionally, to provide a peek at what PowerEdge C systems we have up our sleeve, we had several units in an uber secret whisper suite.

Our overall message at the booth was that although these components can be used individually, if you want to run “the world’s most efficient hyperscale data center” you’ll want to combine these optimized solutions and systems with the MDC into one hyper-efficient, integrated system.

Well received

Now as a member of the DCS team I may be a little biased but I really think we had the coolest booth there 🙂  It was great to hear comments from the sales force such as “this is awesome!” and “why didn’t I know about this?!”

We’ll have to start now to figure out how we will top this next year.

Extra-credit reading

Pau for now…

Dell Cloud Solutions up and running!

November 19, 2010

Back in March we announced Dell’s cloud solutions.  Today at a press conference in San Francisco we announced their general availability along with some examples of customers who are employing them. (Woohoo!)

What’s the big idea

The idea behind these offerings has been to leverage the experience we in the DCS group have gained over the last several years providing custom systems to some of the world’s largest cloud providers.  These new solutions are targeted at organizations the next tier down (the “next 1,000”) from the hyperscale customers we have been working with.

Who’s using these solutions: a couple of examples

  • Uniserve, a Canadian Internet services provider, has adopted the Dell Cloud Solution for Web Applications to offer on-demand access to a high-performance Internet application and consumer delivery platform, which customers can use for everything from developing iPhone apps and commercial storefronts to hosting and delivering Software-as-a-Service.
  • InsightExpress, a leading provider of digital marketing research solutions, has deployed the Dell Cloud Solution for Data Analytics. The solution combines analytic platform software from Aster Data with Dell PowerEdge C servers and joint service and support, enabling InsightExpress to measure the effectiveness of advertising and brand communications so that clients can drive high-performing marketing campaigns.

How we got here

We started our expansion by creating a line of specialized PowerEdge C servers patterned after the custom systems we have been designing for the “biggest of the big.”  What we realized though is that, unlike the biggest players who write their own software, the next 1,000 don’t just want servers; they want solutions that include software and services as well.

The three integrated solutions that are available today are:

  • Dell Cloud Solution for Web Applications: A turnkey platform-as-a-service offering targeted at IT service providers, hosting companies and telcos.  This private cloud offering combines Dell’s specialized cloud servers with fully integrated software from Joyent.
  • Dell Cloud Solution for Data Analytics: A combination of Dell’s PowerEdge C servers with Aster Data’s nCluster, a massively parallel processing database with an integrated analytics engine.
  • Dell Cloud Solution for Data Warehousing: PowerEdge C servers and Greenplum Database 4.0 for building enterprise data warehouses and consolidating data marts in massively parallel processing environments.

Stay tuned for more news and more solutions!

Pau for now…

Extra-credit reading:

Dell’s hyper scale cloud efforts — Everything you wanted to know in 3 minutes

October 14, 2010

Last week a couple of us went down to San Antonio to help represent the OpenStack project at Rackspace’s partner summit.  While there I met up with the VAR Guy.   Mr. Guy got me chatting about Dell’s Data Center Solutions group, where we’ve been and where we’re going.  Below is the resulting video he put together featuring myself and San Antonio’s greenery. (See the original article this came from).

Some of the topics I tackle:

  • How Dell’s Data Center Solutions Group is designing servers for high-end cloud computing
  • How Dell is integrating hardware with software in cloud servers
  • Coming soon: Dell Cloud Solution for Web Applications/Leveraging Joyent’s software
  • Dell’s cloud partner program – where Ubuntu Enterprise Cloud, Aster Data and Greenplum fit in.
  • Dell’s commitment to OpenStack

Extra-credit reading:

Pau for now…

Talking to Aster Data’s brand new CEO

September 30, 2010

I flew to Chicago today to support our partner Aster Data’s Big Data Insight summit.  This Chicago event is part of a series of roadshows that Aster is doing for customers in cities in the US and Europe.  Today’s event was held in the trendy Hotel Sax and featured talks from analysts as well as partners (SAS, MicroStrategy and Dell).

Attending his first Aster roadshow was their brand new CEO, Quentin Gallivan.  As the post-event happy hour was winding down I grabbed a few minutes with Quentin.  Here is what he had to say:

Some of the topics Quentin covers:

  • What he did before Aster: CEO of BI SaaS provider PivotLink; CEO of Postini, a SaaS email security company; and key exec at Verisign.
  • Why Quentin decided to join Aster.
  • How he heard about the opportunity.
  • What he sees as Aster’s opportunity.
  • How the Dell partnership allows Aster to deliver a total solution.

Extra-credit reading:

Pau for now…

Aster’s Big Data Architecture

September 3, 2010

As I mentioned in my last entry, the week before last I headed out to the TDWI World Conference in San Diego.  Besides talking about Dell’s new BI practice, I was there to represent our data analytics partners, Aster Data and Greenplum.  Both vendors also had booths of their own and I was able to grab some time with Jeff Zeisler, director of pre-sales engineers at Aster Data, to get an overview of their architecture.  Here’s what Jeff had to say:

Some of the ground Jeff covers:

  • Aster is an MPP (massively parallel processing) data warehouse solution.  It runs on a cluster of commodity hardware that executes SQL queries in parallel.
  • The 3 layers to the architecture:
    • Queen tier – central location users use to submit queries. It figures out how to split up the query and send it to the next tier.
    • Worker tier – where most of the servers are located, where data is stored (locally on the servers) and where all the heavy lifting for processing occurs.  The map reduce framework is built into this tier and sits right next to the SQL execution engine.
    • Loader and exporter tier: a separate tier of machines that can be used to bulk load new data into the system.
  • How it works: The query gets broken up across all the machines, they each execute some portion of the query, and the results are brought back together at the Queen and returned to the user.
  • New cool things coming up in the next 6 months.
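
For readers who think in code, here is a minimal sketch of the split-and-merge pattern Jeff describes. It is not Aster's implementation or API: the function names, the sample rows and the use of threads to stand in for worker machines are all assumptions made purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of rows stored locally
# on one worker machine (city, clicks).
WORKER_PARTITIONS = [
    [("chicago", 3), ("austin", 5)],
    [("chicago", 2), ("san diego", 7)],
    [("austin", 1), ("chicago", 4)],
]


def worker_execute(rows):
    """Worker tier (illustrative): aggregate only the locally stored rows."""
    partial = Counter()
    for city, clicks in rows:
        partial[city] += clicks
    return partial


def queen_execute(partitions):
    """Queen tier (illustrative): send the same sub-query to every worker,
    then merge the partial results into one answer for the user."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(worker_execute, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


# Roughly: SELECT city, SUM(clicks) FROM clicks GROUP BY city
result = queen_execute(WORKER_PARTITIONS)
print(result)
```

The point of the pattern is that each worker only touches its own data, so adding machines spreads the heavy lifting while the Queen does the comparatively cheap merge.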


Pau for now…

Ubuntu, the Cloud and the Future — Neil Levine

July 27, 2010

After the cloud summit last week at OSCON, I sat down with Neil Levine of Canonical to see what was in store for Ubuntu cloud-wise (Canonical is a partner of ours in our cloud ISV program).  Neil is the VP of Canonical’s corporate services division which handles their cloud and server products.

Here’s what Neil had to say:

Some of the topics Neil tackles:

  • The next Ubuntu release “Maverick Meerkat” and its geek-a-licious launch date: 10.10.10.
  • Look for Maverick to make Eucalyptus even easier to deploy and use.
  • Data processing and data analytics are among the key use cases in the cloud, and Canonical is looking to move up the stack and provide deep integration for other apps like Hadoop and NoSQL.
  • What are some of the areas of focus for next year’s two releases, i.e. 11.04 and 11.10.
  • Project Ensemble: what it is and what its goals are.

Extra-credit reading

Pau for now…

Big Data in the Windy City

May 20, 2010

The Aqua building, catty corner from my hotel

Last Tuesday and Wednesday, I attended the TDWI (The Data Warehousing Institute) World Conference in Chicago.  The show was a mix of courses and exhibit space.

I went to learn about the BI/Data warehousing segment and scout in preparation for the next conference in August.

Why BI?

My interest in the space comes from the fact that two of the three first partners in our Cloud Partner program are in the Data Warehousing and analytics space: Aster Data and Greenplum.  Both these partners are leveraging highly scaled-out architectures to crunch data.

While there, besides checking out the 24 companies on the exhibit floor, I attended three half-day classes: “Developing your BI tool strategy,” “Cool BI, the latest innovations,” and “Extending BI to support online marketing and Web 2.0.”

For other newbies like myself, here are some notes from the first course.

My Notes: The layers of the BI Lifecycle stack

BI Suites:

  • What they do: Query, report, analyze, visualize, alert (front end to the chain)
  • The Big 4:  IBM (Cognos), SAP (Business Objects), Oracle (Hyperion), Microsoft
    • They all bought small players who excelled in the space
    • Usually offer the suites as part of a complete BI lifecycle stack
    • Two of the remaining independents are MicroStrategy and SAS

Data Management

  • Data warehouse/mart databases and storage
  • Usually in an RDBMS but also in a dedicated OLAP database
  • Examples: Aster Data, Greenplum, Netezza, Teradata

Data Integration (aka ETL)

  • They extract, transform and load info from the layer below into the layer above.
  • Examples: Informatica

Operational Apps/Systems

  • Planning, ERP, CRM etc
  • Orders, Invoices, Shipping, Web clicks
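
To make the layers a bit more concrete for fellow newbies, here is a toy sketch of data moving up the stack: operational records are extracted, transformed and loaded into a stand-in "warehouse," and a BI-style report is run on top. Everything here (the field names, the sample orders, a plain Python list playing the warehouse) is a made-up assumption for illustration, not how any of the vendors above work.

```python
from collections import defaultdict

# Operational layer (hypothetical): raw order records as an app might emit them.
OPERATIONAL_ORDERS = [
    {"order_id": 1, "region": " east ", "amount": "120.50"},
    {"order_id": 2, "region": "West", "amount": "80.00"},
    {"order_id": 3, "region": "EAST", "amount": "39.50"},
]


def extract(source):
    """Extract: copy the records out of the operational system."""
    return list(source)


def transform(records):
    """Transform: clean up region names and convert amounts to numbers."""
    return [
        {
            "order_id": r["order_id"],
            "region": r["region"].strip().lower(),
            "amount": float(r["amount"]),
        }
        for r in records
    ]


def load(rows, warehouse):
    """Load: append the cleaned rows to the data management layer."""
    warehouse.extend(rows)


def revenue_by_region(warehouse):
    """BI layer: a simple report summing order amounts per region."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["region"]] += row["amount"]
    return dict(totals)


warehouse = []  # stand-in for the data warehouse/mart
load(transform(extract(OPERATIONAL_ORDERS)), warehouse)
report = revenue_by_region(warehouse)
print(report)
```

Note how the report only works because the transform step normalized " east " and "EAST" to the same value, which is the kind of cleanup the data integration layer exists to do.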

Extra-credit reading

Pau for now…
