Aster’s Big Data Architecture

September 3, 2010

As I mentioned in my last entry, the week before last I headed out to the TDWI World Conference in San Diego.  Besides talking about Dell’s new BI practice, I was there to represent our data analytics partners, Aster Data and Greenplum.  Both vendors also had booths of their own and I was able to grab some time with Jeff Zeisler, director of pre-sales engineers at Aster Data, to get an overview of their architecture.  Here’s what Jeff had to say:

Some of the ground Jeff covers:

  • Aster is an MPP (massively parallel processing) data warehouse solution.  It runs on a cluster of commodity servers that executes SQL queries in parallel.
  • The 3 layers to the architecture:
    • Queen tier – the central point where users submit queries. It figures out how to split up each query and sends the pieces to the next tier.
    • Worker tier – where most of the servers are located, where data is stored (locally on the servers) and where all the heavy lifting for processing occurs.  The MapReduce framework is built into this tier and sits right next to the SQL execution engine.
    • Loader and exporter tier – a separate tier of machines used for bulk loading new data into the system.
  • How it works: The query gets broken up across all the machines, each one executes some portion of the query, and the results are brought back together at the Queen and returned to the user.
  • New cool things coming up in the next 6 months.
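
For anyone who thinks better in code, here is a minimal sketch of the scatter-gather flow Jeff describes: a "queen" fans a query out to "workers", each worker aggregates only the data it holds locally, and the queen merges the partial results. This is a toy illustration in plain Python, not Aster's actual API. The names `WORKER_PARTITIONS`, `worker_execute` and `queen_execute` are made up for the example, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of an "orders" table
# stored locally on one worker, as (region, amount) rows.
WORKER_PARTITIONS = [
    [("east", 120.0), ("west", 80.0)],
    [("east", 30.0), ("south", 55.0)],
    [("west", 20.0), ("south", 45.0)],
]


def worker_execute(partition):
    """Worker tier: compute SUM(amount) GROUP BY region over local rows only."""
    partial = {}
    for region, amount in partition:
        partial[region] = partial.get(region, 0.0) + amount
    return partial


def queen_execute(partitions):
    """Queen tier: send the query to every worker, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(worker_execute, partitions))

    merged = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0.0) + subtotal
    return merged


if __name__ == "__main__":
    print(queen_execute(WORKER_PARTITIONS))
```

The point of the design is that the workers do the heavy lifting next to their own data, so only small partial results travel back to the queen.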

Pau for now…


Big Data in the Windy City

May 20, 2010

The Aqua building, catty corner from my hotel

Last Tuesday and Wednesday, I attended the TDWI (The Data Warehousing Institute) World Conference in Chicago.  The show was a mix of courses and exhibit space.

I went to learn about the BI/Data warehousing segment and scout in preparation for the next conference in August.

Why BI?

My interest in the space comes from the fact that two of the first three partners in our Cloud Partner program are in the data warehousing and analytics space: Aster Data and Greenplum.  Both of these partners are leveraging highly scaled-out architectures to crunch data.

While there, besides checking out the 24 companies on the exhibit floor, I attended three half-day classes: "Developing your BI tool strategy"; "Cool BI: the latest innovations"; and "Extending BI to support online marketing and Web 2.0."

For other newbies like myself, here are some notes from the first course.

My Notes: The layers of the BI Lifecycle stack

BI Suites:

  • What they do: Query, report, analyze, visualize, alert (front end to the chain)
  • The Big 4:  IBM (Cognos), SAP (Business Objects), Oracle (Hyperion), Microsoft
    • They all bought small players who excelled in the space
    • Usually offer the suites as part of a complete BI lifecycle stack
    • Two of the remaining independents are MicroStrategy and SAS

Data Management

  • Data warehouse/mart databases and storage
  • Usually in an RDBMS but also in a dedicated OLAP database
  • Examples: Aster Data, Greenplum, Netezza, Teradata

Data Integration (aka ETL)

  • They extract, transform and load info from the layer below into the layer above.
  • Examples: Informatica

Operational Apps/Systems

  • Planning, ERP, CRM, etc.
  • Orders, Invoices, Shipping, Web clicks
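
To make the stack above a bit more concrete, here is a small sketch of how data moves up through it: raw records from an operational system are extracted, cleaned up (the "T" in ETL), loaded into a warehouse table, and then queried the way a BI front end would. Everything here is an assumption for illustration: the sample records, the `orders` table and the function names are invented, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical operational data: order records as an app might emit them,
# with inconsistent region spelling and amounts stored as strings.
RAW_ORDERS = [
    {"order_id": 1, "region": " East ", "amount": "120.50"},
    {"order_id": 2, "region": "west", "amount": "80.00"},
    {"order_id": 3, "region": "EAST", "amount": "30.25"},
]


def extract():
    """Extract: pull the raw records out of the operational system."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: normalize region names and convert amounts to numbers."""
    return [
        (r["order_id"], r["region"].strip().lower(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def report(conn):
    """BI layer: the kind of summary query a reporting tool would issue."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


def run_pipeline():
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    load(conn, transform(extract()))
    return report(conn)


if __name__ == "__main__":
    print(run_pipeline())
```

Each function maps to one layer of the notes: `extract` reads from the operational apps, `transform` and `load` are the data integration layer, the SQLite table plays the data management layer, and `report` is what a BI suite does on top.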

Pau for now…