Web Glossary part two: Data tier

January 18, 2012

Here is part two of three of the Web glossary I compiled. As I mentioned in my last two entries, in compiling this I pulled information from various and sundry sources across the Web, including Wikipedia, community and company web sites and the brain of Cote.

Enjoy

General terms

  • Structured data: Data that is organized in a defined structure, e.g., rows and columns, so that each element is identifiable. The most common form of structured data is a relational database, such as MySQL or Microsoft Access, queried with SQL.
  • Unstructured data: Data that has no identifiable structure. Unstructured data typically includes bitmap images/objects, text and other data types that are not part of a database. Most enterprise data today can actually be considered unstructured; an email, for example, is unstructured data.
  • Big Data: Data characterized by one or more of the following: Volume (a large amount of data, growing at a high rate); Velocity (the speed at which the data must be processed and a decision made); Variety (the range of data types and structures).
  • Relational Database Management Systems (RDBMS): These databases are the incumbents in enterprises today and store data in tables of rows and columns. They are defined, queried and updated using a special computer language, Structured Query Language (SQL), which is the standard language for relational databases. Examples: IBM DB2, MySQL, Microsoft SQL Server, PostgreSQL, Oracle RDBMS, Informix, Oracle Rdb, etc.
  • NoSQL: a class of databases that 1) are intended to perform at internet scale (think Facebook, Twitter, LinkedIn) and 2) reject the relational model in favor of other models (key-value, document, graph). They often achieve performance by having far fewer features than SQL databases and by focusing on a subset of use cases. Examples: Cassandra, HBase (the Hadoop database), MongoDB, Riak
  • Recommendation engine: A recommendation engine takes a collection of frequent itemsets as input and generates a recommendation set for a user by matching the current user’s activity against the discovered patterns, e.g., people who bought X often also bought Y. The recommendation engine runs as an online process, so its efficiency and scalability are key.
  • Geo-spatial targeting: the practice of delivering advertising, offers and information based on a user’s geographic location.
  • Behavioral targeting: a technique used by online publishers and advertisers to increase the effectiveness of their campaigns.  Behavioral targeting uses information collected on an individual’s web-browsing behavior, such as the pages they have visited or the searches they have made, to select which advertisements to display to that individual.
  • Clickstream analysis: On a Web site, clickstream analysis is the process of collecting, analyzing and reporting aggregate data about which pages visitors view and in what order. The succession of mouse clicks each visitor makes is the clickstream. There are two levels of clickstream analysis: traffic analysis and e-commerce analysis.
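To make the structured data and RDBMS entries a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an embedded relational database standing in for a server such as MySQL, and the orders table and its columns are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a relational server such as MySQL.
conn = sqlite3.connect(":memory:")

# Structured data: every row has the same named, typed columns.
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "laptop", 1), ("bob", "monitor", 2), ("alice", "mouse", 3)],
)

# SQL is declarative: you describe the result you want and the database
# works out how to compute it.
rows = conn.execute(
    "SELECT customer, SUM(qty) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 4), ('bob', 2)]
```

The same query would run, with little or no change, against most of the RDBMS products listed above; that shared language is much of what SQL buys you.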

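The recommendation engine entry describes two steps: mining frequent itemsets offline, then matching a user's current activity against them online. Below is a minimal sketch that only looks at item pairs and simple co-occurrence counts; production systems use algorithms such as Apriori and far more data. The purchase history, product names and `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: one set of products per past order.
baskets = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"monitor", "cable"},
    {"mouse", "mousepad"},
]

def frequent_pairs(baskets, min_support):
    """Offline step: find item pairs bought together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes ("bag", "laptop") and ("laptop", "bag") the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def recommend(current_items, pairs):
    """Online step: match the user's current items against the discovered pairs."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a in current_items and b not in current_items:
            scores[b] += n
        elif b in current_items and a not in current_items:
            scores[a] += n
    # Highest score first; ties broken alphabetically for stable output.
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

pairs = frequent_pairs(baskets, min_support=2)
print(recommend({"laptop"}, pairs))  # ['bag', 'mouse']
```

Splitting the work this way is the point of the design: the expensive pattern mining happens in batch, so the online lookup stays cheap enough to run on every page view.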
Projects/Entities

  • Gluster: a software company, acquired by Red Hat, that provides an open source platform for scale-out public and private cloud storage.
  • Relational Databases
    • MySQL:  the most popular open source RDBMS.  It represents the “M” in the LAMP stack.  It is now owned by Oracle.
    • Drizzle: a fork of MySQL specifically targeted at the cloud. It is currently an open source project without a commercial entity behind it.
    • Percona:  A MySQL support and consulting company that also supports Drizzle.
    • PostgreSQL: aka Postgres, an object-relational database management system (ORDBMS) available for many platforms, including Linux, FreeBSD, Solaris, Windows and Mac OS X.
    • Oracle DB: not used much in new WebTech companies, but still a major database in the development world.
    • SQL Server: Microsoft’s RDBMS.
  • NoSQL Databases
    • MongoDB: an open source, high-performance, document-oriented database written in C++. Many Linux distros include a MongoDB package, including CentOS, Fedora, Debian, Ubuntu and Gentoo. Prominent users include Disney Interactive Media Group, the New York Times, foursquare, bit.ly and Etsy. 10gen is the commercial backer of MongoDB.
    • Riak: a NoSQL database/datastore written in Erlang from the company Basho. Originally used for the Content Delivery Network Akamai.
    • Couchbase: formed from the merger of CouchOne and Membase. It offers Couchbase Server, powered by Apache CouchDB, in both Enterprise and Community editions. The creator of CouchDB was a prominent Lotus Notes architect.
    • Cassandra: a highly scalable key/value (column-family) NoSQL database with no single point of failure. It originated at Facebook, where it was built to power inbox search. Backed by DataStax, which came out of Rackspace.
    • Mahout: a scalable machine learning and data mining library, i.e., an analytics engine for doing machine learning (e.g., recommendation engines and scenarios where you want to infer relationships). An Apache project.
  • Hadoop ecosystem
    • Hadoop: an open source Apache platform, developed largely at Yahoo, that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It is particularly suited to large volumes of unstructured data such as Facebook comments and Twitter tweets, email and instant messages, and security and application logs.
    • MapReduce: a software framework for writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. The MapReduce programming model came out of Google; Hadoop provides an open source implementation and acts as the platform for executing MapReduce jobs.
    • HDFS: the Hadoop Distributed File System splits large files into smaller data blocks that are replicated and distributed across a cluster of commodity hardware, so that work can be processed in parallel, close to where the data is stored.
  • Major Hadoop utilities:
    • HBase: the Hadoop database, which supports structured data storage for large tables. It provides real-time read/write access to your big data.
    • Hive: a data warehousing solution built on top of Hadoop. An Apache project.
    • Pig: a platform for analyzing large data sets that leverages parallel computation. An Apache project.
    • ZooKeeper: a coordination service for distributed applications, used across the Hadoop ecosystem to track configuration and keep cluster services in sync. An Apache project.
    • Oozie: a workflow engine for Hadoop
    • Flume: a service designed to collect, aggregate and move large amounts of log data into your Hadoop environment.
    • Whirr: a set of libraries for running cloud services.  It’s ideal for running temporary Hadoop clusters to carry out a proof of concept, or to run a few one-time jobs.
    • Sqoop: a tool designed to transfer data between Hadoop and relational databases.  An Apache project
    • Hue: a browser-based desktop interface for interacting with Hadoop
  • Cloudera: a company that provides a Hadoop distribution similar to the way Red Hat provides a Linux distribution.  Dell is using Cloudera’s distribution of Hadoop for its Hadoop solution.
  • Solr: an open source enterprise search platform from the Apache Lucene project. Backed by the commercial company Lucid Imagination.
  • ElasticSearch: an open source, distributed search engine built on top of Apache Lucene, the underlying search library.
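MapReduce is easiest to grasp through the classic word-count example. Below is a single-process Python sketch of the three phases (map, shuffle, reduce). It is not Hadoop's actual API (Hadoop's native MapReduce API is Java), and the function names are hypothetical; what Hadoop adds is running the map and reduce steps in parallel across a cluster and recovering when machines fail.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "The end"]
# On a real cluster each line (or block of lines) would be mapped on a different machine.
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts["the"])  # 3
```

Because each map call only sees its own input and each reduce call only sees one key, both phases can be spread across many machines without the pieces needing to talk to each other.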

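The HDFS entry boils down to two ideas: cut a big file into fixed-size blocks, and keep several copies of each block on different machines. Here is a toy model of that in Python. The 64 MB block size and replication factor of 3 match Hadoop's defaults of this era, but the function name, node names and round-robin placement are simplifications for illustration; real HDFS uses rack-aware placement.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into fixed-size blocks and choose `replication` distinct nodes for each."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Simple round-robin choice of nodes; real HDFS spreads replicas across racks.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement

MB = 1024 * 1024
layout = place_blocks(200 * MB, 64 * MB, ["node1", "node2", "node3", "node4"])
for block_id, replicas in layout.items():
    print(block_id, replicas)  # first line: 0 ['node1', 'node2', 'node3']
```

A 200 MB file becomes four blocks here, and losing any single node still leaves two copies of every block, which is why Hadoop can get away with cheap commodity hardware.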
Extra-credit reading

Pau for now…


Web Glossary part one: Application tier

January 17, 2012

As I mentioned in my last post, one of the ways we are helping our teams get a better understanding of the wild and wacky world of the Web and Web developers is via a glossary we’ve created. In compiling this I pulled information from various and sundry sources across the Web, including Wikipedia, community and company web sites and the brain of Cote.

Over the next several entries I will be posting the glossary.  Feel free to bookmark it, delete it, offer corrections, comments or additions.

Today I present to you, the Application tier.

Enjoy

General terms

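  • Runtime: strictly, the environment that executes programs written in a given language; the term is often used loosely for the programming language itself, e.g., Java, .NET, JavaScript, PHP, Python, Ruby…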
  • Application framework: Provides re-usable templates, methods and ways of programming applications. Often, these frameworks will provide “widgets” and “libraries” that developers use to create various parts of their application; they may also include the actual tools to create, deploy and run the final application. Some application frameworks create whole sub-cultures of developers, such as Rails for the Ruby programming language. Most application frameworks are open source and free, though there are also many closed-source, commercial ones.
  • Continuous code development lifecycle: releasing software at more frequent intervals (30 days or less) by (a) working in smaller batches of code and (b) using tools and processes that enable a leaner approach to development. Software released in such a cycle tends to ship many small features, in contrast to “traditional” development, where hundreds of features are bundled into one version of the software and released every 1-2 years.
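The application framework entry's key idea is that the framework supplies the plumbing and calls your code, while you just fill in the pieces. Below is a deliberately tiny sketch of that idea in Python. `MiniFramework` is a hypothetical class invented for this example, not a real library; it speaks WSGI, Python's standard interface between web servers and applications, and real frameworks add templates, database access and much more on top.

```python
class MiniFramework:
    """A toy framework: it owns request handling; your code only fills in the routes."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        """Decorator that registers a handler function for a URL path."""
        def register(handler):
            self.routes[path] = handler
            return handler
        return register

    def __call__(self, environ, start_response):
        """WSGI entry point: find the handler for the requested path and send its response."""
        handler = self.routes.get(environ.get("PATH_INFO", "/"))
        if handler is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [handler().encode("utf-8")]

app = MiniFramework()

@app.route("/hello")
def hello():
    # Application code: the only part the developer writes.
    return "Hello, web!"

# Simulate a request without a server by calling the WSGI app directly.
statuses = []

def fake_start_response(status, headers):
    statuses.append(status)

body = app({"PATH_INFO": "/hello"}, fake_start_response)
print(statuses[0], body)  # 200 OK [b'Hello, web!']
```

To serve it for real, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would do; the developer never writes the HTTP handling, which is exactly the division of labor a framework offers.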

Programming languages

  • Java/.NET: The incumbent enterprise development platforms. Very powerful, but relatively difficult to learn and slower to program in.
  • Dynamic languages: e.g. PHP, Perl, Python, JavaScript, and Ruby.  They are popular for creating web applications since they are both simpler to learn and faster to code in than traditional enterprise standards like Java. This offers a substantial time to market advantage, particularly for smaller projects for which the benefits of Java are less applicable.
    • PHP: a server-side scripting language originally designed for web development to produce dynamic web pages. WordPress, Facebook and countless other web sites are written in PHP. PHP is infamous for being very quick and easy to get started with (which it is) but turning into a mess of “spaghetti code” after years of work by different programmers. PHP is open source, though Zend, the patron company behind PHP, and others sell “commercial” versions.
    • Perl: One of the original programming languages of the web, Perl emphasizes a very “Unix way” of programming. Perl can be quick and elegant, but like PHP, it can result in a pile of hard-to-maintain code in the long term. While Perl was extremely popular in the first Internet bubble, it has since taken a back seat to more popular development worlds such as PHP, Java and Rails. Perl is open source and there are few, if any, commercial companies behind it.
    • Python: Like other dynamic languages, Python emphasizes speed of development and code readability. Python is often compared with Perl, but it is not closely tied to it. Python emphasizes broad functionality while at the same time being a proper object-oriented programming language (not just a way to write “scripts”). Python enjoys steady popularity; Google uses Python as one of its primary programming languages.
    • JavaScript: once a minor language used only in web browsers, JavaScript has become a language in its own right, known and used by many programmers. Most web applications include some JavaScript.
    • Ruby: Ruby and Python are very similar in ethos, emphasizing fast coding with a more human-readable syntax. Ruby became famous in the mid-2000s with the rise of Rails, which was a rebellion against the “heavy weight” practices that Java imposed on web development. Ruby is still very popular. Ruby can also run on top of the Java virtual machine (via JRuby), providing a good bridge to the Java world. Heroku, the PaaS acquired by Salesforce, supports Ruby, as do many modern development platforms.
    • Ruby on Rails: a popular web application framework written in Ruby.  Rails is frequently credited with making Ruby “famous”.
    • Scala: a somewhat exotic language, but one with quite a buzz around it. It’s good for massive-scale systems that need to be concurrent (lots of people changing lots of things, often the same things, at the same time). Erlang is another language in this area. Scala runs on the Java Virtual Machine and the Common Language Runtime. In April 2009 Twitter announced it had switched large portions of its backend from Ruby to Scala and intended to convert the rest. Foursquare also uses Scala and Lift (Lift is a framework for Scala, much in the same way Rails is a framework for Ruby).
  • R:  a programming language and software environment for statistical computing and graphics.
  • Node.js: (aka “Node”) What’s interesting about Node.js is that it takes JavaScript, a language originally designed to run in web browsers, and uses it as a server-side environment. It is intended for writing scalable network programs such as web servers. It was created by Ryan Dahl in 2009, and its growth is sponsored by Joyent, which employs Dahl.
  • Clojure: a recent dialect of the Lisp programming language that is well suited to data-intensive applications. It runs on the Java Virtual Machine and the Common Language Runtime.
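The Scala entry's “lots of people changing lots of things, often the same things, at the same time” is the classic shared-state concurrency problem. The sketch below shows that problem in miniature in Python rather than Scala: eight threads update one counter, and a lock makes each read-modify-write step atomic so no update is lost. Scala and Erlang are popular here largely because they offer higher-level tools for this kind of coordination; this example does not show those tools, and `SharedCounter` is a hypothetical name.

```python
import threading

class SharedCounter:
    """A single value that many threads update at the same time."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value and one
        # of the two updates would be lost (a read-modify-write race).
        with self._lock:
            self.value += 1

def worker(counter, times):
    for _ in range(times):
        counter.increment()

counter = SharedCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 80000
```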

Runtimes and Platforms

  • Common Language Runtime (CLR): the virtual machine component of Microsoft’s .NET framework, responsible for managing the execution of .NET programs.
  • Java Virtual Machine (JVM): the underlying execution engine that the Java language runs on top of. It controls access to the hardware, networks and other “infrastructure” and services outside of the main application written in Java. Of special note is that many languages other than Java can run on the JVM (as with the CLR), e.g., Scala, Ruby (via JRuby), etc. There are many JVMs, and ISVs (IBM, Oracle, etc.) use their custom JVMs as key differentiators for middleware, mostly around performance, scale-out and security.
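The JVM and CLR entries describe a virtual machine that executes compiled intermediate code (bytecode) rather than source text. CPython, Python's reference implementation, works the same general way, so the idea is easy to see without installing a JVM: the standard `dis` module lists the bytecode instructions for a function. This is an analogy, not JVM bytecode, and the exact instruction names vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

# CPython compiled `add` to bytecode when the def statement ran; the
# interpreter's virtual machine executes these instructions, much as the
# JVM executes bytecode compiled from Java source.
for instruction in dis.get_instructions(add):
    print(instruction.opname, instruction.argrepr)

print(add(2, 3))  # 5
```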

Projects/Entities

  • OpenShift: Red Hat’s Platform as a Service (PaaS) offering. More specifically, OpenShift is a PaaS software layer that Red Hat runs and manages on top of third-party providers, Amazon first with more to follow.
  • Heroku:  A Platform as a Service (PaaS) offering that was acquired by Salesforce.com.  It supports development of Ruby on Rails, Java, PHP and Python.
  • Cloud Foundry: A Platform as a Service (PaaS) offering and VMware-led project. Cloud Foundry provides a platform for building, deploying and running cloud apps using the Spring Framework for Java developers, Rails and Sinatra for Ruby developers, Node.js, and other JVM languages and frameworks including Groovy, Grails and Scala.
  • Joyent: Offers PaaS and IaaS capabilities through the public cloud. Dell resells this capability as a turnkey solution under the name The Dell Cloud Solution for Web applications. Joyent also sponsors the development of Node.js and employs its creator.
  • GitHub: a web-based hosting service for software development projects that use the Git revision control system. GitHub offers both commercial plans and free accounts for open source projects.

But wait, there’s more…

Stay tuned for the next couple of entries when I will cover first the Database tier and then the Infrastructure tier.

Extra-credit reading

Pau for now…


The World of Web and Developers, getting to know it better

January 16, 2012

A couple of years back, on the Public side of the house, Dell set up specific marketing teams to focus on customer needs in three areas: Healthcare, Government and Education. This vertical approach turned out to be a great way to get to know our customers and their pain points better and ultimately meet their needs.

Based on this success, a little while ago we kicked off a similar effort in our commercial business.  The first six verticals we are setting up are: Retail, Manufacturing, Financial Services, Web|Tech, Energy and TME (Telco, Media & Entertainment).  Web|Tech is the group I belong to (I lead marketing for the group).

Developers, Developers, Developers

In the Internet space we have already had a fair amount of success through our DCS group.  The idea with the new Web vertical is to learn even more about the customer set, companies that use the internet as their platform, and take this knowledge along with our accumulated experience, to a wider audience.  Two of the key areas of focus of this new vertical will be developers and open source software.

Look it up

One of the ways we are helping our teams get a better understanding of the wild and wacky world of the Web and Web developers is via a glossary we’ve created. In compiling this I pulled information from various and sundry sources across the Web, including Wikipedia, community and company web sites and the brain of Cote.

The glossary is organized into the following sections:

[Update Feb 1: I’ve gone back and linked the entries below]

  • Application tier
  • Data tier
  • Infrastructure tier

Over the next several entries I will be posting the glossary.  Feel free to bookmark it, delete it, offer corrections, comments or additions.

Extra-credit reading

Pau for now…