hbase-user mailing list archives

From "Jeff Hammerbacher" <jeff.hammerbac...@gmail.com>
Subject Re: HBase, Hive, Pig and other Hadoop based technologies
Date Wed, 03 Sep 2008 14:41:38 GMT
Hey Naama,

There's quite a bit going on here, but I'll try to get the ball
rolling on an explanation of similarities and differences:

1) Language for data retrieval
Both Pig and Hive implement languages for data retrieval. Pig is aimed
at "experienced programmers for performing ad-hoc analysis of
extremely large data sets", and often these data sets are "temporary".
These design points dictate that Pig be procedural, though they have
chosen a somewhat SQL-like syntax. Hive, on the other hand, is aimed
more at data analysts than engineers, and thus uses a
declarative language with a syntax that hews a bit closer to SQL. HBase
offers a simpler API for getting and putting individual rows of data,
though I believe someone has written a (possibly unsupported)
SQL-like retrieval language (HQL?) above HBase.
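
To make the procedural-versus-declarative distinction concrete, here is
a rough sketch in plain Python. It uses none of Pig, Hive or HBase: the
step-by-step function stands in for a Pig-style dataflow, the SQL run
through the standard-library sqlite3 module stands in for a Hive-style
declarative query, and a dict stands in for HBase-style get/put on
individual rows. The page-view records and all names are made up for
illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical page-view records: (username, url, seconds_spent)
VIEWS = [
    ("alice", "/home", 12),
    ("bob", "/home", 7),
    ("alice", "/docs", 40),
    ("carol", "/docs", 3),
]

def procedural_total_by_user(views, min_seconds):
    """Pig-like style: spell out each step of the dataflow in order."""
    # Step 1: filter out short views.
    filtered = [v for v in views if v[2] >= min_seconds]
    # Step 2: group the remaining views by user.
    grouped = defaultdict(list)
    for username, _url, seconds in filtered:
        grouped[username].append(seconds)
    # Step 3: aggregate each group.
    return {username: sum(secs) for username, secs in grouped.items()}

def declarative_total_by_user(views, min_seconds):
    """Hive-like style: state the result wanted; the engine plans the steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views (username TEXT, url TEXT, seconds INTEGER)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", views)
    rows = conn.execute(
        "SELECT username, SUM(seconds) FROM page_views "
        "WHERE seconds >= ? GROUP BY username",
        (min_seconds,),
    ).fetchall()
    conn.close()
    return dict(rows)

# HBase-like style: put and get individual rows by key, no query language.
row_store = {}
row_store["alice"] = {"last_url": "/docs"}   # put
alice_row = row_store.get("alice")           # get (point lookup)
```

Both functions compute the same per-user totals; the difference is
whether the caller or the engine decides the order of the steps.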

2) Schema management
Pig requires you to specify the structure of your data with each
query, while Hive and HBase provide separate processes which manage
the schemas of your data.

3) Managed storage
Pig is agnostic to how you lay your data out within HDFS. Hive can
also work with unmanaged data in HDFS, but if you let Hive manage your
data, it can do a little bit of optimization for retrieval by
partitioning your data inside of the file system. HBase manages data
layout for you in the file system.
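
The partitioning point can be sketched as follows. This is a toy model
in plain Python, not Hive's actual implementation: rows are bucketed by
a hypothetical partition key (a date string), much as a managed table
might keep one directory per partition, so a query that restricts on
that key only reads the matching bucket.

```python
from collections import defaultdict

class PartitionedTable:
    """Toy stand-in for a table laid out as one bucket per partition key."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> rows
        self.rows_scanned = 0                # running count of rows read

    def insert(self, partition_key, row):
        self.partitions[partition_key].append(row)

    def scan(self, partition_key=None):
        """Read one partition if a key is given, otherwise every partition."""
        if partition_key is not None:
            buckets = [self.partitions.get(partition_key, [])]
        else:
            buckets = list(self.partitions.values())
        rows = [row for bucket in buckets for row in bucket]
        self.rows_scanned += len(rows)
        return rows

table = PartitionedTable()
table.insert("2008-09-01", {"user": "alice"})
table.insert("2008-09-02", {"user": "bob"})
table.insert("2008-09-02", {"user": "carol"})

pruned = table.scan("2008-09-02")   # reads only the matching partition
```

With unmanaged data and no knowledge of the layout, the equivalent
query would have to call scan() with no key and filter afterwards.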

Additionally, I'd say looking at the design points of each system
might be of help:

-Pig was designed for experienced programmers performing ad-hoc data analysis
-Hive was designed for business analysts and programmers, for use in a
data warehousing environment
-HBase was designed to enable point lookup in addition to MapReduce,
and can possibly be used in OLTP-type applications where availability
is not a concern (though the HBase folks have told me they intend it
to be used primarily for OLAP-style workloads)


On Wed, Sep 3, 2008 at 5:04 AM, Naama Kraus <naamakraus@gmail.com> wrote:
> Hi,
> There are various technologies on top of Hadoop such as HBase, Hive, Pig and
> more. I was wondering what are the differences between them. What are the
> usage scenarios that fit each one of them.
> For instance, is it true to say that Pig and Hive belong to the same family
> ? Or is Hive more close to HBase ?
> My understanding is that HBase allows direct lookup and low latency queries,
> while Pig and Hive provide batch processing operations which are M/R based.
> Both define a data model and an SQL-like query language. Is this true ?
> Could anyone shed light on when to use each technology ? Main differences ?
> Pros and Cons ?
> Information on other technologies such as Jaql is also welcome.
> Thanks, Naama
> --
> oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
> 00 oo 00 oo
> "If you want your children to be intelligent, read them fairy tales. If you
> want them to be more intelligent, read them more fairy tales." (Albert
> Einstein)
