hadoop-common-user mailing list archives

From "Naama Kraus" <naamakr...@gmail.com>
Subject Re: HBase, Hive, Pig and other Hadoop based technologies
Date Thu, 04 Sep 2008 09:45:36 GMT
Thanks, Jeff, for the informative answer.

Naama

On Wed, Sep 3, 2008 at 5:41 PM, Jeff Hammerbacher <jeff.hammerbacher@gmail.com> wrote:

> Hey Naama,
>
> There's quite a bit going on here, but I'll try to get the ball
> rolling on an explanation of similarities and differences:
>
> 1) Language for data retrieval
> Both Pig and Hive implement languages for data retrieval. Pig is aimed
> at "experienced programmers for performing ad-hoc analysis of
> extremely large data sets", and often these data sets are "temporary".
> These design points dictate that Pig be procedural, though they have
> chosen a somewhat SQL-like syntax. Hive, on the other hand, is aimed
> more at data analysts rather than engineers, and thus uses a
> declarative language with a syntax that hews a bit closer to SQL. HBase
> offers a simpler API for getting and putting individual rows of data,
> though I believe someone has written a (possibly unsupported)
> SQL-like retrieval language (HQL?) above HBase.
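To make the procedural-versus-declarative distinction concrete, here is a minimal Python sketch. It is not Pig Latin or HiveQL; the visit records, the function names and the dict-based query format are all hypothetical illustrations. The procedural version spells out each step (loosely mirroring Pig's FILTER, GROUP and FOREACH), while the declarative version only states the desired result and leaves the execution order to a toy "engine".

```python
from collections import defaultdict

# Hypothetical toy data: (user, url, seconds_on_page) records.
visits = [
    ("alice", "a.com", 30),
    ("bob", "b.com", 5),
    ("alice", "b.com", 45),
    ("carol", "a.com", 12),
]

# Procedural style (Pig-like): the programmer names each intermediate
# result and decides the order of operations.
def procedural_long_visits(records, min_seconds):
    long_visits = [r for r in records if r[2] >= min_seconds]  # filter step
    by_user = defaultdict(list)                                # group step
    for user, _url, secs in long_visits:
        by_user[user].append(secs)
    return {user: sum(s) for user, s in by_user.items()}       # aggregate step

# Declarative style (Hive-like): the caller describes *what* it wants;
# this tiny stand-in engine decides *how* (here, a single fused pass).
def run_query(records, query):
    groups = defaultdict(int)
    for rec in records:
        if query["where"](rec):
            groups[rec[query["group_by"]]] += rec[query["sum"]]
    return dict(groups)

# Hypothetical query spec: total seconds per user for visits of 10s or more.
long_visits_query = {"where": lambda r: r[2] >= 10, "group_by": 0, "sum": 2}
```

Both `procedural_long_visits(visits, 10)` and `run_query(visits, long_visits_query)` produce `{"alice": 75, "carol": 12}`; the difference is only in who chooses the steps.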
>
> 2) Schema management
> Pig requires you to specify the structure of your data with each
> query, while Hive and HBase provide separate processes which manage
> the schemas of your data.
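A minimal Python sketch of the two schema approaches, assuming plain comma-separated input. `CATALOG`, `create_table` and `load_table` are hypothetical stand-ins for a separately managed schema store, not the actual Hive metastore or HBase APIs. In the first style, the schema is passed along with every load (much like Pig's `LOAD ... AS (...)`); in the second, it is registered once and looked up by table name.

```python
import csv
import io

RAW = "alice,31\nbob,27\n"

# Schema-per-query style: every load must restate the column names and types.
def load_with_schema(text, schema):
    rows = csv.reader(io.StringIO(text))
    return [{name: cast(value) for (name, cast), value in zip(schema, row)}
            for row in rows]

# Managed-schema style: a separate catalog stores each table's schema once.
CATALOG = {}  # hypothetical in-memory stand-in for a schema service

def create_table(name, schema):
    CATALOG[name] = schema

def load_table(name, text):
    # The query only names the table; the schema comes from the catalog.
    return load_with_schema(text, CATALOG[name])

create_table("people", [("name", str), ("age", int)])
per_query = load_with_schema(RAW, [("name", str), ("age", int)])
from_catalog = load_table("people", RAW)
```

Both calls yield the same typed rows; the trade-off is repeating the schema in every script versus maintaining a shared catalog.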
>
> 3) Managed storage
> Pig is agnostic to how you lay your data out within HDFS. Hive can
> also work with unmanaged data in HDFS, but if you let Hive manage your
> data, it can do a little bit of optimization for retrieval by
> partitioning your data inside of the file system. HBase manages data
> layout for you in the file system.
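The retrieval optimization from partitioning can be sketched as partition pruning. This Python example assumes a layout loosely modeled on Hive's `key=value` partition directories; the paths, file contents and the `scan_partition` helper are hypothetical. Because a predicate on the partition column maps directly onto directory names, files in non-matching partitions are never opened.

```python
# Hypothetical managed layout: one directory per partition value.
WAREHOUSE = {
    "warehouse/logs/ds=2008-09-02/part-00000": ["a", "b"],
    "warehouse/logs/ds=2008-09-03/part-00000": ["c"],
    "warehouse/logs/ds=2008-09-03/part-00001": ["d", "e"],
}

def partition_of(path, key):
    """Return the value of the `key=value` path component, or None if absent."""
    for component in path.split("/"):
        if component.startswith(key + "="):
            return component[len(key) + 1:]
    return None

def scan_partition(files, key, value):
    """Read only files in the matching partition; report how many were opened."""
    opened = 0
    rows = []
    for path in sorted(files):
        if partition_of(path, key) != value:
            continue  # pruned: this file is skipped without being read
        opened += 1
        rows.extend(files[path])
    return rows, opened

rows, opened = scan_partition(WAREHOUSE, "ds", "2008-09-03")
```

Here `rows` is `["c", "d", "e"]` and only 2 of the 3 files are opened. With unmanaged data, the engine has no such layout to exploit and would have to read everything.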
>
> Additionally, I'd say looking at the design points of each system
> might be of help:
>
> -Pig was designed for experienced programmers performing ad-hoc data
> analysis
> -Hive was designed for business analysts and programmers, for use in a
> data warehousing environment
> -HBase was designed to enable point lookup in addition to MapReduce,
> and can possibly be used in OLTP-type applications where availability
> is not a concern (though the HBase folks have told me they intend it
> to be used primarily for OLAP-style workloads)
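To illustrate "point lookup in addition to MapReduce", here is a toy Python table kept sorted by row key. HBase does order rows by row key, but this class is a hypothetical sketch of the access patterns, not HBase's client API or storage format. `get` locates a single row by binary search, while `scan` walks every row in key order, which is the kind of full pass a batch job makes.

```python
import bisect

class SortedRowStore:
    """Hypothetical toy table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []
        self._rows = []

    def put(self, key, row):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._rows[i] = row          # existing key: overwrite the row
        else:
            self._keys.insert(i, key)    # new key: insert, keeping sort order
            self._rows.insert(i, row)

    def get(self, key):
        """Point lookup: binary search, O(log n) key comparisons."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._rows[i]
        return None

    def scan(self):
        """Full pass over every row in key order, as a batch job would read it."""
        return list(zip(self._keys, self._rows))

store = SortedRowStore()
store.put("user#0042", {"name": "carol"})
store.put("user#0007", {"name": "alice"})
store.put("user#0042", {"name": "carol b."})
```

A single `store.get("user#0042")` touches only one row, whereas answering an aggregate question requires `store.scan()` over all of them.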
>
> Regards,
> Jeff
>
> On Wed, Sep 3, 2008 at 5:04 AM, Naama Kraus <naamakraus@gmail.com> wrote:
> > Hi,
> >
> > There are various technologies on top of Hadoop, such as HBase, Hive,
> > Pig and more. I was wondering what the differences between them are,
> > and which usage scenarios fit each one.
> >
> > For instance, is it true to say that Pig and Hive belong to the same
> > family? Or is Hive closer to HBase?
> > My understanding is that HBase allows direct lookups and low-latency
> > queries, while Pig and Hive provide batch processing operations based
> > on M/R. Both define a data model and an SQL-like query language. Is
> > this true?
> >
> > Could anyone shed light on when to use each technology? What are the
> > main differences, pros and cons?
> > Information on other technologies such as Jaql is also welcome.
> >
> > Thanks, Naama
> >
> > --
> > oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
> > 00 oo 00 oo
> > "If you want your children to be intelligent, read them fairy tales. If
> > you want them to be more intelligent, read them more fairy tales."
> > (Albert Einstein)
> >
>
