hive-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: HBase as input AND output?
Date Wed, 13 Oct 2010 21:24:12 GMT
Thanks Tim.
(and sorry for the duplicate email - need to fix my Hive email filter)


Just to clarify one bit, though.
When using Hive without HBase one has data stored in the appropriate directories 
on HDFS and runs MR jobs against those data.

But when using Hive *with* HBase, does Hive require any such data to be present 
in HDFS?
In other words, when using Hive with HBase, one really uses only Hive's ability 
to translate a Hive QL statement to a set of MR jobs (and read from/write to 
HBase) and execute them against only data stored in HBase.  Is this correct?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Tim Robertson <timrobertson100@gmail.com>
> To: user@hive.apache.org
> Sent: Wed, October 13, 2010 4:45:31 PM
> Subject: Re: HBase as input AND output?
> 
> That's right.  Hive can use an HBase table as an input format to the
> hive query regardless of output format, and can also write the output
> to an HBase table regardless of the input format.  You can also
> supposedly do a join in Hive that uses 1 side of the join from an
> HBase table, and the other side a text file, which is very powerful.
> I haven't done it myself, but intend to shortly.
> 
> HTH,
> Tim
> 
> 
> On Wed, Oct 13, 2010 at 10:07 PM, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com> wrote:
> > Hi,
> >
> > I was wondering how I can query data stored in HBase and remembered Hive's
> > HBase integration:
> > http://wiki.apache.org/hadoop/Hive/HBaseIntegration
> >
> > After watching John Sichi's video
> > (http://developer.yahoo.com/blogs/hadoop/posts/2010/04/hundreds_of_hadoop_fans_at_the/)
> > I have a better idea about what functionality this integration provides,
> > but I still have some questions.
> >
> > Would it be correct to say that Hive-HBase integration makes the following
> > data flow possible:
> >
> > 0) Hive or Files ==> Custom HQL statement that aggregates data ==> HBase
> > 1) HBase ==> Custom HQL statement that aggregates data ==> HBase
> > 2) HBase ==> Custom HQL statement that aggregates data ==> output (console?)
> >
> > Of the above, 1) is what I'm wondering the most about right now.
> >
> > In other words, it seems to me that Hive may be able to look at *just* data
> > stored in HBase *without* the typical data/files in HDFS that Hive normally
> > runs its MR jobs against.
> >
> > Is this correct?
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Hadoop ecosystem search :: http://search-hadoop.com/
> >
> >
> 
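
For illustration only, data flow 1) from the quoted message (HBase ==> aggregating HQL statement ==> HBase) might look roughly like the HiveQL below. This is a sketch and not something posted in the thread. The storage handler class and the "hbase.columns.mapping" / "hbase.table.name" properties come from the Hive HBase integration; every table name, column name, and column family here (events, event_counts, cf) is a hypothetical placeholder, and the source HBase table is assumed to exist already.

```sql
-- Hypothetical source: expose an existing HBase table "events" to Hive.
-- EXTERNAL means Hive maps onto the HBase table rather than creating it.
CREATE EXTERNAL TABLE events_hbase (rowkey STRING, category STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:category")
TBLPROPERTIES ("hbase.table.name" = "events");

-- Hypothetical target: a Hive-managed table whose rows are stored in
-- the HBase table "event_counts", keyed by category.
CREATE TABLE event_counts_hbase (category STRING, cnt BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:cnt")
TBLPROPERTIES ("hbase.table.name" = "event_counts");

-- The aggregation reads from one HBase-backed table and writes to the other.
INSERT OVERWRITE TABLE event_counts_hbase
SELECT category, COUNT(1)
FROM events_hbase
GROUP BY category;
```

Neither table definition in this sketch names a Hive data directory on HDFS; both the input and the output of the query are declared as HBase tables.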
