hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: MR on HDFS data inserted via HBase?
Date Thu, 14 Jan 2010 10:29:31 GMT
There is ongoing work on a Hive SerDe for HBase:

    https://issues.apache.org/jira/browse/HIVE-705

    https://issues.apache.org/jira/browse/HIVE-806

  - Andy


----- Original Message ----
> From: Amandeep Khurana <amansk@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Wed, January 13, 2010 8:36:15 PM
> Subject: Re: MR on HDFS data inserted via HBase?
> 
> Yes, by api I mean TableInputFormat and TableOutputFormat.
> 
> Pig has a connector to HBase. Not sure if Hive has one yet.
> 
> 
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
> 
> 
> On Wed, Jan 13, 2010 at 8:28 PM, Otis Gospodnetic <
> otis_gospodnetic@yahoo.com> wrote:
> 
> > Hello,
> >
> >
> > ----- Original Message ----
> >
> > > From: Amandeep Khurana 
> >
> > > HBase has its own file format. Reading data from it in your own job will
> > > not be trivial to write, but not impossible.
> >
> > You are referring to HTable, HFile, etc.?
> >
> > > Why would you want to use the underlying data files in the MR jobs? Any
> > > limitation in using the HBase api?
> >
> > Are you referring to writing a MR job that makes use of TableInputFormat
> > and TableOutputFormat as mentioned on
> > http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink ?
> >
> > I think that would work.
> >
> > But I'd also like to be able to run Hive/Pig scripts over the data, and I
> > *think* neither supports reading it from HBase.  But they can obviously read
> > it from files in HDFS, which is why I was asking.  But it sounds like anything
> > wanting to read HBase's data behind its back, without going through HBase's
> > API, would have to know how to read HFile & friends?
> > (and again, I think/assume Hive and Pig don't know how to do that)
> >
> > Thanks,
> > Otis
> >
> > > On Wed, Jan 13, 2010 at 8:06 PM, Otis Gospodnetic <
> > > otis_gospodnetic@yahoo.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > If I import data into HBase, can I still run a hand-written MapReduce
> > > > job over that data in HDFS?
> > > > That is, not using TableInputFormat to read the data back out via
> > > > HBase.
> > > >
> > > > Similarly, can one run Hive or Pig scripts against that data, but
> > > > again, without Hive or Pig reading the data via HBase, but rather
> > > > getting to it directly via HDFS?  I'm asking because I'm wondering
> > > > whether storing data in HBase means I can no longer use Hive and Pig
> > > > to run my ad-hoc jobs.
> > > >
> > > > Thanks,
> > > > Otis
> > > > --
> > > > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> > > >
> > > >
> >
> >



      

