hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: MR on HDFS data inserted via HBase?
Date Thu, 14 Jan 2010 04:36:15 GMT
Yes, by API I mean TableInputFormat and TableOutputFormat.

Pig has a connector to HBase. Not sure if Hive has one yet.


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Jan 13, 2010 at 8:28 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hello,
>
>
> ----- Original Message ----
>
> > From: Amandeep Khurana <amansk@gmail.com>
>
> > HBase has its own file format. Reading data from it in your own job will
> > not be trivial to write, but not impossible.
>
> You are referring to HTable, HFile, etc.?
>
> > Why would you want to use the underlying data files in the MR jobs? Any
> > limitation in using the HBase API?
>
> Are you referring to writing a MR job that makes use of TableInputFormat
> and TableOutputFormat as mentioned on
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink?
>
> I think that would work.
>
> But I'd also like to be able to run Hive/Pig scripts over the data, and I
> *think* neither supports reading it from HBase.  But they can obviously read
> it from files in HDFS, which is why I was asking.  It sounds like anything
> that wants to read HBase's data from behind its back, without going through
> HBase's API, would have to know how to read HFile & friends?
> (And again, I think/assume Hive and Pig don't know how to do that.)
>
> Thanks,
> Otis
>
> > On Wed, Jan 13, 2010 at 8:06 PM, Otis Gospodnetic <
> > otis_gospodnetic@yahoo.com> wrote:
> >
> > > Hello,
> > >
> > > If I import data into HBase, can I still run a hand-written MapReduce job
> > > over that data in HDFS?
> > > That is, not using TableInputFormat to read the data back out via HBase.
> > >
> > > Similarly, can one run Hive or Pig scripts against that data, but again,
> > > without Hive or Pig reading the data via HBase, but rather getting to it
> > > directly via HDFS?  I'm asking because I'm wondering whether storing data in
> > > HBase means I can no longer use Hive and Pig to run my ad-hoc jobs.
> > >
> > > Thanks,
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> > >
> > >
>
>
