hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: MapReduce job with mixed data sources: HBase table and HDFS files
Date Wed, 10 Jul 2013 17:21:00 GMT
Can you utilize initTableMapperJob() (which calls
TableMapReduceUtil.convertScanToString() underneath)?
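
As an illustrative sketch only (the table name, mapper class, and output classes
below are placeholders, not taken from this thread), letting initTableMapperJob()
set up the scan avoids calling the package-private convertScanToString() from
user code:

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;

    Scan scan = new Scan();              // restrict columns/rows here if needed
    // Serializes the Scan into the job configuration (TableInputFormat.SCAN)
    // and wires up TableInputFormat plus the mapper, so convertScanToString()
    // never has to be called directly.
    TableMapReduceUtil.initTableMapperJob(
        "my_table",                      // placeholder table name
        scan,
        MyTableMapper.class,             // placeholder TableMapper subclass
        ImmutableBytesWritable.class,    // map output key class
        Result.class,                    // map output value class
        job);                            // an already-created Job instance

One caveat worth verifying if you combine this with MultipleInputs: both
initTableMapperJob() and MultipleInputs.addInputPath() set the job's input
format, so the call order matters.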

On Wed, Jul 10, 2013 at 10:15 AM, S. Zhou <myxjtu@yahoo.com> wrote:

> Hi Azuryy, I am testing the way you suggested. Now I am facing a
> compilation error for the following statement:
> conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new
> Scan()));
>
>
> The error is: "method convertScanToString is not visible in
> TableMapReduceUtil". Could you help? It is blocking me.
>
>
> BTW, I am using the HBase server jar, version 0.95.1-hadoop1. I tried
> other versions as well, like 0.94.9, and got the same error.
>
> Thanks!
>
>
> ________________________________
>  From: Azuryy Yu <azuryyyu@gmail.com>
> To: user@hbase.apache.org
> Sent: Wednesday, July 3, 2013 6:02 PM
> Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS
> files
>
>
> Hi,
> 1) You cannot feed data from two different clusters into one MR job.
> 2) If your data is located in the same cluster, then:
>
>     conf.set(TableInputFormat.SCAN,
> TableMapReduceUtil.convertScanToString(new Scan()));
>     conf.set(TableInputFormat.INPUT_TABLE, tableName);
>
>     MultipleInputs.addInputPath(conf, new Path(input_on_hdfs),
> TextInputFormat.class, MapperForHdfs.class);
>     MultipleInputs.addInputPath(conf, new Path(input_on_hbase),
> TableInputFormat.class, MapperForHBase.class);
>
> Note, however, that new Path(input_on_hbase) can be any path; it carries no
> meaning, because TableInputFormat never reads from it.
>
> Please refer to org.apache.hadoop.hbase.mapreduce.IndexBuilder, under
> $HBASE_HOME/src/example, for how to read a table in an MR job.
>
>
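
Purely as an illustration of the approach Azuryy describes above, a driver
might be wired roughly like this. All names are placeholders (the table, the
paths, MapperForHdfs/MapperForHBase, MyDriver, MyReducer), and it assumes a
Hadoop version that ships the new-API
org.apache.hadoop.mapreduce.lib.input.MultipleInputs:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    Configuration conf = HBaseConfiguration.create();
    // TableInputFormat reads the table name from the configuration, not from
    // the input path; without a serialized Scan it falls back to a full scan.
    conf.set(TableInputFormat.INPUT_TABLE, "my_table");   // placeholder table

    Job job = new Job(conf, "hdfs-plus-hbase");
    job.setJarByClass(MyDriver.class);                    // placeholder driver

    // One mapper for the plain HDFS text input ...
    MultipleInputs.addInputPath(job, new Path("/data/on/hdfs"),
        TextInputFormat.class, MapperForHdfs.class);
    // ... and another for the HBase table; the Path is a dummy that
    // TableInputFormat never looks at (this is the "any path" above).
    MultipleInputs.addInputPath(job, new Path("/dummy/hbase"),
        TableInputFormat.class, MapperForHBase.class);

    // Both mappers must emit the same map-output key/value types so that a
    // single reducer can consume them.
    job.setReducerClass(MyReducer.class);                  // placeholder reducer
    FileOutputFormat.setOutputPath(job, new Path("/output"));
    job.waitForCompletion(true);
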
> On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel <michael_segel@hotmail.com>
> wrote:
>
> > You may want to pull your data from HBase first in a separate map-only
> > job and then use its output along with the other HDFS input.
> > There is a significant disparity between read performance from HDFS and
> > from HBase.
> >
> >
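
A rough sketch of that staging idea, with hypothetical names throughout (the
table, output path, ExportMapper, and the choice to emit only the row key are
illustrative, not from the thread): a map-only job dumps the table to text on
HDFS, and a second, ordinary MR job then reads that output together with the
other HDFS files.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;

    // Map-only export: one output line per HBase row.
    public static class ExportMapper extends TableMapper<Text, NullWritable> {
      @Override
      protected void map(ImmutableBytesWritable row, Result value, Context ctx)
          throws IOException, InterruptedException {
        // Hypothetical flattening: row key only; a real job would serialize
        // whatever columns the downstream job needs.
        ctx.write(new Text(Bytes.toString(row.get())), NullWritable.get());
      }
    }

    // Driver fragment:
    //   TableMapReduceUtil.initTableMapperJob("my_table", new Scan(),
    //       ExportMapper.class, Text.class, NullWritable.class, job);
    //   job.setNumReduceTasks(0);   // map-only
    //   FileOutputFormat.setOutputPath(job, new Path("/staging/hbase_dump"));
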
> > On Jul 3, 2013, at 10:34 AM, S. Zhou <myxjtu@yahoo.com> wrote:
> >
> > > Azuryy, I am looking at the MultipleInputs doc, but I could not figure
> > > out how to add an HBase table as a Path to the input. Do you have some
> > > sample code? Thanks!
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Azuryy Yu <azuryyyu@gmail.com>
> > > To: user@hbase.apache.org; S. Zhou <myxjtu@yahoo.com>
> > > Sent: Tuesday, July 2, 2013 10:06 PM
> > > Subject: Re: MapReduce job with mixed data sources: HBase table and
> > > HDFS files
> > >
> > >
> > > Hi ,
> > >
> > > Use MultipleInputs, which can solve your problem.
> > >
> > >
> > > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou <myxjtu@yahoo.com> wrote:
> > >
> > >> Hi there,
> > >>
> > >> I know how to create a MapReduce job with an HBase data source only or
> > >> an HDFS file as the data source. Now I need to create a MapReduce job
> > >> with mixed data sources, that is, this MR job needs to read data from
> > >> both HBase and HDFS files. Is it possible? If yes, could you share some
> > >> sample code?
> > >>
> > >> Thanks!
> > >> Senqiang
> >
> >
>
