hbase-user mailing list archives

From "S. Zhou" <myx...@yahoo.com>
Subject Re: MapReduce job with mixed data sources: HBase table and HDFS files
Date Wed, 10 Jul 2013 17:15:58 GMT
Hi Azuryy, I am testing the way you suggested. Now I am facing a compilation error for the
following statement:
conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new Scan()));


The error is: "method convertScanToString is not visible in TableMapReduceUtil". Could you help?
This is blocking me.


BTW, I am using the HBase-server jar file version 0.95.1-hadoop1. I tried other versions
as well, such as 0.94.9, and got the same error.

Thanks!


________________________________
 From: Azuryy Yu <azuryyyu@gmail.com>
To: user@hbase.apache.org 
Sent: Wednesday, July 3, 2013 6:02 PM
Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
 

Hi,
1) A single MR job cannot read input from two different clusters.
2) If your data is located in the same cluster, then:

    Configuration conf = job.getConfiguration();
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new Scan()));
    conf.set(TableInputFormat.INPUT_TABLE, tableName);

    // new-API MultipleInputs (org.apache.hadoop.mapreduce.lib.input) takes the Job, not the Configuration
    MultipleInputs.addInputPath(job, new Path(input_on_hdfs), TextInputFormat.class, MapperForHdfs.class);
    MultipleInputs.addInputPath(job, new Path(input_on_hbase), TableInputFormat.class, MapperForHBase.class);

But new Path(input_on_hbase) can be any path; it has no meaning for TableInputFormat.

Please refer to org.apache.hadoop.hbase.mapreduce.IndexBuilder under $HBASE_HOME/src/example
for how to read a table in an MR job.
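A fuller version of the snippet above might look like the sketch below. It is written against the new-API classes (org.apache.hadoop.mapreduce) and assumes the Hadoop version in use ships org.apache.hadoop.mapreduce.lib.input.MultipleInputs. Because convertScanToString is not visible in the jars S. Zhou reports using, the sketch does not call it: TableMapReduceUtil.initTableMapperJob, which is public, serializes the Scan and sets TableInputFormat.INPUT_TABLE and TableInputFormat.SCAN itself, and MultipleInputs.addInputPath is called afterwards so that its delegating input format and mapper replace the ones initTableMapperJob configured. The class name MixedSourceDriver, the mapper bodies, the tab-separated record layout, the scan settings, and the dummy "hbase-<table>" path are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MixedSourceDriver {

  // Hypothetical HDFS record layout: "key<TAB>rest". Emits (key, tagged line).
  public static class MapperForHdfs extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String s = line.toString();
      int tab = s.indexOf('\t');
      String key = tab >= 0 ? s.substring(0, tab) : s;
      ctx.write(new Text(key), new Text("hdfs:" + s));
    }
  }

  // Emits (row key, cell count). Must use the same output types as MapperForHdfs.
  public static class MapperForHBase extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      String key = Bytes.toString(row.get(), row.getOffset(), row.getLength());
      ctx.write(new Text(key), new Text("hbase:" + result.size()));
    }
  }

  public static Job buildJob(Configuration conf, String tableName,
                             Path hdfsInput, Path output) throws IOException {
    // new Job(conf, name) is deprecated in Hadoop 2 but also works with Hadoop 1.
    Job job = new Job(conf, "mixed-hbase-hdfs");
    job.setJarByClass(MixedSourceDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);       // fetch rows from the region servers in batches
    scan.setCacheBlocks(false); // a full MR scan should not churn the block cache

    // Sets TableInputFormat.INPUT_TABLE and TableInputFormat.SCAN in the job
    // configuration; the Scan is serialized inside TableMapReduceUtil, so this
    // code never calls convertScanToString directly.
    TableMapReduceUtil.initTableMapperJob(tableName, scan, MapperForHBase.class,
        Text.class, Text.class, job);

    // Must come after initTableMapperJob: it switches the job to
    // DelegatingInputFormat/DelegatingMapper. The HBase "path" is only a map key
    // (TableInputFormat ignores it), but it must differ from hdfsInput.
    MultipleInputs.addInputPath(job, hdfsInput, TextInputFormat.class, MapperForHdfs.class);
    MultipleInputs.addInputPath(job, new Path("hbase-" + tableName),
        TableInputFormat.class, MapperForHBase.class);

    job.setNumReduceTasks(0); // map-only; set a reducer to join the two streams by key
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, output);
    return job;
  }

  public static void main(String[] args) throws Exception {
    // args: <table> <hdfs input dir> <output dir>
    Configuration conf = HBaseConfiguration.create();
    Job job = buildJob(conf, args[0], new Path(args[1]), new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The job above is map-only, so records from the two sources are written side by side; combining them by key would need a reducer that understands the "hdfs:"/"hbase:" tags.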





On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel <michael_segel@hotmail.com> wrote:

> You may want to pull your data out of HBase first in a separate map-only
> job and then use its output along with the other HDFS input.
> There is a significant disparity in speed between reads from HDFS and
> reads from HBase.
>
>
> On Jul 3, 2013, at 10:34 AM, S. Zhou <myxjtu@yahoo.com> wrote:
>
> > Azuryy, I am looking at the MultipleInputs doc, but I could not figure
> > out how to add an HBase table as a Path to the input. Do you have some
> > sample code? Thanks!
> >
> >
> >
> >
> > ________________________________
> > From: Azuryy Yu <azuryyyu@gmail.com>
> > To: user@hbase.apache.org; S. Zhou <myxjtu@yahoo.com>
> > Sent: Tuesday, July 2, 2013 10:06 PM
> > Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
> >
> >
> > Hi,
> >
> > Use MultipleInputs, which can solve your problem.
> >
> >
> > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou <myxjtu@yahoo.com> wrote:
> >
> >> Hi there,
> >>
> >> I know how to create a MapReduce job with either an HBase table or HDFS
> >> files as the data source. Now I need to create a MapReduce job with mixed
> >> data sources; that is, this MR job needs to read data from both HBase and
> >> HDFS files. Is it possible? If so, could you share some sample code?
> >>
> >> Thanks!
> >> Senqiang
>
>
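Michael's two-step alternative could be sketched as below, under the same assumption that the new-API MultipleInputs is available. Step 1 is a map-only TableMapper job that dumps the table to a staging directory on HDFS as tab-separated text; step 2 is an ordinary HDFS job that reads that dump and the other input with MultipleInputs, so TableInputFormat (and convertScanToString) never appear in the combining job. TwoStepDriver, ExportMapper, the tagging mappers, the staging directory, and the "rowkey<TAB>cell count" dump format are hypothetical placeholders.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoStepDriver {

  // Step 1: writes each HBase row as "rowkey<TAB>cellCount" (default TextOutputFormat).
  public static class ExportMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      String key = Bytes.toString(row.get(), row.getOffset(), row.getLength());
      ctx.write(new Text(key), new Text(Integer.toString(result.size())));
    }
  }

  // Step 2: both inputs are text; each subclass tags lines with their source.
  public abstract static class TaggingMapper extends Mapper<LongWritable, Text, Text, Text> {
    protected abstract String tag();

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String s = line.toString();
      int tab = s.indexOf('\t');
      String key = tab >= 0 ? s.substring(0, tab) : s;
      ctx.write(new Text(key), new Text(tag() + ":" + s));
    }
  }

  public static class FromHBaseDump extends TaggingMapper {
    @Override protected String tag() { return "hbase"; }
  }

  public static class FromHdfs extends TaggingMapper {
    @Override protected String tag() { return "hdfs"; }
  }

  public static boolean run(Configuration conf, String table, Path hdfsInput,
                            Path staging, Path output) throws Exception {
    // Step 1: map-only export of the table to the staging directory.
    Job export = new Job(conf, "export-" + table);
    export.setJarByClass(TwoStepDriver.class);
    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false);
    TableMapReduceUtil.initTableMapperJob(table, scan, ExportMapper.class,
        Text.class, Text.class, export);
    export.setNumReduceTasks(0);
    export.setOutputKeyClass(Text.class);
    export.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(export, staging);
    if (!export.waitForCompletion(true)) {
      return false;
    }

    // Step 2: plain HDFS job over the dump plus the other input.
    Job combine = new Job(conf, "combine-" + table);
    combine.setJarByClass(TwoStepDriver.class);
    MultipleInputs.addInputPath(combine, staging, TextInputFormat.class, FromHBaseDump.class);
    MultipleInputs.addInputPath(combine, hdfsInput, TextInputFormat.class, FromHdfs.class);
    combine.setNumReduceTasks(0); // or set a reducer that joins on the key
    combine.setOutputKeyClass(Text.class);
    combine.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(combine, output);
    return combine.waitForCompletion(true);
  }

  public static void main(String[] args) throws Exception {
    // args: <table> <hdfs input dir> <staging dir> <output dir>
    Configuration conf = HBaseConfiguration.create();
    boolean ok = run(conf, args[0], new Path(args[1]), new Path(args[2]), new Path(args[3]));
    System.exit(ok ? 0 : 1);
  }
}
```

The extra pass costs one full write of the exported rows, but the combining job then reads both sources from HDFS, which sidesteps the HDFS/HBase read disparity Michael mentions.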