hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: MapReduce job with mixed data sources: HBase table and HDFS files
Date Wed, 10 Jul 2013 18:21:05 GMT
    conf.set(TableInputFormat.SCAN, convertScanToString(scan));

is called by initTableMapperJob().

Looking at the source of initTableMapperJob() would make this clear.

Cheers
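Values in a Hadoop job configuration are plain strings. That is why initTableMapperJob() serializes the Scan with convertScanToString() before calling conf.set(TableInputFormat.SCAN, ...), and why TableInputFormat decodes the string again when it is configured. The following is a minimal JDK-only model of that round trip, not HBase code. ScanSpec, ScanConfig and the byte layout are hypothetical, a Map<String, String> stands in for the job Configuration, and only the key string "hbase.mapreduce.scan" is taken from TableInputFormat.SCAN. In a real driver, one way around the visibility error is to call TableMapReduceUtil.initTableMapperJob() first, so that it performs this conf.set() itself, and then call MultipleInputs.addInputPath() for the HDFS and HBase sources. addInputPath() replaces the input format and mapper classes that initTableMapperJob() set. Treat that ordering as an untested suggestion.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an HBase Scan: a row range plus column names. */
final class ScanSpec {
    final String startRow;
    final String stopRow;
    final List<String> columns;

    ScanSpec(String startRow, String stopRow, List<String> columns) {
        this.startRow = startRow;
        this.stopRow = stopRow;
        this.columns = new ArrayList<>(columns);
    }
}

/** Hypothetical model of storing a scan in string-only job properties. */
final class ScanConfig {
    // Same key string as HBase's TableInputFormat.SCAN; the encoding below is made up.
    static final String SCAN_KEY = "hbase.mapreduce.scan";

    /** Driver side: serialize the scan to bytes, then Base64 so it fits in a string. */
    static String encode(ScanSpec scan) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(scan.startRow);
            out.writeUTF(scan.stopRow);
            out.writeInt(scan.columns.size());
            for (String column : scan.columns) {
                out.writeUTF(column);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Reader side: decode the string back into a scan. */
    static ScanSpec decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            String startRow = in.readUTF();
            String stopRow = in.readUTF();
            int count = in.readInt();
            List<String> columns = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                columns.add(in.readUTF());
            }
            return new ScanSpec(startRow, stopRow, columns);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Mirrors conf.set(TableInputFormat.SCAN, convertScanToString(scan)). */
    static void setScan(Map<String, String> conf, ScanSpec scan) {
        conf.put(SCAN_KEY, encode(scan));
    }
}
```

HBase's real encoding differs from this one. The model only shows why the scan has to travel as a string.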

On Wed, Jul 10, 2013 at 10:55 AM, S. Zhou <myxjtu@yahoo.com> wrote:

> Thanks Ted. I will try that. But right now I am not sure how I would call
> conf.set() after calling initTableMapperJob().
> The approach suggested by Azuryy is: conf.set(TableInputFormat.SCAN,
> TableMapReduceUtil.convertScanToString(new Scan()));
>
>
>
> ________________________________
>  From: Ted Yu <yuzhihong@gmail.com>
> To: user@hbase.apache.org; S. Zhou <myxjtu@yahoo.com>
> Sent: Wednesday, July 10, 2013 10:21 AM
> Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
>
>
> Can you use initTableMapperJob() (which calls
> TableMapReduceUtil.convertScanToString() underneath)?
>
> On Wed, Jul 10, 2013 at 10:15 AM, S. Zhou <myxjtu@yahoo.com> wrote:
>
> > Hi Azuryy, I am testing the approach you suggested. Now I am getting a
> > compilation error on the following statement:
> >
> >     conf.set(TableInputFormat.SCAN,
> >         TableMapReduceUtil.convertScanToString(new Scan()));
> >
> > The error is: "method convertScanToString is not visible in
> > TableMapReduceUtil". Could you help? This is blocking me.
> >
> > BTW, I am using the HBase-server jar file, version 0.95.1-hadoop1. I tried
> > other versions as well, such as 0.94.9, and got the same error.
> >
> > Thanks!
> >
> >
> > ________________________________
> >  From: Azuryy Yu <azuryyyu@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Wednesday, July 3, 2013 6:02 PM
> > Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
> >
> >
> > Hi,
> > 1) You cannot feed data from two different clusters into one MR job.
> > 2) If your data is in the same cluster, then:
> >
> >     conf.set(TableInputFormat.SCAN,
> >         TableMapReduceUtil.convertScanToString(new Scan()));
> >     conf.set(TableInputFormat.INPUT_TABLE, tableName);
> >
> >     MultipleInputs.addInputPath(conf, new Path(input_on_hdfs),
> >         TextInputFormat.class, MapperForHdfs.class);
> >     MultipleInputs.addInputPath(conf, new Path(input_on_hbase),
> >         TableInputFormat.class, MapperForHBase.class);
> >
> > Note, however, that new Path(input_on_hbase) can be any path; the path
> > itself has no meaning here.
> >
> > Please refer to org.apache.hadoop.hbase.mapreduce.IndexBuilder under
> > $HBASE_HOME/src/example for how to read a table in an MR job.
> >
> >
> > On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> >
> > > You may want to pull your data out of HBase first in a separate map-only
> > > job, and then use its output along with the other HDFS input.
> > > There is a significant disparity in read performance between HDFS and
> > > HBase.
> > >
> > >
> > > On Jul 3, 2013, at 10:34 AM, S. Zhou <myxjtu@yahoo.com> wrote:
> > >
> > > > Azuryy, I am looking at the MultipleInputs doc, but I cannot figure
> > > > out how to add an HBase table as a Path to the input. Do you have some
> > > > sample code? Thanks!
> > > >
> > > >
> > > >
> > > >
> > > > ________________________________
> > > > From: Azuryy Yu <azuryyyu@gmail.com>
> > > > To: user@hbase.apache.org; S. Zhou <myxjtu@yahoo.com>
> > > > Sent: Tuesday, July 2, 2013 10:06 PM
> > > > Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
> > > >
> > > >
> > > > Hi,
> > > >
> > > > Use MultipleInputs, which can solve your problem.
> > > >
> > > >
> > > > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou <myxjtu@yahoo.com> wrote:
> > > >
> > > >> Hi there,
> > > >>
> > > >> I know how to create a MapReduce job with only HBase or only HDFS
> > > >> files as the data source. Now I need to create a MapReduce job with
> > > >> mixed data sources, that is, an MR job that reads data from both HBase
> > > >> and HDFS files. Is this possible? If so, could you share some sample
> > > >> code?
> > > >>
> > > >> Thanks!
> > > >> Senqiang
> > >
> > >
> >
>
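Azuryy's remark that new Path(input_on_hbase) can be any path follows from how MultipleInputs works. Roughly, it records a mapping from each path to an (input format, mapper) pair, and Hadoop's DelegatingInputFormat and DelegatingMapper then let each input format produce its own splits and route each split to its own mapper. TableInputFormat reads the table named in the configuration and never looks at the path, so that path only serves as a key. The sketch below is a simplified JDK-only model of this dispatch, not Hadoop's implementation. MultiInputModel and its methods are hypothetical, input formats are modeled as functions from a path to records, and mappers as functions from a record to an output string. In the model, registering the same path twice replaces the earlier binding. Hadoop's MultipleInputs also keys its mapping by path, so giving the HBase source a path distinct from the HDFS input is a sensible precaution.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of MultipleInputs-style dispatch; not Hadoop code. */
final class MultiInputModel {
    /** Pairs an input format (how to read a path) with the mapper for its records. */
    private static final class Binding {
        final Function<String, List<String>> format;
        final Function<String, String> mapper;

        Binding(Function<String, List<String>> format, Function<String, String> mapper) {
            this.format = format;
            this.mapper = mapper;
        }
    }

    // Keyed by path, so registering the same path twice replaces the first binding.
    private final Map<String, Binding> bindings = new LinkedHashMap<>();

    /** Plays the role of MultipleInputs.addInputPath(job, path, format, mapper). */
    void addInputPath(String path,
                      Function<String, List<String>> format,
                      Function<String, String> mapper) {
        bindings.put(path, new Binding(format, mapper));
    }

    int inputCount() {
        return bindings.size();
    }

    /** Reads each registered path with its own format and maps each record with its own mapper. */
    List<String> run() {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, Binding> entry : bindings.entrySet()) {
            Binding binding = entry.getValue();
            for (String record : binding.format.apply(entry.getKey())) {
                output.add(binding.mapper.apply(record));
            }
        }
        return output;
    }
}
```

A table-backed format can be modeled as `path -> tableRows`, a function that ignores its argument. That is the sense in which the HBase path "has no meaning".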

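Michael's alternative splits the work into two jobs. A map-only job first copies the needed HBase rows into files, and a second job then reads only HDFS files, all in the same way. The sketch below models only that data flow, using local files from java.nio.file. TwoStagePipeline, exportTable(), readAll() and the tab-separated "rowKey<TAB>value" layout are hypothetical choices. A SortedMap stands in for the table because an HBase scan returns rows in row-key order. In real Hadoop, the first stage would presumably be a job set up with initTableMapperJob() and zero reduce tasks, and the second would add both output directories as ordinary file inputs. That mapping is an assumption about setup, not code from this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical two-stage model: export the table to text, then read only text inputs. */
final class TwoStagePipeline {
    /** Stage 1 (map-only): dump table rows to a text file as "rowKey<TAB>value" lines. */
    static void exportTable(SortedMap<String, String> table, Path out) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            lines.add(row.getKey() + "\t" + row.getValue());
        }
        try {
            Files.write(out, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stage 2: every input is now a plain text file, so one reader handles them all. */
    static List<String> readAll(List<Path> inputs) {
        List<String> records = new ArrayList<>();
        for (Path input : inputs) {
            try {
                records.addAll(Files.readAllLines(input, StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return records;
    }
}
```

The trade-off is an extra pass over the exported data. In return, the main job reads only from HDFS, which is the read disparity Michael points to.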