hbase-user mailing list archives

From Igal Shilman <ig...@wix.com>
Subject Re: bulk loading problem
Date Tue, 28 Aug 2012 22:42:24 GMT
As suggested by the book, take a look at the
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles class.

This tool expects two arguments: (1) the path to the generated HFiles (in
your case outputPath) and (2) the target table name.
To use it programmatically, you can either invoke it via ToolRunner or
call LoadIncrementalHFiles.doBulkLoad() yourself, after your M/R job has
successfully finished.

If you are loading into an existing table, then (following your code):

    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path(outputPath), new HTable(conf, tableName));

(Note that doBulkLoad returns void, so there is no return code to check.)


Otherwise, run it through ToolRunner:


    int ret = ToolRunner.run(new LoadIncrementalHFiles(conf),
                             new String[] { outputPath, tableName });
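If it helps, here is a minimal end-to-end sketch tying the two steps
together (untested; conf, job, outputPath and tableName are the variables
from your code, so treat this as a sketch against the 0.90.x API rather
than a drop-in):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    // ... job setup exactly as in your code ...
    boolean success = job.waitForCompletion(true);
    if (success) {
        // Move the generated HFiles into the table only after the
        // M/R job has finished successfully.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path(outputPath), new HTable(conf, tableName));
    }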



Good luck,
Igal.

On Tue, Aug 28, 2012 at 10:59 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:

> Hi Igal, thank you for the quick response.
>    Can I execute this step programmatically?
>
> From the link you sent:
>
> 9.8.5. Advanced Usage
>
> Although the importtsv tool is useful in many cases, advanced users may
> want to generate data programmatically, or import data from other formats.
> To get started doing so, dig into ImportTsv.java and check the JavaDoc for
> HFileOutputFormat.
>
> The import step of the bulk load can also be done programmatically. See
> the LoadIncrementalHFiles class for more information.
>
> The question is: what should I do/add to my job to write the generated
> HFiles programmatically to HBase?
>
>
>
>
> On Tue, Aug 28, 2012 at 8:08 PM, Igal Shilman <igals@wix.com> wrote:
>
> > Hi,
> > You need to complete the bulk load.
> > Check out http://hbase.apache.org/book/arch.bulk.load.html 9.8.2
> >
> > Igal.
> >
> > On Tue, Aug 28, 2012 at 7:29 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
> >
> > > Hi,
> > >    I am in the process of writing my first bulk loading job. I use
> > > Cloudera CDH3U3 with HBase 0.90.4.
> > >
> > > After the job finishes I see the HFiles it created, but there are
> > > no entries in HBase: hbase shell >> count 'uu_bulk' returns 0.
> > >
> > > Here is my job configuration:
> > >
> > >         Configuration conf = HBaseConfiguration.create();
> > >
> > >         Job job = new Job(conf, getClass().getSimpleName());
> > >
> > >         job.setJarByClass(UuPushMapReduceJobFactory.class);
> > >         job.setMapperClass(UuPushMapper.class);
> > >         job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> > >         job.setMapOutputValueClass(KeyValue.class);
> > >         job.setOutputFormatClass(HFileOutputFormat.class);
> > >
> > >         String path = uuAggregationContext.getUuInputPath();
> > >         String outputPath = "/bulk_loading_hbase/output/" + System.currentTimeMillis();
> > >         LOG.info("path = " + path);
> > >         LOG.info("outputPath = " + outputPath);
> > >
> > >         final String tableName = "uu_bulk";
> > >         LOG.info("hbase tableName: " + tableName);
> > >         createRegions(conf, Bytes.toBytes(tableName));
> > >
> > >         FileInputFormat.addInputPath(job, new Path(path));
> > >         FileOutputFormat.setOutputPath(job, new Path(outputPath));
> > >
> > >         HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, tableName));
> > >
> > > //=====================================================================================
> > > The reducer log ends with:
> > >
> > > 2012-08-28 11:53:40,643 INFO org.apache.hadoop.mapred.Merger: Down to
> > > the last merge-pass, with 10 segments left of total size: 222885367
> > > bytes
> > > 2012-08-28 11:53:54,137 INFO
> > > org.apache.hadoop.hbase.mapreduce.HFileOutputFormat:
> > > Writer=hdfs://hdn16/bulk_loading_hbase/output/1346194117045/_temporary/_attempt_201208260949_0026_r_000005_0/d/3908303205246218823,
> > > wrote=268435455
> > > 2012-08-28 11:54:11,966 INFO org.apache.hadoop.mapred.Task:
> > > Task:attempt_201208260949_0026_r_000005_0 is done. And is in the
> > > process of commiting
> > > 2012-08-28 11:54:12,975 INFO org.apache.hadoop.mapred.Task: Task
> > > attempt_201208260949_0026_r_000005_0 is allowed to commit now
> > > 2012-08-28 11:54:13,007 INFO
> > > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved
> > > output of task 'attempt_201208260949_0026_r_000005_0' to
> > > /bulk_loading_hbase/output/1346194117045
> > > 2012-08-28 11:54:13,009 INFO org.apache.hadoop.mapred.Task: Task
> > > 'attempt_201208260949_0026_r_000005_0' done.
> > > 2012-08-28 11:54:13,010 INFO
> > > org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
> > > truncater with mapRetainSize=-1 and reduceRetainSize=-1
> > >
> > > As I understand it, the HFiles were written
> > > to /bulk_loading_hbase/output/1346194117045, but I don't see any
> > > activity related to moving the HFiles into HBase.
> > >
> > >
> > > What am I doing wrong? What should I do to get the result written
> > > to HBase?
> > >
> > > Thanks in advance
> > > Oleg.
> > >
> >
>
