hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase - bulk loading files
Date Fri, 09 Jan 2015 22:12:45 GMT
Salted buckets seem to be a concept from other projects, such as Phoenix.

Can you be a bit more specific about your requirement?

Cheers
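Since ImportTsv itself has no salted-bucket option, one workaround is to salt the row keys yourself before bulk loading: preprocess the '|'-delimited input so each key carries a deterministic bucket prefix, then run ImportTsv on the salted copy. A minimal sketch follows; the names `NUM_BUCKETS`, `salt_key`, and `salt_tsv_line` are hypothetical, and the first field is assumed to be the row key (HBASE_ROW_KEY).

```python
import hashlib

# Manual key salting, a sketch of what Phoenix's salted buckets automate.
NUM_BUCKETS = 8  # assumption: match this to the table's pre-split regions

def salt_key(key: str, buckets: int = NUM_BUCKETS) -> str:
    # Deterministic bucket id from a hash of the key, so the same key
    # always maps to the same bucket prefix.
    bucket = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % buckets
    return "%02d-%s" % (bucket, key)

def salt_tsv_line(line: str, sep: str = "|") -> str:
    # Rewrite one '|'-delimited ImportTsv input line, salting the first
    # field (assumed to be the row key).
    fields = line.rstrip("\n").split(sep)
    fields[0] = salt_key(fields[0])
    return sep.join(fields)

print(salt_tsv_line("user123|2015-01-09|42"))
```

Note that reads then have to fan out across all bucket prefixes, which is the trade-off Phoenix hides from you.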

On Fri, Jan 9, 2015 at 12:53 PM, Rama Ramani <rama.ramani@live.com> wrote:

> Is there a way to specify Salted buckets with HBase ImportTsv while doing
> bulk load?
>
> Thanks
> Rama
>
> From: rama.ramani@live.com
> To: user@hbase.apache.org
> Subject: RE: HBase - bulk loading files
> Date: Fri, 19 Dec 2014 14:09:09 -0800
>
>
>
>
> HBase 0.98.0.2.1.9.0-2196-hadoop2
> Hadoop 2.4.0.2.1.9.0-2196
> Subversion git@github.com:hortonworks/hadoop-monarch.git -r cb50542bc92fb77dee52
>
> No, the clusters were not taking additional load.
>
> Thanks
> Rama
> > Date: Fri, 19 Dec 2014 13:50:30 -0800
> > Subject: Re: HBase - bulk loading files
> > From: yuzhihong@gmail.com
> > To: user@hbase.apache.org
> >
> > Can you let us know the HBase and Hadoop versions you're using?
> >
> > Were the clusters taking load from other sources when ImportTsv was
> > running?
> >
> > Cheers
> >
> > On Fri, Dec 19, 2014 at 1:43 PM, Rama Ramani <rama.ramani@live.com> wrote:
> >
> > > Hello,
> > >
> > > I am bulk loading a set of files (about 400 MB each) with "|" as the
> > > delimiter using ImportTsv. It takes a long time for the 'map' job to
> > > complete on both a 4-node and a 16-node cluster. I tried the option to
> > > generate the output files (providing -Dimporttsv.bulk.output), which
> > > also took a long time, indicating that the generation of the output
> > > files needs improvement.
> > >
> > > I am seeing about 8,000 rows/sec for this dataset; the 400 MB ingestion
> > > takes about 5-6 minutes. How can I improve this? Is there an alternate
> > > tool I can use?
> > >
> > > Thanks
> > > Rama
>
>
>
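One common cause of slow ImportTsv runs like the one described above is a table created without splits: every write funnels through a single initial region until HBase splits it, regardless of cluster size. If the row keys carry a fixed-width salt prefix (e.g. a two-digit decimal bucket id), the table can be pre-split into one region per bucket up front. The sketch below, under that assumption, computes the split points to pass to the HBase shell; `NUM_BUCKETS` and `split_keys` are hypothetical names.

```python
# Assumption: row keys are prefixed "00-" .. "07-" with a fixed-width,
# two-digit decimal salt. Pre-splitting into one region per bucket lets
# the bulk load spread across region servers from the start.
NUM_BUCKETS = 8  # hypothetical bucket count

def split_keys(buckets: int = NUM_BUCKETS) -> list:
    # N regions need N - 1 split points: "01-", "02-", ..., "07-"
    return ["%02d-" % b for b in range(1, buckets)]

# Pass these as SPLITS when creating the table in the HBase shell, e.g.
#   create 'mytable', 'cf', SPLITS => ['01-', '02-', '03-', ...]
print(split_keys())
```

With the table pre-split, the usual two-step flow is to run ImportTsv with -Dimporttsv.bulk.output to write HFiles, then load them with completebulkload, which moves files into place instead of going through the write path.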
