In-Reply-To: <1B331809-0487-403C-AAE1-7A635DECB230@gmail.com>
References: 
 <CAFHL1WbW9gdoJJSGMh3LU-zG9xqvmZeJ4c4CR2_60sC4Q8Ng1w@mail.gmail.com>
	<1B331809-0487-403C-AAE1-7A635DECB230@gmail.com>
Date: Fri, 23 Aug 2013 12:01:05 -0700
Message-ID: 
 <CAFHL1WZpmXyjcziTp0f8Bz0Sa078=Nad=CGhGDgKzCqyEV2uag@mail.gmail.com>
Subject: Re: best approach for write and immediate read use case
From: Gautam Borah <gautam.borah@gmail.com>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=089e012953ba586ab604e4a20719

--089e012953ba586ab604e4a20719
Content-Type: text/plain; charset=ISO-8859-1

Hi,

The average size of my records is 60 bytes (a 20-byte key and a 40-byte
value), and the table has one column family.

I have set up a cluster for testing: 1 master and 3 region servers. Each
has a heap size of 3 GB and a single CPU.

I have pre-split the table into 30 regions. I do not have to keep data
forever; I could purge older records periodically.
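
As a rough sanity check on these numbers, the sketch below estimates the raw
payload of one batch and how it spreads over the regions and region servers.
It is only a back-of-the-envelope calculation: the class and method names are
made up for illustration, 1 MB is taken as 1,000,000 bytes, keys are assumed
to spread evenly over the pre-split regions, and HBase's per-cell storage
overhead is ignored.

```java
public class BatchSizeEstimate {
    // Figures taken from this thread.
    static final long KEY_BYTES = 20;
    static final long VALUE_BYTES = 40;
    static final int REGIONS = 30;
    static final int REGION_SERVERS = 3;

    // Assumption: decimal megabytes, to keep the arithmetic easy to follow.
    static final long BYTES_PER_MB = 1_000_000L;

    /** Raw key + value bytes for one batch; ignores HBase per-cell overhead. */
    static long rawBatchBytes(long records) {
        return records * (KEY_BYTES + VALUE_BYTES);
    }

    /** Raw bytes per region, assuming an even key distribution. */
    static long rawBytesPerRegion(long records) {
        return rawBatchBytes(records) / REGIONS;
    }

    /** Raw bytes per region server, assuming regions are balanced evenly. */
    static long rawBytesPerServer(long records) {
        return rawBatchBytes(records) / REGION_SERVERS;
    }

    public static void main(String[] args) {
        long[] batchSizes = {1_000_000L, 10_000_000L};
        for (long records : batchSizes) {
            System.out.printf(
                "%d records: %d MB total, %d MB per region, %d MB per region server%n",
                records,
                rawBatchBytes(records) / BYTES_PER_MB,
                rawBytesPerRegion(records) / BYTES_PER_MB,
                rawBytesPerServer(records) / BYTES_PER_MB);
        }
    }
}
```

On raw payload alone, the largest batch comes to about 600 MB, or about
200 MB per region server against a 3 GB heap. The real footprint will be
larger, since each stored cell also carries the column family, qualifier and
timestamp alongside the row key.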

Thanks,

Gautam


On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Can you tell us the average size of your records and how much heap is
> given to the region servers?
>
> Thanks
>
> On Aug 23, 2013, at 12:11 AM, Gautam Borah <gautam.borah@gmail.com> wrote:
>
> > Hello all,
> >
> > I have a use case where I need to write 1 million to 10 million records
> > periodically (at intervals of 1 to 10 minutes) into an HBase
> > table.
> >
> > Once the insert is completed, these records are queried immediately from
> > another program - multiple reads.
> >
> > So, this is one massive write followed by many reads.
> >
> > I have two approaches to insert these records into the HBase table -
> >
> > Use HTable or HTableMultiplexer to stream the data to the HBase table.
> >
> > or
> >
> > Write the data to HDFS as a sequence file (Avro in my case), run a
> > MapReduce job using HFileOutputFormat, and then load the output files
> > into the HBase cluster.
> > Something like,
> >
> >  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> >  loader.doBulkLoad(new Path(outputDir), hTable);
> >
> >
> > In my use case which approach would be better?
> >
> > If I use the HTable interface, would the inserted data be in the HBase
> > cache, before flushing to the files, for immediate read queries?
> >
> > If I use a MapReduce job to insert, would the data be loaded into the
> > HBase cache immediately? Or would only the output files be copied to the
> > respective HBase table-specific directories?
> >
> > So, which approach is better for a write followed immediately by
> > multiple read operations?
> >
> > Thanks,
> > Gautam
>

--089e012953ba586ab604e4a20719--