Subject: Re: best approach for write and immediate read use case
From: Gautam Borah
To: user@hbase.apache.org
Date: Fri, 23 Aug 2013 12:01:05 -0700

Hi,

The average size of my records is 60 bytes: a 20-byte key and a 40-byte
value. The table has one column family.

I have set up a cluster for testing: 1 master and 3 region servers. Each
has a heap size of 3 GB and a single CPU.

I have pre-split the table into 30 regions.

I do not have to keep data forever; I could purge older records
periodically.

Thanks,
Gautam

On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu wrote:

> Can you tell us the average size of your records and how much heap is
> given to the region servers?
>
> Thanks
>
> On Aug 23, 2013, at 12:11 AM, Gautam Borah wrote:
>
> > Hello all,
> >
> > I have a use case where I need to write 1 million to 10 million
> > records periodically (at intervals of 1 minute to 10 minutes) into an
> > HBase table.
> >
> > Once the insert is completed, these records are queried immediately
> > from another program, with multiple reads.
> >
> > So, this is one massive write followed by many reads.
> >
> > I have two approaches to insert these records into the HBase table:
> >
> > Use HTable or HTableMultiplexer to stream the data to the HBase table.
> >
> > or
> >
> > Write the data to HDFS as a sequence file (avro in my case), run a
> > map reduce job using HFileOutputFormat, and then load the output
> > files into the HBase cluster.
> >
> > Something like,
> >
> > LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> > loader.doBulkLoad(new Path(outputDir), hTable);
> >
> > In my use case, which approach would be better?
> >
> > If I use the HTable interface, would the inserted data be in the
> > HBase cache, before flushing to the files, for immediate read queries?
> >
> > If I use a map reduce job to insert, would the data be loaded into
> > the HBase cache immediately? Or would only the output files be copied
> > to the respective HBase table-specific directories?
> >
> > So, which approach is better for write and then immediate multiple
> > read operations?
> >
> > Thanks,
> > Gautam
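
For a sense of scale, here is a rough sizing sketch that uses only the figures given in this thread (20-byte keys, 40-byte values, 1 to 10 million records per batch, 3 region servers). The class and method names are made up for illustration, and the even spread of a batch across region servers is an assumption. HBase's per-cell storage overhead is ignored, so actual memory and disk use would be higher than these raw payload numbers.

```java
// Back-of-the-envelope estimate of raw payload per batch.
// Assumptions (not from HBase itself): hypothetical class/method names,
// even distribution across region servers, no per-cell storage overhead.
final class BatchSizeEstimate {
    static final int KEY_BYTES = 20;      // key size stated in the message
    static final int VALUE_BYTES = 40;    // value size stated in the message
    static final int REGION_SERVERS = 3;  // test cluster size stated in the message

    // Raw payload of one batch: number of records times bytes per record.
    static long rawBatchBytes(long records, int bytesPerRecord) {
        return records * bytesPerRecord;
    }

    // Raw payload per region server, assuming the batch is spread evenly.
    static long rawBytesPerServer(long records, int bytesPerRecord, int servers) {
        return rawBatchBytes(records, bytesPerRecord) / servers;
    }

    public static void main(String[] args) {
        int recordBytes = KEY_BYTES + VALUE_BYTES;
        // Lower and upper batch sizes mentioned in the original question.
        for (long records : new long[] {1_000_000L, 10_000_000L}) {
            long total = rawBatchBytes(records, recordBytes);
            long perServer = rawBytesPerServer(records, recordBytes, REGION_SERVERS);
            System.out.printf("%d records: %d MB raw in total, about %d MB per region server%n",
                    records, total / 1_000_000, perServer / 1_000_000);
        }
    }
}
```

With these inputs the raw payload works out to 60 MB for 1 million records and 600 MB for 10 million, or roughly 20 MB and 200 MB per region server if the batch is spread evenly.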