From: Vladimir Rodionov
To: "user@hbase.apache.org"
Date: Thu, 2 Jan 2014 10:30:51 -0800
Subject: RE: Performance between HBaseClient scan and HFileReaderV2

HBase scanner MUST guarantee correct order of KeyValues (coming from
different HFiles), apply filter conditions plus the restrictions on included column families and qualifiers, time range and max versions, and correctly process deleted cells. A direct HFileReader does none of the above.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Jerry Lam [chilinglam@gmail.com]
Sent: Thursday, January 02, 2014 7:56 AM
To: user
Subject: Re: Performance between HBaseClient scan and HFileReaderV2

Hi Tom,

Good point. Note that I also ran the HBaseClient performance test several times (as you can see from the chart). The caching should therefore also have benefited the later HBaseClient runs, not just the HFileReaderV2 test. I still don't understand what makes HBaseClient perform so poorly compared to accessing HDFS directly. I could understand maybe a factor of 2 (even that is too much), but a factor of 8 is quite unreasonable. Any hint?

Jerry

On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood wrote:

> I'm also new to HBase and am not familiar with HFileReaderV2. However, in
> your description, you didn't mention anything about clearing the Linux OS
> cache between tests. That might be why you're seeing the big difference: if
> you ran the HBaseClient test first, it may have warmed the OS cache, and
> then HFileReaderV2 benefited from it. Just a guess...
>
> -- Tom
>
>
> On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam wrote:
>
> > Hello HBase users,
> >
> > I just ran a very simple performance test and would like to see if what I
> > experienced makes sense.
> >
> > The experiment is as follows:
> > - I filled an HBase region with 700MB of data (each row has roughly 45
> >   columns and the entire row is about 20KB)
> > - I configured the region to hold 4GB (therefore no split occurs)
> > - I ran compactions after the data was loaded and made sure that there is
> >   only 1 region in the table under test.
> > - No other table exists in the HBase cluster because this is a DEV
> >   environment
> > - I'm using HBase 0.92.1
> >
> > The test is very basic. I used HBaseClient to scan the entire region,
> > retrieving all rows and all columns in the table and just iterating over
> > all KeyValue pairs until it was done. It took about 1 minute 22 sec to
> > complete. (Note that I disabled the block cache and used a caching size
> > of about 10000.)
> >
> > I ran another test using HFileReaderV2 to scan the entire region,
> > retrieving all rows and all columns and just iterating over all KeyValue
> > pairs until it was done. It took 11 sec.
> >
> > The performance difference is dramatic (almost 8 times faster using
> > HFileReaderV2).
> >
> > I want to know why the difference is so big, or whether I didn't configure
> > HBase properly. From this experiment, HDFS can deliver the data
> > efficiently, so it is not the bottleneck.
> >
> > Any help is appreciated!
> >
> > Jerry
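To make Vladimir's list of scanner duties concrete, here is a minimal Java sketch of a merging scan under a heavily simplified model. The `Cell` record, the `scan` method and its parameters are hypothetical illustrations, not HBase classes or APIs. The model assumes each HFile is a list already sorted by row, column and descending timestamp; that a delete marker hides every version of its column at or below its timestamp; and that the time range is half-open [minTs, maxTs). It only shows the kind of per-cell work (k-way merge across HFiles, delete handling, time-range and max-versions checks) that a direct HFile read skips; real column/qualifier selection and filters are omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class MergeScanSketch {

    // Hypothetical, simplified stand-in for an HBase KeyValue (not the real class).
    record Cell(String row, String column, long ts, boolean isDelete, String value) {}

    // Assumed sort order inside every "HFile": row asc, column asc, timestamp desc,
    // with a delete marker placed before a put that has the same timestamp.
    static final Comparator<Cell> ORDER = Comparator
            .comparing(Cell::row)
            .thenComparing(Cell::column)
            .thenComparing(Comparator.comparingLong(Cell::ts).reversed())
            .thenComparingInt(c -> c.isDelete() ? 0 : 1);

    // Read position inside one sorted "HFile" (modelled as a List).
    static final class Cursor {
        final List<Cell> file;
        int pos = 0;
        Cursor(List<Cell> file) { this.file = file; }
        Cell current() { return file.get(pos); }
    }

    // k-way merge of sorted files plus delete, time-range and max-versions checks.
    static List<Cell> scan(List<List<Cell>> files, long minTs, long maxTs, int maxVersions) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> ORDER.compare(a.current(), b.current()));
        for (List<Cell> f : files) {
            if (!f.isEmpty()) heap.add(new Cursor(f));
        }
        List<Cell> out = new ArrayList<>();
        String curRow = null;
        String curCol = null;
        long deletedUpTo = Long.MIN_VALUE; // newest delete-marker timestamp in this column
        int versions = 0;                  // puts already emitted for this column
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            Cell cell = c.current();
            c.pos++;
            if (c.pos < c.file.size()) heap.add(c); // re-insert with its next cell
            if (!cell.row().equals(curRow) || !cell.column().equals(curCol)) {
                curRow = cell.row();                // new (row, column): reset per-column state
                curCol = cell.column();
                deletedUpTo = Long.MIN_VALUE;
                versions = 0;
            }
            if (cell.isDelete()) {
                deletedUpTo = Math.max(deletedUpTo, cell.ts());
                continue;
            }
            if (cell.ts() <= deletedUpTo) continue;                // hidden by a delete marker
            if (cell.ts() < minTs || cell.ts() >= maxTs) continue; // outside [minTs, maxTs)
            if (versions >= maxVersions) continue;                 // enough versions already
            versions++;
            out.add(cell);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two hypothetical HFiles, each already sorted by ORDER.
        List<Cell> older = List.of(
                new Cell("r1", "cf:a", 10, false, "a@10"),
                new Cell("r1", "cf:b", 10, false, "b@10"));
        List<Cell> newer = List.of(
                new Cell("r1", "cf:a", 30, false, "a@30"),
                new Cell("r1", "cf:a", 20, false, "a@20"),
                new Cell("r1", "cf:b", 15, true, null));
        for (Cell c : scan(List.of(older, newer), 0, Long.MAX_VALUE, 2)) {
            System.out.println(c.row() + "/" + c.column() + " = " + c.value());
        }
    }
}
```

Running `main` prints `r1/cf:a = a@30` and `r1/cf:a = a@20`: the oldest version of `cf:a` is dropped by the max-versions limit of 2, and `cf:b` is hidden by the delete marker. Even in this toy form every cell costs a heap operation and several comparisons, which hints at the overhead a plain HFileReaderV2 loop does not pay.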