Date: Thu, 2 Jan 2014 13:43:16 -0800
Subject: Re: Performance between HBaseClient scan and HFileReaderV2
From: Sergey Shelukhin
To: "user@hbase.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7bd75132660f9d04ef03ae50

--047d7bd75132660f9d04ef03ae50
Content-Type: text/plain; charset=US-ASCII

Er, using MR over snapshots, which reads files directly...
https://issues.apache.org/jira/browse/HBASE-8369
However, it was only committed to 0.98.
There was interest in a 0.94 port (HBASE-10076), but it never happened...


On Thu, Jan 2, 2014 at 1:42 PM, Sergey Shelukhin wrote:

> You might be interested in using
> https://issues.apache.org/jira/browse/HBASE-8369
> However, it was only committed to 0.98.
> There was interest in a 0.94 port (HBASE-10076), but it never happened...
>
>
> On Thu, Jan 2, 2014 at 1:32 PM, Jerry Lam wrote:
>
>> Hello Vladimir,
>>
>> In my use case, I guarantee that a major compaction is executed before
>> any scan happens, because the system we are building is read-only. There
>> will be no deleted cells. Additionally, I only need to read from a single
>> column family, and therefore I don't need to access multiple HFiles.
>>
>> Filter conditions are nice to have, but if I can read an HFile 8x faster
>> than with HBaseClient, I can do the filtering on the client side and
>> still be faster than HBaseClient.
>>
>> Thank you for your input!
>>
>> Jerry
>>
>>
>>
>> On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov wrote:
>>
>> > An HBase scanner MUST guarantee the correct order of KeyValues (coming
>> > from different HFiles), apply the filter condition plus the conditions
>> > on included column families and qualifiers, time range, and max
>> > versions, and correctly process deleted cells.
>> > A direct HFileReader does none of the above.
>> >
>> > Best regards,
>> > Vladimir Rodionov
>> > Principal Platform Engineer
>> > Carrier IQ, www.carrieriq.com
>> > e-mail: vrodionov@carrieriq.com
>> >
>> > ________________________________________
>> > From: Jerry Lam [chilinglam@gmail.com]
>> > Sent: Thursday, January 02, 2014 7:56 AM
>> > To: user
>> > Subject: Re: Performance between HBaseClient scan and HFileReaderV2
>> >
>> > Hi Tom,
>> >
>> > Good point. Note that I also ran the HBaseClient performance test
>> > several times (as you can see from the chart). The caching should also
>> > have benefited the second run of the HBaseClient test, not just the
>> > HFileReaderV2 test.
>> >
>> > I still don't understand what makes HBaseClient perform so poorly in
>> > comparison to accessing HDFS directly. I can understand maybe a factor
>> > of 2 (even that is too much), but a factor of 8 is quite unreasonable.
>> >
>> > Any hint?
>> >
>> > Jerry
>> >
>> >
>> >
>> > On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood wrote:
>> >
>> > > I'm also new to HBase and am not familiar with HFileReaderV2.
>> > > However, in your description, you didn't mention anything about
>> > > clearing the Linux OS cache between tests. That might be why you're
>> > > seeing the big difference: if you ran the HBaseClient test first, it
>> > > may have warmed the OS cache, and then HFileReaderV2 benefited from
>> > > it. Just a guess...
>> > >
>> > > -- Tom
>> > >
>> > >
>> > >
>> > > On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam wrote:
>> > >
>> > > > Hello HBase users,
>> > > >
>> > > > I just ran a very simple performance test and would like to see if
>> > > > what I experienced makes sense.
>> > > >
>> > > > The experiment is as follows:
>> > > > - I filled an HBase region with 700MB of data (each row has roughly
>> > > >   45 columns and the size is 20KB for the entire row)
>> > > > - I configured the region to hold 4GB (therefore no split occurs)
>> > > > - I ran compactions after the data was loaded and made sure that
>> > > >   there is only 1 region in the table under test.
>> > > > - No other table exists in the HBase cluster because this is a DEV
>> > > >   environment
>> > > > - I'm using HBase 0.92.1
>> > > >
>> > > > The test is very basic. I use HBaseClient to scan the entire region
>> > > > to retrieve all rows and all columns in the table, just iterating
>> > > > over all KeyValue pairs until it is done. It took about 1 minute
>> > > > 22 sec to complete. (Note that I disabled the block cache and used
>> > > > a caching size of about 10000.)
>> > > >
>> > > > I ran another test using HFileReaderV2 to scan the entire region
>> > > > and retrieve all rows and all columns, just iterating over all
>> > > > KeyValue pairs until it is done. It took 11 sec.
>> > > >
>> > > > The performance difference is dramatic (almost 8 times faster using
>> > > > HFileReaderV2).
>> > > >
>> > > > I want to know why the difference is so big, or whether I didn't
>> > > > configure HBase properly. From this experiment, HDFS can deliver
>> > > > the data efficiently, so it is not the bottleneck.
>> > > >
>> > > > Any help is appreciated!
>> > > >
>> > > > Jerry
>> > > >
>> > >
>> >
>>
>

--047d7bd75132660f9d04ef03ae50--
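
Vladimir's list of scanner duties and Jerry's reply (one compacted HFile, no deletes) can be made concrete with a toy model. The Python sketch below is not HBase code and does not use any HBase API. The names Cell, sort_key, merged_scan and raw_scan are hypothetical, and the delete handling is a simplifying assumption: a delete marker hides every put in the same column with an equal or older timestamp. It only shows why a scanner that merges several sorted files, hides deleted cells and caps versions does more work than a reader that walks one file front to back.

```python
import heapq
from collections import namedtuple

# Toy stand-in for a KeyValue; the field names are illustrative only.
Cell = namedtuple("Cell", "row qualifier timestamp kind value")


def sort_key(cell):
    # Rows and qualifiers ascending, newest timestamp first, and a delete
    # marker ahead of a put that carries the same timestamp.
    return (cell.row, cell.qualifier, -cell.timestamp,
            0 if cell.kind == "delete" else 1)


def merged_scan(files, max_versions=1):
    """Merge sorted per-file cell lists, hide deleted cells, cap versions."""
    current = None       # (row, qualifier) currently being processed
    deleted_up_to = -1   # newest delete timestamp seen for this column
    emitted = 0          # versions already returned for this column
    for cell in heapq.merge(*files, key=sort_key):
        column = (cell.row, cell.qualifier)
        if column != current:
            current, deleted_up_to, emitted = column, -1, 0
        if cell.kind == "delete":
            deleted_up_to = max(deleted_up_to, cell.timestamp)
        elif cell.timestamp > deleted_up_to and emitted < max_versions:
            emitted += 1
            yield cell


def raw_scan(files):
    """Read each file front to back, with no merging or filtering."""
    for cells in files:
        yield from cells


# Two uncompacted "files": the newer one overwrites r1 and deletes r2.
older = [Cell("r1", "a", 1, "put", "v1"), Cell("r2", "a", 1, "put", "x1")]
newer = [Cell("r1", "a", 2, "put", "v2"), Cell("r2", "a", 3, "delete", None)]

# One "file" after a major compaction: one version per column, no deletes.
compacted = [Cell("r1", "a", 2, "put", "v2"), Cell("r2", "b", 1, "put", "y")]

scanner_view = [c.value for c in merged_scan([older, newer])]
reader_view = [c.value for c in raw_scan([older, newer])]
```

On the two uncompacted files the two views differ: the raw read returns all four cells, including the stale value and the delete marker, while the merged scan returns only the live, newest value. On the single compacted file both functions return the same cells, which is the situation Jerry describes for his read-only table.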