Date: Wed, 20 Jun 2012 08:35:32 +0200
From: Marcin Cylke
To: user@hbase.apache.org
Subject: Re: performance of Get from MR Job

On 19/06/12 19:31, Jean-Daniel Cryans wrote:
> This is a common but hard problem. I do not have a good answer.

Thanks for your writeup. You've given a few suggestions that I will surely follow. But what is bothering me is my use of timestamps.
As mentioned before, my column family allows 2147483646 versions. I store data there using those timestamps: a few rows with the same key but different timestamps. When I prepare GETs with a TimeRange of {0, timestamp}, performance is poor (~130/sec). But setting something like {timestamp-10000, timestamp} results in a great speed improvement (~400/sec). Although {timestamp-10000, timestamp} is unrealistic in my situation, the whole issue seems strange, and thus related in some way to the use of timestamps.

Would you recommend trying complex keys, built from timestamp + my current key? Or shouldn't this change that much?

> Finally kind of like Paul said, if you can emit your rows and somehow
> batch them reducer-side in order to either do short scans or multi-get
> (see HTable.get(List)) it could be faster.

I'll try this solution, but I'm not that optimistic about it. I'll let you know whether it helped.

Regards
Marcin
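
On the complex-key question above, here is a minimal sketch of what one such key could look like. It is only an illustration, not a tested fix for the Get throughput described here. The layout (current key first, then an 8-byte reversed timestamp) is an assumption, as are the class and method names; the message itself only says "timestamp + my current key". The sketch uses plain Java with no HBase classes, and it assumes non-negative timestamps and keys of equal length. With this layout, newer entries for the same key sort first under byte-wise ordering, so a short scan starting at compositeKey(key, ts) would reach the newest entry at or before ts first.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase: builds and compares composite row keys.
class CompositeKeySketch {

    // Assumed layout: [current key bytes][8-byte big-endian reversed timestamp].
    // Reversing the timestamp makes newer entries sort before older ones.
    static byte[] compositeKey(String currentKey, long timestamp) {
        byte[] keyBytes = currentKey.getBytes(StandardCharsets.UTF_8);
        long reversed = Long.MAX_VALUE - timestamp; // assumes timestamp >= 0
        return ByteBuffer.allocate(keyBytes.length + Long.BYTES)
                .put(keyBytes)
                .putLong(reversed) // ByteBuffer is big-endian by default
                .array();
    }

    // Unsigned byte-by-byte comparison, the ordering used for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = compositeKey("row1", 1000L);
        byte[] newer = compositeKey("row1", 2000L);
        System.out.println("newer sorts first: " + (compareUnsigned(newer, older) < 0));
    }
}
```

If the current keys vary in length, a fixed-width or delimited prefix would be needed so that entries for different keys do not interleave.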