Message-ID: <4978D192.60504@duboce.net>
Date: Thu, 22 Jan 2009 12:05:38 -0800
From: stack
To: hbase-user@hadoop.apache.org
Subject: Re: HBase random read technics
References: <013101c97c9c$85024270$8f06c750$@com> <4978B104.5090105@duboce.net> <9683564c0901221144t455e002blf4f1d618768e024b@mail.gmail.com>
In-Reply-To:
<9683564c0901221144t455e002blf4f1d618768e024b@mail.gmail.com>

Genady Gillin wrote:
> Hi,
>
> We use HBase 0.19Rc2. Our data (~800GB) resides in one table (is that bad?).
> The schema of the table is pretty simple: two column families, one for keys
> and the second for values; each key could have one or more values (~100).

Keys in one column family and values in another? Why not both in the one
column family? Do you use the keys in the first column family to do lookups
into the second?

> To query values we use a file of keys (for instance, about 10M keys), so the
> purpose is to read all values for each one of the keys, where the expected
> performance is about 1 hour. By the way, the data output is not too big, ~2Gb.

Can you sort the keys and then start a scanner, perhaps with the start and
stop keys being the first and last keys from the file? Does that run faster?

But it sounds like you need to run an MR job. You tried that and it failed.
Did you try it on the same hardware? My guess is you were running into the
issue we're discussing in the other email ('.... slept too long...').

St.Ack

> Thanks,
> Gennady
>
> On Thu, Jan 22, 2009 at 7:46 PM, stack wrote:
>
>> Genady wrote:
>>
>>> Hi,
>>>
>>> Just wondering if somebody could recommend a random read strategy for
>>> searching a big group of keys (100M) in a hadoop/hbase cluster. Using one
>>> client is very slow; separating the input into smaller groups and running
>>> each one with a different client certainly improves performance, but the
>>> maximum speed I'm getting is ~3300 reads/sec. I've tried to use map reduce,
>>> running the search as a map-reduce task with the HBase reads done from map
>>> or reduce, but HBase starts to fail. So are a hardware upgrade and creating
>>> HBase in-memory tables the only direction here?
>>>
>> Tell us more about your table schema, data sizes, and the types of query.
>> What performance do you need from hbase? Do your rows have many columns
>> and you are trying to get all columns when you query for example? Are you
>> on 0.19.0 Genady (sorry if you've answered this question in the near past)?
>> St.Ack
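
The "sort the keys, then scan between the first and last key" suggestion in
the reply above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not HBase client code: the table is
modelled as an in-memory list of rows already sorted by row key, and `scan`
and `lookup_by_scan` are hypothetical helper names. The stand-in scanner
treats the stop key as inclusive; a real scanner may not, so check that
before relying on the last key.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# A row is (row_key, values). The table is ASSUMED to be sorted by row_key,
# which is how a range scanner would return rows.
Row = Tuple[str, List[str]]


def scan(table: List[Row], start_key: str, stop_key: str) -> Iterator[Row]:
    """Hypothetical stand-in for a range scanner (both bounds inclusive here)."""
    for key, values in table:
        if key < start_key:
            continue
        if key > stop_key:
            break
        yield key, values


def lookup_by_scan(table: List[Row], wanted_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Sort the wanted keys, scan once from the first to the last key, and
    merge the two sorted sequences instead of issuing one random read per key."""
    keys = sorted(set(wanted_keys))
    if not keys:
        return {}
    results: Dict[str, List[str]] = {}
    i = 0  # index of the next wanted key not yet matched or skipped
    for row_key, values in scan(table, keys[0], keys[-1]):
        # Skip wanted keys that sort before this row: they are not in the table.
        while i < len(keys) and keys[i] < row_key:
            i += 1
        if i == len(keys):
            break
        if keys[i] == row_key:
            results[row_key] = values
            i += 1
    return results


if __name__ == "__main__":
    table = [("a", ["1"]), ("c", ["2", "3"]), ("d", ["4"]), ("f", ["5"])]
    print(lookup_by_scan(table, ["d", "b", "a", "f"]))
```

The scan touches every row between the first and last wanted key, so this
probably only pays off when the wanted keys are dense within that range;
for sparse keys, splitting the sorted key file into several narrower ranges
may be a better fit.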