Subject: Re: HBase random read technics
From: Genady Gillin
To: hbase-user@hadoop.apache.org
Date: Thu, 22 Jan 2009 22:38:34 +0200
Hi,

On Thu, Jan 22, 2009 at 10:05 PM, stack wrote:
> Genady Gillin wrote:
>> Hi,
>>
>> We use HBase 0.19 RC2. Our data (~800GB) resides in one table (is that
>> bad?). The schema of the table is pretty simple: two column families,
>> one for keys and the second for values. Each key can have one or more
>> values (~100).
>
> Keys in one column family and values in another? Why not both in the
> one column family?

Because each key can have one or more values.

> You use the keys in the first column family to do lookups into the
> second? Can you sort the keys and then start a scanner with perhaps
> start and stop keys being the first and last from the file? Does that
> run faster?

The keys I want to read are sorted but not sequential, so a scan is
useless here.

> But it sounds like you need to run an MR job. You tried that and it
> failed. Did you try on the same hardware? My guess is you were running
> into the issue we're discussing in the other email ('.... slept too
> long...').

Not fair to use inside info :) Hardware performance could be an issue.
We're going to upgrade our hardware as a result of your assistance, so
I'll try to run the MR job on the new system.

Thanks,
Gennady

> St.Ack
>
>> Thanks,
>> Gennady
>>
>> On Thu, Jan 22, 2009 at 7:46 PM, stack wrote:
>>> Genady wrote:
>>>> Hi,
>>>>
>>>> Just wondering if somebody could recommend a random read strategy
>>>> for searching a big group of keys (100M) in a Hadoop/HBase cluster.
>>>> Using one client is very slow. Separating the input into smaller
>>>> groups and running each one with a different client certainly
>>>> improves performance, but the maximum speed I'm getting is ~3300
>>>> reads/sec. I've tried to run the search as a map-reduce task, doing
>>>> the HBase reads from map or reduce, but HBase starts to fail. So are
>>>> a hardware upgrade and creating in-memory HBase tables the only
>>>> direction here?
>>>
>>> Tell us more about your table schema, data sizes, and the types of
>>> query. What performance do you need from HBase? For example, do your
>>> rows have many columns and are you trying to get all columns when
>>> you query? Are you on 0.19.0, Genady (sorry if you've answered this
>>> question in the near past)?
>>> St.Ack
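Stack's scanner suggestion and Genady's objection that the keys are "sorted but not sequential" come down to how many rows a scan reads for nothing. The following is a minimal, hypothetical sketch: a sorted Python list stands in for HBase's sorted row keys, and `rows_in_range` and `group_into_runs` are made-up helpers, not part of any HBase client API. It counts the rows a single first-to-last scan would touch and, as an assumed middle ground, groups the wanted keys into dense runs that could each get their own start/stop keys, leaving isolated keys as single-row reads. The `max_gap` threshold is an arbitrary tuning knob chosen for illustration.

```python
import bisect
from typing import List, Tuple

def rows_in_range(table_keys: List[str], start: str, stop: str) -> int:
    """Rows a scan from start to stop (both inclusive) would read.

    table_keys is a sorted list standing in for HBase's sorted row keys.
    """
    return bisect.bisect_right(table_keys, stop) - bisect.bisect_left(table_keys, start)

def group_into_runs(wanted: List[str], table_keys: List[str],
                    max_gap: int) -> List[Tuple[str, str]]:
    """Split non-empty, sorted wanted keys into (start, stop) runs.

    A new run starts whenever more than max_gap unwanted rows lie between
    two consecutive wanted keys, i.e. when one scan would waste too much I/O.
    """
    runs: List[Tuple[str, str]] = []
    start = prev = wanted[0]
    for key in wanted[1:]:
        # Rows strictly between prev and key that a scan would read for nothing.
        gap = bisect.bisect_left(table_keys, key) - bisect.bisect_right(table_keys, prev)
        if gap > max_gap:
            runs.append((start, prev))
            start = key
        prev = key
    runs.append((start, prev))
    return runs

if __name__ == "__main__":
    # Hypothetical table of 100,000 zero-padded row keys.
    table = [f"row{i:05d}" for i in range(100_000)]

    # Sorted but not sequential: every 1000th row.
    sparse = table[::1000]
    print(len(sparse), rows_in_range(table, sparse[0], sparse[-1]))  # 100 99001
    print(len(group_into_runs(sparse, table, max_gap=10)))           # 100

    # Clustered keys: two dense runs.
    clustered = table[500:510] + table[70_000:70_005]
    print(group_into_runs(clustered, table, max_gap=10))
    # [('row00500', 'row00509'), ('row70000', 'row70004')]
```

With the sparse key set, one scan reads 99,001 rows to return 100, which is why a single first-to-last scan does not help in Genady's case; grouping into runs only pays off when the requested keys actually cluster.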
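Genady's original approach, splitting the 100M keys into smaller groups with one client per group, is a fan-out over independent point reads. A minimal sketch under stated assumptions: threads in one process stand in for the separate clients, `get` is a hypothetical single-row read (here a plain dict lookup rather than a real HBase call), and `partition`, `parallel_lookup` and `hours_needed` are illustrative names. It also shows the arithmetic behind the reported ceiling: at ~3300 reads/sec, 100M keys take about 30,300 seconds, roughly 8.4 hours.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

def partition(keys: List[str], n: int) -> List[List[str]]:
    """Split keys into n (>= 1) contiguous groups whose sizes differ by at most one."""
    size, extra = divmod(len(keys), n)
    groups, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        groups.append(keys[start:end])
        start = end
    return groups

def parallel_lookup(keys: List[str], n_clients: int,
                    get: Callable[[str], Optional[int]]) -> Dict[str, Optional[int]]:
    """Look up every key, with n_clients workers each handling one group.

    `get` is a hypothetical stand-in for a single-row read; with a real
    client, each worker would probably need its own table handle (assumption).
    """
    def worker(group: List[str]) -> Dict[str, Optional[int]]:
        return {k: get(k) for k in group}

    results: Dict[str, Optional[int]] = {}
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        for partial in pool.map(worker, partition(keys, n_clients)):
            results.update(partial)
    return results

def hours_needed(n_keys: int, reads_per_sec: float) -> float:
    """Wall-clock hours to read n_keys at a sustained aggregate rate."""
    return n_keys / reads_per_sec / 3600

if __name__ == "__main__":
    fake_table = {f"key{i}": i * i for i in range(1_000)}  # hypothetical data
    wanted = [f"key{i}" for i in range(0, 1_000, 7)]
    found = parallel_lookup(wanted, n_clients=8, get=fake_table.get)
    print(len(found))                                  # 143
    print(round(hours_needed(100_000_000, 3300), 1))   # 8.4
```

Threads fit here because a real remote read spends most of its time waiting on the network; adding clients raises aggregate throughput only until the region servers themselves become the bottleneck.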