Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-user@hadoop.apache.org
Received-SPF: neutral (nike.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <4978D192.60504@duboce.net>
References: <013101c97c9c$85024270$8f06c750$@com>
 <4978B104.5090105@duboce.net>
	 <9683564c0901221144t455e002blf4f1d618768e024b@mail.gmail.com>
	 <4978D192.60504@duboce.net>
Date: Thu, 22 Jan 2009 22:38:34 +0200
Message-ID: <9683564c0901221238h168f40eboe9b011a03db88bef@mail.gmail.com>
Subject: Re: HBase random read technics
From: Genady Gillin <genadyg@exelate.com>
To: hbase-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=000e0cd23f0e94422e04611841a2

--000e0cd23f0e94422e04611841a2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hi,

On Thu, Jan 22, 2009 at 10:05 PM, stack <stack@duboce.net> wrote:

>
> Genady Gillin wrote:
>
>>
>> Hi,
>>
>> We use HBase 0.19 RC2. Our data (~800GB) resides in one table (is that bad?).
>> The table schema is pretty simple: two column families, one for keys and
>> one for values. Each key can have one or more values (~100).
>>
>
> Keys in one column family and values in another?  Why not both in the
> one column family?


Because each key could have one or more values.


>
> You use the keys in the first column family to do lookups into the second?
> Can you sort the keys and then start a scanner with perhaps start and stop
> keys being the first and last from the file?  Does that run faster?


The keys that I want to read are sorted but not sequential, so a scan is
useless here.
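
Since a scan does not help, the lookups remain individual random reads. The
approach from my first mail (splitting the sorted key list into smaller
groups and reading each group with its own client) can be sketched roughly
as below. This is only an illustration under assumptions: `get_row` is a
hypothetical stand-in for whatever single-row read the real client performs,
and threads stand in for the separate client processes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(sorted_keys, n_groups):
    """Split a sorted key list into at most n_groups contiguous slices."""
    if n_groups < 1:
        raise ValueError("n_groups must be >= 1")
    size, extra = divmod(len(sorted_keys), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional key each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            groups.append(sorted_keys[start:end])
        start = end
    return groups


def lookup_group(keys, get_row):
    """Issue one random read per key (get_row is a hypothetical callable)."""
    return {key: get_row(key) for key in keys}


def parallel_lookup(sorted_keys, get_row, n_clients=4):
    """Read every key, one worker per group, and merge the results."""
    groups = split_into_groups(sorted_keys, n_clients)
    results = {}
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        for part in pool.map(lambda group: lookup_group(group, get_row), groups):
            results.update(part)
    return results
```

Keeping each group contiguous in key order means one worker mostly touches
neighbouring rows, which may or may not help depending on how the table is
split into regions.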

>
> But sounds like you need to run an MR job.  You tried that and it failed.
> You tried on same hardware?  My guess is you were running into the issue
> we're discussing in other email ('.... slept too long...').


Not fair to use inside info :) Hardware performance could be an issue. We're
going to upgrade the hardware as a result of your assistance, so I'll try to
run the MR job on the new system.

Thanks,
Gennady

>
> St.Ack
>
>>
>> Thanks,
>> Gennady
>>
>> On Thu, Jan 22, 2009 at 7:46 PM, stack <stack@duboce.net> wrote:
>>
>>>
>>> Genady wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> Just wondering if somebody could recommend a random read strategy for
>>>> looking up a big group of keys (100M) in a hadoop/hbase cluster. Using
>>>> one client is very slow. Splitting the input into smaller groups and
>>>> running each one with a different client certainly improves performance,
>>>> but the maximum speed I'm getting is ~3300 reads/sec. I've tried to run
>>>> the search as a map-reduce task, issuing the HBase reads from map or
>>>> reduce, but HBase starts to fail. So are a hardware upgrade and creating
>>>> HBase in-memory tables the only direction here?
>>>>
>>>
>>> Tell us more about your table schema, data sizes, and the types of
>>> query.  What performance do you need from hbase?  Do your rows have many
>>> columns and you are trying to get all columns when you query, for example?
>>> Are you on 0.19.0, Genady (sorry if you've answered this question in the
>>> near past)?
>>> St.Ack
>>>

--000e0cd23f0e94422e04611841a2--