Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-user@hadoop.apache.org
MIME-Version: 1.0
Sender: saint.ack@gmail.com
In-Reply-To: <4A8AE4C4.8020209@streamy.com>
References: <fa03480d0908171402r77f05e1ale62782dfcd2e81c7@mail.gmail.com>
	 <fa03480d0908180302k40a92e08w92f21580bef11457@mail.gmail.com>
	 <fa03480d0908180407u6022d97an1569afa620f5ab01@mail.gmail.com>
	 <4A8AE4C4.8020209@streamy.com>
Date: Tue, 18 Aug 2009 10:36:00 -0700
Message-ID: <7c962aed0908181036m4339160t23b3c5e4f97b1680@mail.gmail.com>
Subject: Re: HBase-0.20.0 Performance Evaluation
From: stack <stack@duboce.net>
To: hbase-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=002354471070a8402204716df3a2

--002354471070a8402204716df3a2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

What do you have for GC config, Schubert?  Is it now 8ms per random read?
St.Ack

On Tue, Aug 18, 2009 at 10:28 AM, Jonathan Gray <jlist@streamy.com> wrote:

> Schubert,
>
> I can't think of any reason your random reads would get slower after
> inserting more data, besides GC issues.
>
> Do you have GC logging and JVM metrics logging turned on?  I would inspect
> those to see if you have any long-running GC pauses, or just lots and lots
> of GC going on.
>
> If I recall, you are running on 4GB nodes, 2GB RS heap, and cohosted
> DataNodes and TaskTrackers.  We ran for a long time on a similar setup, but
> once we moved to 0.20 (and to the CMS garbage collector), we really needed
> to add more memory to the nodes and increase RS heap to 4 or 5GB.  The CMS
> GC is less efficient in memory, but if given sufficient resources, is much
> better for overall performance/throughput.
>
> Also, do you have Ganglia set up?  Are you seeing swapping on your RS nodes?
> Is there high IO-wait CPU usage?
>
> JG
>
>
> Schubert Zhang wrote:
>
>> One addition:
>> Only random reads become very slow; scans and sequential reads are OK.
>>
>>
>> On Tue, Aug 18, 2009 at 6:02 PM, Schubert Zhang <zsongbo@gmail.com>
>> wrote:
>>
>>> stack and J-G, thank you very much for your helpful comments.
>>>
>>> But now we have found a critical issue with random reads.
>>> I used sequential writes to insert 5GB of data into our empty HBase
>>> table, which generated ~30 regions. The random reads then took about
>>> 30 minutes to complete. Then I ran the sequential writes again, so
>>> another version of each cell was inserted and ~60 regions were
>>> generated. But when I ran the random reads against this table again,
>>> they always took a long time (more than 2 hours).
>>>
>>> I checked the heap usage and other metrics but did not find the reason.
>>>
>>> Below is the status of one region server:
>>> request=0.0, regions=13, stores=13, storefiles=14, storefileIndexSize=2,
>>> memstoreSize=0, usedHeap=1126, maxHeap=1991, blockCacheSize=338001080,
>>> blockCacheFree=79686056, blockCacheCount=5014, blockCacheHitRatio=55
>>>
>>> Schubert
>>>
>>>
>>> On Tue, Aug 18, 2009 at 5:02 AM, Schubert Zhang <zsongbo@gmail.com>
>>> wrote:
>>>
>>>> We have just done a Performance Evaluation of HBase-0.20.0.
>>>> Refer to:
>>>>
>>>> http://docloud.blogspot.com/2009/08/hbase-0200-performance-evaluation.html
>>>>
>>>>
>>>
>>
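
For reading the region server status line quoted above, a minimal Python
sketch follows. It only does arithmetic on the numbers Schubert posted.
The unit assumptions are mine and are not stated in the thread: usedHeap
and maxHeap are taken to be MB, blockCacheSize and blockCacheFree bytes,
and blockCacheHitRatio a percentage. The helper names parse_status and
summarize are hypothetical, not part of HBase.

```python
import re

# Status line copied from the region server report quoted in this thread.
STATUS = (
    "request=0.0, regions=13, stores=13, storefiles=14, storefileIndexSize=2, "
    "memstoreSize=0, usedHeap=1126, maxHeap=1991, blockCacheSize=338001080, "
    "blockCacheFree=79686056, blockCacheCount=5014, blockCacheHitRatio=55"
)


def parse_status(line):
    """Turn 'key=value, key=value' text into a dict of floats."""
    return {key: float(value) for key, value in re.findall(r"(\w+)=([\d.]+)", line)}


def summarize(metrics):
    """Derive a few ratios. Assumes heap values are MB and cache values bytes."""
    cache_total = metrics["blockCacheSize"] + metrics["blockCacheFree"]
    heap_bytes = metrics["maxHeap"] * 1024 * 1024
    return {
        "heap_used_pct": 100.0 * metrics["usedHeap"] / metrics["maxHeap"],
        "cache_used_pct": 100.0 * metrics["blockCacheSize"] / cache_total,
        "cache_share_of_heap_pct": 100.0 * cache_total / heap_bytes,
        # Assumes blockCacheHitRatio is reported as a percentage.
        "cache_miss_pct": 100.0 - metrics["blockCacheHitRatio"],
    }


if __name__ == "__main__":
    for name, value in summarize(parse_status(STATUS)).items():
        print(f"{name}: {value:.1f}")
```

Under those assumptions the heap is a bit over half used and the block
cache is mostly full, with a hit ratio of 55. If that ratio is cumulative,
a large share of the random reads are missing the cache, which may be
worth checking alongside the GC logs Jonathan asked about.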

--002354471070a8402204716df3a2--