From: Mingjian Deng
To: user@hbase.apache.org
Date: Fri, 15 Jul 2011 11:41:06 +0800
Subject: Re: performance problem during read

Hi Stack:
Server A and the other servers are the same kind of machine in the cluster. If I
set hfile.block.cache.size=0.1 on another server, the problem reappears, but when
I set hfile.block.cache.size=0.15 or more, it does not. So I think you can
reproduce it on your own cluster with the following btrace script:
--------------------------------------------------------------
import static com.sun.btrace.BTraceUtils.*;
import com.sun.btrace.annotations.*;

import java.nio.ByteBuffer;
import org.apache.hadoop.hbase.io.hfile.*;

@BTrace public class TestRegion1 {
  @OnMethod(
    clazz="org.apache.hadoop.hbase.io.hfile.HFile$Reader",
    method="decompress"
  )
  public static void traceCacheBlock(final long offset, final int compressedSize,
      final int decompressedSize, final boolean pread) {
    println(strcat("decompress: ", str(decompressedSize)));
  }
}
--------------------------------------------------------------

If I set hfile.block.cache.size=0.1, the result is:
-----------
.......
decompress: 6020488
decompress: 6022536
decompress: 5991304
decompress: 6283272
decompress: 5957896
decompress: 6246280
decompress: 6041096
decompress: 6541448
decompress: 6039560
.......
-----------

If I set hfile.block.cache.size=0.12, the result is:
-----------
......
decompress: 65775
decompress: 65556
decompress: 65552
decompress: 9914120
decompress: 6026888
decompress: 65615
decompress: 65627
decompress: 6247944
decompress: 5880840
decompress: 65646
......
-----------

If I set hfile.block.cache.size=0.15 or more, the result is:
-----------
......
decompress: 65646
decompress: 65615
decompress: 65627
decompress: 65775
decompress: 65556
decompress: 65552
decompress: 65646
decompress: 65615
decompress: 65627
decompress: 65775
decompress: 65556
decompress: 65552
......
-----------

All of the above tests ran for more than 10 minutes under a high read load, so it
is a very strange phenomenon.
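For scale, here is a rough back-of-the-envelope sketch of how much room the
LruBlockCache would have at each setting. The 8 GB heap and the 0.85 acceptable
factor are only assumed example values, not our real configuration:
--------------------------------------------------------------
// Back-of-the-envelope sketch only; heap size and acceptable factor are assumptions.
public class CacheRoomSketch {
  public static void main(String[] args) {
    long heapBytes = 8L * 1024 * 1024 * 1024;   // assume an 8 GB region server heap
    double acceptableFactor = 0.85;             // assumed eviction-threshold factor
    long bigBlock = 6 * 1024 * 1024;            // ~6 MB blocks seen on node A
    long smallBlock = 64 * 1024;                // the expected ~64 KB hfile blocks
    for (double ratio : new double[] {0.1, 0.12, 0.15, 0.4}) {
      long usable = (long) (heapBytes * ratio * acceptableFactor);
      System.out.printf("hfile.block.cache.size=%.2f -> ~%d MB usable, "
          + "~%d big (6 MB) blocks or ~%d small (64 KB) blocks%n",
          ratio, usable >> 20, usable / bigBlock, usable / smallBlock);
    }
  }
}
--------------------------------------------------------------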
2011/7/15 Stack

> This is interesting. Any chance that the cells on the regions hosted
> on server A are 5M in size?
>
> The hfile block sizes are by default configured to be 64k, but rarely
> would an hfile block be exactly 64k. We do not cut the hfile block
> content at 64k exactly. The hfile block boundary will be at a
> keyvalue boundary.
>
> If a cell were 5MB, it does not get split across multiple hfile
> blocks. It will occupy one hfile block.
>
> Could it be that the region hosted on A is not like the others and it
> has lots of these 5MB sizes?
>
> Let us know. If the above is not the case, then you have an interesting
> phenomenon going on and we need to dig in more.
>
> St.Ack
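If I understand the boundary point correctly, a simplified model of the cutting
behaviour would be something like the sketch below. This is only my own toy
model, not the real HFile writer code:
--------------------------------------------------------------
import java.util.ArrayList;
import java.util.List;

// Toy model: a block is only closed at a KeyValue boundary once the target
// block size has been reached, so a single 5 MB KeyValue ends up in one
// ~5 MB block even though the target is 64 KB.
public class BlockCutSketch {
  static List<Integer> cutBlocks(int[] kvSizes, int targetBlockSize) {
    List<Integer> blocks = new ArrayList<Integer>();
    int current = 0;
    for (int kvSize : kvSizes) {
      current += kvSize;                 // a KeyValue is never split across blocks
      if (current >= targetBlockSize) {  // close the block only after a whole KeyValue
        blocks.add(current);
        current = 0;
      }
    }
    if (current > 0) blocks.add(current);
    return blocks;
  }

  public static void main(String[] args) {
    // one 5 MB cell among small ones produces one large block
    System.out.println(cutBlocks(new int[] {1000, 1000, 5 * 1024 * 1024, 1000}, 64 * 1024));
  }
}
--------------------------------------------------------------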
> On Thu, Jul 14, 2011 at 5:27 AM, Mingjian Deng
> wrote:
> > Hi:
> > We found a strange problem in our read test.
> > It is a 5-node cluster. Four of our 5 regionservers set
> > hfile.block.cache.size=0.4; one of them is set to 0.1 (node A). When we
> > randomly read from a 2 TB data table, we found node A's network traffic
> > reached 100 MB while the others' was less than 10 MB. We know node A needs
> > to read data from disk and put it in the block cache. In the following code
> > in LruBlockCache:
> > --------------------------------------------------------------
> > public void cacheBlock(String blockName, ByteBuffer buf, boolean inMemory) {
> >   CachedBlock cb = map.get(blockName);
> >   if (cb != null) {
> >     throw new RuntimeException("Cached an already cached block");
> >   }
> >   cb = new CachedBlock(blockName, buf, count.incrementAndGet(), inMemory);
> >   long newSize = size.addAndGet(cb.heapSize());
> >   map.put(blockName, cb);
> >   elements.incrementAndGet();
> >   if (newSize > acceptableSize() && !evictionInProgress) {
> >     runEviction();
> >   }
> > }
> > --------------------------------------------------------------
> >
> > We debugged this code with btrace, using the following script:
> > --------------------------------------------------------------
> > import static com.sun.btrace.BTraceUtils.*;
> > import com.sun.btrace.annotations.*;
> >
> > import java.nio.ByteBuffer;
> > import org.apache.hadoop.hbase.io.hfile.*;
> >
> > @BTrace public class TestRegion {
> >   @OnMethod(
> >     clazz="org.apache.hadoop.hbase.io.hfile.LruBlockCache",
> >     method="cacheBlock"
> >   )
> >   public static void traceCacheBlock(@Self LruBlockCache instance,
> >       String blockName, ByteBuffer buf, boolean inMemory) {
> >     println(strcat("size: ",
> >         str(get(field("org.apache.hadoop.hbase.io.hfile.LruBlockCache", "size"), instance))));
> >     println(strcat("elements: ",
> >         str(get(field("org.apache.hadoop.hbase.io.hfile.LruBlockCache", "elements"), instance))));
> >   }
> > }
> > --------------------------------------------------------------
> >
> > We found that "size" increases by 5 MB each time on node A! Why not 64 KB
> > each time? "size" increases by 64 KB when we run this btrace script on the
> > other nodes at the same time.
> >
> > The following script also confirms the problem, because "decompressedSize"
> > is 5 MB each time on node A:
> > --------------------------------------------------------------
> > import static com.sun.btrace.BTraceUtils.*;
> > import com.sun.btrace.annotations.*;
> >
> > import java.nio.ByteBuffer;
> > import org.apache.hadoop.hbase.io.hfile.*;
> >
> > @BTrace public class TestRegion1 {
> >   @OnMethod(
> >     clazz="org.apache.hadoop.hbase.io.hfile.HFile$Reader",
> >     method="decompress"
> >   )
> >   public static void traceCacheBlock(final long offset, final int compressedSize,
> >       final int decompressedSize, final boolean pread) {
> >     println(strcat("decompressedSize: ", str(decompressedSize)));
> >   }
> > }
> > --------------------------------------------------------------
> >
> > Why not 64 KB?
> >
> > BTW: When we set hfile.block.cache.size=0.4 on node A, the "decompressedSize"
> > goes down to 64 KB, and the TPS goes up to a high level.
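To check the 5 MB cell theory, I could scan the table and report unusually large
cells with something like the sketch below (this uses the HTable client API; the
table name "testtable" and the 1 MB threshold are only placeholders):
--------------------------------------------------------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Spot-check sketch: print rows that contain a KeyValue larger than ~1 MB.
// "testtable" and the 1 MB threshold are placeholders, not the real table here.
public class CellSizeCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");
    Scan scan = new Scan();
    scan.setCaching(100);              // fetch rows in batches to speed up the scan
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        for (KeyValue kv : r.raw()) {
          if (kv.getLength() > 1024 * 1024) {
            System.out.println(Bytes.toStringBinary(kv.getRow()) + " -> " + kv.getLength());
          }
        }
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}
--------------------------------------------------------------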