hbase-user mailing list archives

From Sam Seigal <selek...@yahoo.com>
Subject Re: HBase region size
Date Fri, 01 Jul 2011 19:21:18 GMT
On Thu, Jun 30, 2011 at 11:33 PM, Stack <stack@duboce.net> wrote:

> On Mon, Jun 27, 2011 at 11:37 PM, Aditya Karanth A
> <aditya_karanth@mindtree.com> wrote:
> > I have heard that the bigger the size of the regionserver, the more time
> > it takes for region splitting and the slower the reads are. Is this true?
> > (I have not been able to experiment with all these in our environments
> > yet, but if anyone has been there and done that, it would be good to
> > know.)

> Well, splitting is fast in that it just writes out reference files;
> it does not actually rewrite data, so size shouldn't matter.

This is interesting. I always thought of a region split as a single file
being copied out as two. Is there more documentation on this? If not, what
code can I look at to better understand splits?
Also, when a region is moved from one regionserver to another, doesn't the
data need to be moved so that it is local to the new regionserver, for
better performance and reduced I/O?
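
To make the "reference files" idea concrete, here is a minimal toy sketch in Python. It is not HBase code: the names StoreFile, Reference and split are hypothetical and only model the description quoted above, where a split records a split key and two pointers to the parent file instead of copying rows.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class StoreFile:
    """Toy immutable store file (hypothetical): rows sorted by key."""
    name: str
    rows: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Reference:
    """Pointer to one half of a parent file; it holds no row data itself."""
    parent: StoreFile
    split_key: str
    top: bool  # True: keys >= split_key, False: keys < split_key

    def scan(self) -> Iterator[Tuple[str, str]]:
        # Reads go through to the parent file and filter by the split key.
        for key, value in self.parent.rows:
            if (key >= self.split_key) == self.top:
                yield key, value


def split(parent: StoreFile, split_key: str) -> Tuple[Reference, Reference]:
    """Split by creating two references; no rows are copied or rewritten."""
    return (Reference(parent, split_key, top=False),
            Reference(parent, split_key, top=True))


parent = StoreFile("hfile-1",
                   tuple((f"row{i:03d}", f"v{i}") for i in range(1000)))
bottom, top = split(parent, "row500")
```

In this model the cost of split() does not depend on how many rows the parent holds, which is the point made in the quoted reply. The sketch leaves out everything else a real split involves, including any later rewriting of the data into the daughter regions.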

> Scan reads don't care about file size (bigger may actually be slightly
> faster).  Random read performance also is unrelated to file/region
> size (We consult the in-memory index to figure out where to jump to in
> order to start the read -- this should be the same for big or small files).
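
The in-memory index lookup described in the quote can be sketched roughly as follows. This is a toy model in Python, not the HFile format: build_block_index and find_block are hypothetical names, and the assumption is that the index keeps the first key of each block and is binary-searched to pick the block to read.

```python
import bisect
from typing import List


def build_block_index(sorted_keys: List[str], block_size: int) -> List[str]:
    """Return the first key of each block; only this list stays in memory."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), block_size)]


def find_block(index: List[str], key: str) -> int:
    """Binary-search the index for the block that could contain `key`."""
    pos = bisect.bisect_right(index, key) - 1
    return max(pos, 0)  # keys before the first entry map to block 0


keys = [f"row{i:06d}" for i in range(100_000)]
index = build_block_index(keys, block_size=1000)
block = find_block(index, "row042137")  # block whose first key is row042000
```

Under these assumptions a random read touches the index and then one block, so a larger file means a longer index but only a few more binary-search steps.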

If this is true, when would you ever want to have multiple regions for the
same table and column family served by a single regionserver? I'd rather
leave the region size unlimited and, if a region gets hot, manually split
and move it. Is there any risk associated with this approach? I guess this
ties into my previous question: if a lot of data is physically moved from
one location to another during a move, you probably do not want to run into
a situation where you are moving very large regions around in the cluster
at the same time ...

> St.Ack
