hbase-user mailing list archives

From steven zhuang <steven.zhuang.1...@gmail.com>
Subject Re: regionserver loads but never unload?
Date Tue, 09 Mar 2010 02:08:28 GMT
And a store file can be really big if it is from a region with only one
really big row. :)
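As a rough illustration, the compaction memory bound Ryan describes in the quoted thread below (the greater of K * block size or 2 * the largest cell size) can be sketched like this. The class and method names here are made up for illustration and are not HBase API:

```java
// Sketch of the compaction memory estimate discussed in this thread.
// Names are illustrative only, not actual HBase code.
public class CompactionMemEstimate {
    /** Rough upper bound on compaction heap use: the greater of
     *  K * blockSize (one block per store file being merged) or
     *  2 * largestCellSize (reading plus rewriting the biggest cell). */
    static long estimate(int k, long blockSize, long largestCellSize) {
        return Math.max((long) k * blockSize, 2L * largestCellSize);
    }

    public static void main(String[] args) {
        long kb = 1024, mb = 1024 * kb;
        // Ryan's example: a 400MB cell needs at least 800MB to compact.
        System.out.println(estimate(10, 64 * kb, 400 * mb)); // 838860800
        // With tiny (100-byte) cells, K 64KB blocks dominate instead.
        System.out.println(estimate(10, 64 * kb, 100));      // 655360
    }
}
```

With small cells the bound stays at a few hundred KB regardless of data-set size, which is the point Ryan makes about K replacing N.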

On Tue, Mar 9, 2010 at 9:14 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Hi,
>
> Sorry, N should really be "K" - the number of store files being
> compacted.  So it is not dependent on the size of your data set.
>
> -ryan
>
> On Mon, Mar 8, 2010 at 5:12 PM, steven zhuang
> <steven.zhuang.1984@gmail.com> wrote:
> > About the math you did, I think in my case the "N" is really big; my
> > largest cell will not exceed 100 bytes.
> > I am sure the regionserver crashed when it did a major compaction,
> > before I had split the big row into smaller ones.
> >
> >
> > On Tue, Mar 9, 2010 at 8:53 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> HBase does not load the "entire region" into ram during region load.
> >> What it does is load indexes from the storage files.  These indexes
> >> are typically a meg or two per region.  After a region is loaded,
> >> there are no follow-up loads.  During the course of answering queries,
> >> as blocks from the files are needed, they are loaded into the block
> >> cache.  At this point they persist until the LRU mechanism decides to
> >> evict them.
> >>
> >> During compactions, we do a heap sorted merge of multiple HFiles into
> >> one.  The largest memory use during this would be the greater of
> >> N*block size (default=64k) or 2*(Largest Cell Size).  Thus if one
> >> of your cells is 400MB then we would require at least 800MB to compact
> >> such a file.
> >>
> >> -ryan
> >>
> >> On Mon, Mar 8, 2010 at 4:47 PM, steven zhuang
> >> <steven.zhuang.1984@gmail.com> wrote:
> >> > Thanks, J.D.
> >> > That's already done, after I noticed there are some really huge rows.
> >> > Now the updates and writes can be done smoothly; I kept every row to
> >> > 300K cells or less.
> >> >
> >> > I am still not clear about how HBase manages regions. The thing I am
> >> > most curious about is whether it loads a whole region into memory when
> >> > there is some read/write/compaction related to that region. I'm
> >> > checking the code, but it would really help if I could get an answer
> >> > from you guys.
> >> >
> >> >
> >> > On Tue, Mar 9, 2010 at 2:10 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> >
> >> >> You should consider modeling your rows so that they are smaller than
> >> >> 1.5GB; the sweet spot for HBase is more like a few KBs per row. Else
> >> >> you end up with only 1 row per region, which is totally inefficient
> >> >> for obvious reasons once you understand how HBase manages them.
> >> >>
> >> >> The length is the size of the file in bytes.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Sat, Mar 6, 2010 at 9:25 PM, steven zhuang
> >> >> <steven.zhuang.1984@gmail.com> wrote:
> >> >> > thanks, J.D.
> >> >> >
> >> >> >          I think I know why the regionserver takes so much memory
> >> >> > now: there are some really big rows in my table, 1.2-1.5 GB in
> >> >> > size. It seems that the regionserver sometimes will try to load the
> >> >> > whole region into memory. I don't know when this happens; maybe
> >> >> > when it does a major compaction, or reassigns the region to another
> >> >> > regionserver, or when it's asked to open/online a region?
> >> >> >
> >> >> > Your question is answered inline.
> >> >> >
> >> >> > On Sat, Mar 6, 2010 at 2:15 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> >> >
> >> >> >> On Thu, Mar 4, 2010 at 7:19 PM, steven zhuang
> >> >> >> <steven.zhuang.1984@gmail.com> wrote:
> >> >> >> > Thanks, J.D.
> >> >> >> >
> >> >> >> >              I am still not sure about the second question; from
> >> >> >> > the log I can see lines like:
> >> >> >> > *org.apache.hadoop.hbase.regionserver.Store: loaded
> >> >> >> > /user/ccenterq/hbase/XXX/1702600912/queries/1289015788537930719,
> >> >> >> > isReference=false, sequence id=1389720128, length=**175533391**,
> >> >> >> > majorCompaction=true (this is the region data, not the index,
> >> >> >> > right?)*
> >> >> >> >              I do have some regions that are really big, with
> >> >> >> > millions of columns in one column family, but isn't this length
> >> >> >> > a little too big?
> >> >> >>
> >> >> >> The index and the metadata of the files of that Store in that
> >> >> >> region were loaded here.
> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >        About the third one, I am actually not very clear on how
> >> >> >> > memory is used in HBase. If it's only the few KBs from holding
> >> >> >> > region info, it won't be released, right?
> >> >> >>
> >> >> >> I don't understand your question. Try an example?
> >> >> >
> >> >> >
> >> >> > Sorry for not being clear; actually I am asking which part of the
> >> >> > region has a length of "175533391" in the following line. I think
> >> >> > the index/meta-data info for a region won't take so much memory.
> >> >> >
> >> >> >  > *org.apache.hadoop.hbase.regionserver.Store: loaded
> >> >> >> /user/ccenterq/hbase/XXX/1702600912/queries/1289015788537930719,
> >> >> >> isReference=false, sequence id=1389720128, length=**175533391**,
> >> >> >> majorCompaction=true (this is the region data, not the index,
> >> >> >> right?)*
> >> >>
> >> >
> >>
> >
>
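For reference, the "heap sorted merge" Ryan mentions in the quoted thread can be sketched as a k-way merge over sorted inputs. This is an illustrative stand-in using sorted lists of strings, not actual HFile scanner code; the point is that only the current head of each input is held in the heap, never the whole data set:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge: compacting K sorted inputs into one output
// with a min-heap of per-input cursors, analogous to merging K store
// files during an HBase compaction.
public class KWayMerge {
    static List<String> merge(List<List<String>> inputs) {
        // Each cursor is {inputIndex, positionInInput}; the heap orders
        // cursors by the value they currently point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] c) -> inputs.get(c[0]).get(c[1])));
        for (int i = 0; i < inputs.size(); i++)
            if (!inputs.get(i).isEmpty()) heap.add(new int[]{i, 0});
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] c = heap.poll();
            out.add(inputs.get(c[0]).get(c[1]));
            // Advance this input's cursor and re-insert if not exhausted.
            if (c[1] + 1 < inputs.get(c[0]).size())
                heap.add(new int[]{c[0], c[1] + 1});
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(
            List.of("a", "d"), List.of("b", "c"), List.of("e"))));
        // prints [a, b, c, d, e]
    }
}
```

Since the heap only ever holds one element per input, the working memory scales with K (the number of files being compacted), matching Ryan's correction earlier in the thread.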
