hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: regionserver loads but never unload?
Date Tue, 09 Mar 2010 02:42:05 GMT
More memory is your friend :-)  Try a heap of 4000 MB if you have not yet.

HBase is not particularly suited to individual massive values.  By
large values, I mean a single row/column that has a very large
_value_.  Since we have a write buffer, every single value that passes
through HBase must reside in memory at least once.

The rest of the system sort of assumes that a single value will fit
into ram, sometimes even several times over.
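
For a concrete illustration, here is a minimal sketch against the
0.20-era Java client (the table name is made up). The client-side write
buffer shows the same property: every buffered Put keeps its value on
the heap until the flush.

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class WriteBufferSketch {
      public static void main(String[] args) throws IOException {
        // The regionserver heap itself is set via HBASE_HEAPSIZE (in MB)
        // in hbase-env.sh, e.g. 4000 for the heap suggested above.
        HTable table = new HTable(new HBaseConfiguration(), "XXX");
        table.setAutoFlush(false);                   // buffer Puts client-side
        table.setWriteBufferSize(12L * 1024 * 1024); // assumption: 12 MB buffer
        // Every value added now sits in RAM until the buffer flushes, so a
        // single 400 MB value implies at least 400 MB of live heap here too.
      }
    }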

-ryan


On Mon, Mar 8, 2010 at 6:25 PM, steven zhuang
<steven.zhuang.1984@gmail.com> wrote:
> yeah, I think that's why my low-memory regionserver crashed that many
> times.
> thanks.
>
> On Tue, Mar 9, 2010 at 10:20 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
>> That is true.  Don't forget that the underlying store file, "HFile", is
>> block-oriented, with each block containing 1 or more keys, up to the
>> limit of 64k bytes per block.  If the key and value are > 64k, then each
>> block contains exactly 1 key/value.  Thus if your maximal key/value
>> size is 400 MB, we need one 400 MB buffer to read from the old file, and
>> one 400 MB buffer to write to the new compacted file.
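
As a sketch of where that 64k block size is set per column family
(assuming the admin API of this era; the table/family names are
invented, and the exact setter name is an assumption worth checking
against your version):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class BlockSizeSketch {
      public static void main(String[] args) throws IOException {
        HTableDescriptor desc = new HTableDescriptor("XXX");
        HColumnDescriptor family = new HColumnDescriptor("queries");
        family.setBlocksize(64 * 1024); // the 64k default, made explicit
        desc.addFamily(family);
        // A key/value larger than the block size still gets stored, but as
        // a block of its own, read and rewritten whole during compactions.
        new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
      }
    }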
>>
>>
>>
>> On Mon, Mar 8, 2010 at 6:08 PM, steven zhuang
>> <steven.zhuang.1984@gmail.com> wrote:
>> > and a store file can be really big, if it is from a region with only one
>> > really big row. :)
>> >
>> > On Tue, Mar 9, 2010 at 9:14 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Sorry, N should really be "K" - the number of store files being
>> >> compacted.  So it is not dependent on the size of your data set.
>> >>
>> >> -ryan
>> >>
>> >> On Mon, Mar 8, 2010 at 5:12 PM, steven zhuang
>> >> <steven.zhuang.1984@gmail.com> wrote:
>> >> > about the math you did, I think in my case the "N" is really big; my
>> >> > largest cell will not exceed 100 bytes.
>> >> > I am sure the regionserver crashed when it did a major compaction,
>> >> > before I had split the big row into smaller ones.
>> >> >
>> >> >
>> >> > On Tue, Mar 9, 2010 at 8:53 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> HBase does not load the "entire region" into RAM during region load.
>> >> >> What it does is load indexes from the storage files.  These indexes
>> >> >> are typically a meg or two per region.  After a region is loaded,
>> >> >> there are no follow-up loads.  During the course of answering queries,
>> >> >> as blocks from the files are needed, they are loaded into the block
>> >> >> cache.  At this point they persist until the LRU mechanism decides to
>> >> >> evict them.
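
As a hedged aside on sizing that cache: its ceiling is the
"hfile.block.cache.size" fraction of the regionserver heap, normally
set server-side in hbase-site.xml. The snippet below only illustrates
the knob:

    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BlockCacheSketch {
      public static void main(String[] args) {
        HBaseConfiguration conf = new HBaseConfiguration();
        // Fraction of the regionserver heap handed to the LRU block cache;
        // 0.2 (the default) means a 4000 MB heap caches ~800 MB of blocks.
        conf.setFloat("hfile.block.cache.size", 0.2f);
        System.out.println(conf.getFloat("hfile.block.cache.size", 0.2f));
      }
    }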
>> >> >>
>> >> >> During compactions, we do a heap-sorted merge of multiple HFiles into
>> >> >> one.  The largest memory use during this would be the greater of
>> >> >> N*(block size) (default=64k) or 2*(largest cell size).  Thus if one
>> >> >> of your cells is 400MB then we would require at least 800MB to
>> >> >> compact such a file.
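
To make that arithmetic concrete, a plain-Java sketch (the file count
and cell size are made-up inputs, not anything read from HBase):

    public class CompactionMemorySketch {
      public static void main(String[] args) {
        long blockSize = 64L * 1024;             // default HFile block size
        int storeFiles = 10;                     // "N": files being merged
        long largestCell = 400L * 1024 * 1024;   // one 400 MB cell
        // Peak is the greater of one block buffered per input file versus
        // two cell-sized buffers (read the old file, write the new one).
        long peak = Math.max(storeFiles * blockSize, 2 * largestCell);
        System.out.println(peak + " bytes");     // ~800 MB for this input
      }
    }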
>> >> >>
>> >> >> -ryan
>> >> >>
>> >> >> On Mon, Mar 8, 2010 at 4:47 PM, steven zhuang
>> >> >> <steven.zhuang.1984@gmail.com> wrote:
>> >> >> > thanks, J.D.
>> >> >> > that's already done after I noticed there are some really huge
>> >> >> > rows. now the updates and writes can be done smoothly; I keep
>> >> >> > every row to 300K cells or less.
>> >> >> >
>> >> >> > I am still not clear about how HBase manages regions; the thing I'm
>> >> >> > most curious about is whether it will load a whole region into
>> >> >> > memory when there is some read/write/compaction involving the
>> >> >> > region. I'm checking the code, but it would really help if I could
>> >> >> > get an answer from you guys.
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Mar 9, 2010 at 2:10 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >> >> >
>> >> >> >> You should consider modeling your rows so that they are smaller
>> >> >> >> than 1.5GB; the sweet spot for HBase is more like a few KBs per
>> >> >> >> row. Else you end up with only 1 row per region, which is totally
>> >> >> >> inefficient for obvious reasons once you understand how HBase
>> >> >> >> manages them.
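
One hedged way to get under that sweet spot (0.20-era client API; the
table name, family, and bucket count are invented for illustration) is
to shard one huge logical row across many physical rows:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowShardSketch {
      static final int BUCKETS = 32; // arbitrary; bounds the per-row size

      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "XXX");
        String logicalRow = "some-huge-row"; // hypothetical key
        int cellSeq = 12345;                 // e.g. a per-cell counter
        // Same logical row, spread over BUCKETS physical rows:
        Put put = new Put(Bytes.toBytes(logicalRow + "-" + (cellSeq % BUCKETS)));
        put.add(Bytes.toBytes("queries"), Bytes.toBytes(cellSeq),
            Bytes.toBytes("value"));
        table.put(put);
        // Readers then scan buckets logicalRow-0 .. logicalRow-31 to
        // reassemble the logical row.
      }
    }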
>> >> >> >>
>> >> >> >> The length is the size of the file in bytes, so about 167 MB here.
>> >> >> >>
>> >> >> >> J-D
>> >> >> >>
>> >> >> >> On Sat, Mar 6, 2010 at 9:25 PM, steven zhuang
>> >> >> >> <steven.zhuang.1984@gmail.com> wrote:
>> >> >> >> > thanks, J.D.
>> >> >> >> >
>> >> >> >> >          I think I know why the regionserver takes so much
>> >> >> >> > memory now: there are some really big rows in my table, 1.2-1.5
>> >> >> >> > GB in size. It seems that the regionserver will sometimes try to
>> >> >> >> > load the whole region into memory; I don't know when this
>> >> >> >> > happens, maybe when it does a major compaction, or reassigns the
>> >> >> >> > region to another regionserver, or when it's asked to open/online
>> >> >> >> > a region?
>> >> >> >> >
>> >> >> >> > your question is answered inline.
>> >> >> >> >
>> >> >> >> > On Sat, Mar 6, 2010 at 2:15 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >> >> >> >
>> >> >> >> >> On Thu, Mar 4, 2010 at 7:19 PM, steven zhuang
>> >> >> >> >> <steven.zhuang.1984@gmail.com> wrote:
>> >> >> >> >> > thanks, J.D.
>> >> >> >> >> >
>> >> >> >> >> >              I am still not sure about the second question;
>> >> >> >> >> > from the log I can see lines like:
>> >> >> >> >> > *org.apache.hadoop.hbase.regionserver.Store: loaded
>> >> >> >> >> > /user/ccenterq/hbase/XXX/1702600912/queries/1289015788537930719,
>> >> >> >> >> > isReference=false, sequence id=1389720128, length=**175533391**,
>> >> >> >> >> > majorCompaction=true (this is the region data, not the index,
>> >> >> >> >> > right?)*
>> >> >> >> >> >              I do have some really big regions, with millions
>> >> >> >> >> > of columns in one column family, but isn't this length a
>> >> >> >> >> > little too big?
>> >> >> >> >>
>> >> >> >> >> The index and the metadata of the files of that Store in that
>> >> >> >> >> region were loaded here.
>> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >        About the third one, I am actually not very clear on
>> >> >> >> >> > how memory is used in HBase; if it's only the few KBs holding
>> >> >> >> >> > region info, it won't be released, right?
>> >> >> >> >>
>> >> >> >> >> I don't understand your question. Try an example?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > sorry for not being clear; actually I am asking which part of
>> >> >> >> > the region has a length of "175533391" in the following line. I
>> >> >> >> > think the index/meta-data info for a region won't take so much
>> >> >> >> > memory.
>> >> >> >> >
>> >> >> >> >  > *org.apache.hadoop.hbase.regionserver.Store: loaded
>> >> >> >> >> /user/ccenterq/hbase/XXX/1702600912/queries/1289015788537930719,
>> >> >> >> >> isReference=false, sequence id=1389720128, length=**175533391**,
>> >> >> >> >> majorCompaction=true (this is the region data, not the index,
>> >> >> >> >> right?)*
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>
