hbase-dev mailing list archives

From Andrew Purtell <apurt...@yahoo.com>
Subject Re: OOME hell
Date Mon, 01 Dec 2008 19:44:17 GMT
Thanks Stack. I'll walk through your list of questions and see
if one of them leads down the right path!

One thing I can answer right away is that no storefile in
particular seems to be the culprit. It seems to me that heap
pressure builds over time to the point where the regionserver
falls over, with the OOME surfacing in a place where it does
not take the process down cleanly. Indeed, I do think that
backporting the OOME handling improvements to the 0.18 branch
would be helpful.

Something I will do right away is disable blockcache. Its
use, as far as I can see from looking at our code, is
gratuitous. (A sketch of how I plan to do that is below.)
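
I am writing this from memory, so treat the API below as
approximate: I believe the flag lives on HColumnDescriptor,
but setBlockCacheEnabled() may be a constructor argument
rather than a setter in our 0.18 tree.

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;

    // Sketch only: disable the block cache on the 'content' family.
    // The setter name below is from memory and may differ in 0.18.
    public class DisableBlockCache {
      public static void main(String[] args) {
        HTableDescriptor desc = new HTableDescriptor("content");
        HColumnDescriptor family = new HColumnDescriptor("content:");
        family.setBlockCacheEnabled(false);  // assumed API
        desc.addFamily(family);
      }
    }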

Also, OK: based on what you say, what I am experiencing is
different from what's happening on jgray's cluster. There is
plenty of available VM and minimal swapping.

    - Andy


> From: stack <stack@duboce.net>
> Subject: Re: OOME hell
> To: hbase-dev@hadoop.apache.org
> Date: Monday, December 1, 2008, 11:37 AM
> Andrew Purtell wrote:
> > I am constantly needing to restart my cluster now,
> > even running region servers with 3GB of heap. The production
> > cluster is running Hadoop 0.18.1 and HBase 0.18.1.
> > 
> > I will see mapred tasks fail with (copied by hand,
> > please forgive):
> > 
> > java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> >   at java.io.DataInputStream.readFully(DataInputStream.java:175)
> >   at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> >   at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> >   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
> >   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
> >   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
> >   at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
> >   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
> 
> 
> Can you see which store file this is happening against?
> Does it always OOME against the same storefile? Does it
> always OOME in the same place? Do you think these cells are
> wholesome? Not extremely large? (The thought is that there
> might be a corrupted record that manifests itself as a very
> large record, and we OOME trying to read it into memory to
> shuttle it across to the client.) I can make a mapfile
> checker for you if you'd like -- just say.
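
[If it helps, below is the rough shape of the checker I have in
mind: walk every entry, re-serialize each value to measure it,
and report where a read blows up. This is an untested sketch
against the 0.18-era MapFile API as I remember it; the 32MB
threshold is arbitrary.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class MapFileChecker {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        MapFile.Reader reader = new MapFile.Reader(fs, args[0], conf);
        WritableComparable key = (WritableComparable)
            ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable)
            ReflectionUtils.newInstance(reader.getValueClass(), conf);
        DataOutputBuffer sizer = new DataOutputBuffer();
        long entries = 0;
        try {
          while (reader.next(key, value)) {
            entries++;
            sizer.reset();
            value.write(sizer);  // re-serialize to get the cell's size
            if (sizer.getLength() > 32 * 1024 * 1024) {  // arbitrary threshold
              System.out.println("Large record at key " + key + ": "
                  + sizer.getLength() + " bytes");
            }
          }
        } catch (Throwable t) {
          // A corrupt length prefix would blow up here; report where we got to.
          System.out.println("Read failed after " + entries
              + " entries, last good key " + key + ": " + t);
        }
        reader.close();
        System.out.println("Checked " + entries + " entries");
      }
    }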
> 
> > ...
> > 
> > This problem is really killing us. When the OOMEs
> > happen, the cluster does not recover without manual
> > intervention. The regionservers sometimes go down after
> > this, and sometimes stay up in a sick condition for
> > a while. Regions go offline and remain unavailable, causing
> > indefinite stalls all over the place.
> 
> Is this because the OOMEs are bubbling up in a place that
> doesn't run the release of reservoir memory and trigger
> proper node shutdown? Should we backport
> hbase-1020/hbase-1006?
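
[For anyone reading along: my understanding of the pattern
those issues introduce, as a sketch -- the general idea, not
the actual hbase-1020/hbase-1006 code.]

    // Hold a reserve allocation at startup. When an OOME is caught,
    // release it so enough heap remains to log the error and run an
    // orderly regionserver shutdown instead of wedging.
    public class OomeReservoir {
      private byte[] reserve = new byte[4 * 1024 * 1024];  // arbitrary margin

      public synchronized void checkOOME(Throwable t) {
        if (t instanceof OutOfMemoryError && reserve != null) {
          reserve = null;  // free the reservoir
          // log the OOME and initiate shutdown here
        }
      }
    }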
> 
> > Even so, my workload is modest: continuous write
> > operations, maybe up to 100/sec, of objects typically
> > < 4K in size but as large as 20MB. Writes happen to
> > both a 'urls' table and a 'content' table. The
> > 'content' table gets the raw content and uses RECORD
> > compression.
> 
> I have no experience using compression in HStoreFiles.
> Running compression buffers may introduce new uncertainty
> as regards memory management (just guessing -- I have not
> looked). Have you tried with compression disabled? Or is
> it that you cannot disable compression once it is enabled?
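
[For context: RECORD compression is the plain Hadoop
SequenceFile notion -- each value is compressed individually as
it is written. A minimal illustration against the stock 0.18
API, not how HBase wires it up internally.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class CompressionDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // CompressionType.RECORD compresses each value on its own;
        // CompressionType.NONE would disable compression entirely.
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path("/tmp/demo"), Text.class, BytesWritable.class,
            SequenceFile.CompressionType.RECORD);
        writer.append(new Text("row1"), new BytesWritable(new byte[] { 1, 2, 3 }));
        writer.close();
      }
    }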
> 
> > The 'urls' table gets metadata only. Concurrent
> > with this are two mapred tasks, one running on the
> > 'urls' table, one on the 'content' table. The
> > mapred tasks each run for a few minutes at a time,
> > with an interval between executions currently at 5
> > minutes.
> > Along with jgray's import problems,
> 
> These might be something other than OOME issues; I spent
> some time studying jgray's cluster last Wednesday, and the
> whole cluster had gone into swap -- nothing was working, GCs
> couldn't complete because of the swapping, and the OOMEs were
> a symptom. Are you seeing any instances of HBASE-616 in your
> logs, Andrew?
> 
> > I wonder if there is some issue with writes in
> > general, or at least in my case, some interaction between
> > the write side of things and the read side (caching, etc.).
> > One thing I notice every so often is that if I stop the
> > write load on the cluster, then a few moments later a number
> > of compactions and sometimes also splits start running, as if
> > they had been deferred.
>
> There could be an issue here. I can look at log files if
> you put them in a place I can pull from.
>
> > For a while I was doing funky things with store files,
> > but I have since reinitialized and am running with defaults
> > for everything but blockcache (I use blocks of 8192).
> 
> You need blockcache? Blockcache uses soft references.
> It will fill until there is memory pressure, and only then
> will it dump items. It might help if you disable it.
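
[A toy version of the behavior Stack describes, to make it
concrete -- not the HBase implementation. The JVM clears
SoftReferences only when the heap runs low, so a cache like
this keeps growing until memory pressure hits, which is exactly
when a loaded regionserver can least afford it.]

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    // Minimal soft-reference cache: entries stay reachable until the GC
    // decides it needs the space, then get() starts returning null.
    public class SoftBlockCache {
      private final Map<Long, SoftReference<byte[]>> cache =
          new HashMap<Long, SoftReference<byte[]>>();

      public void put(long blockId, byte[] block) {
        cache.put(blockId, new SoftReference<byte[]>(block));
      }

      public byte[] get(long blockId) {
        SoftReference<byte[]> ref = cache.get(blockId);
        return ref == null ? null : ref.get();  // null once cleared by GC
      }
    }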
> 
> What version of the JVM are you using?
> 
> St.Ack

