hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject OOME hell
Date Mon, 01 Dec 2008 19:15:34 GMT
I now have to restart my cluster constantly, even when running region servers with 3GB of
heap. The production cluster is running Hadoop 0.18.1 and HBase 0.18.1.

I will see mapred tasks fail with (copied by hand, please forgive):

java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at java.io.DataInputStream.readFully(DataInputStream.java:175)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)

This problem is really killing us. When the OOMEs happen, the cluster does not recover without
manual intervention. The regionservers sometimes go down after this, or sometimes do not and
stay up in sick condition for a while. Regions go offline and remain unavailable, causing
indefinite stalls all over the place.

Even so, my workload is modest: continuous write operations, maybe up to 100/sec, of objects
typically < 4K in size, though they can be as large as 20MB. Writes go to both a 'urls' table
and a 'content' table. The 'content' table gets the raw content and uses RECORD compression. The
'urls' table gets metadata only. Concurrent with this are two mapred tasks, one running on the
'urls' table, one on the 'content' table. Each mapred task runs for a few minutes at a time,
with the interval between executions currently at 5 minutes.
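
For scale, the figures above can be put into a quick back-of-envelope calculation. This is
only a sketch: the class and method names are hypothetical, 4K is used as an upper bound for
the typical object size, and the heap division is plain arithmetic that says nothing about how
HBase or Hadoop actually allocate buffers.

```java
public class WorkloadEstimate {
    // Figures quoted in the message. How often the large objects occur is not stated.
    static final int WRITES_PER_SEC = 100;                    // "maybe up to 100/sec"
    static final long TYPICAL_BYTES = 4L * 1024;              // "typically < 4K"
    static final long LARGEST_BYTES = 20L * 1024 * 1024;      // "as large as 20MB"
    static final long HEAP_BYTES = 3L * 1024 * 1024 * 1024;   // "3GB of heap"

    // Upper bound on the steady write rate if every object were 4K.
    static long typicalBytesPerSec() {
        return WRITES_PER_SEC * TYPICAL_BYTES;
    }

    // How many whole 20MB objects fit in the heap, ignoring all other usage.
    static long largestObjectsPerHeap() {
        return HEAP_BYTES / LARGEST_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("typical write rate (bytes/sec): " + typicalBytesPerSec());
        System.out.println("20MB objects per 3GB heap: " + largestObjectsPerHeap());
    }
}
```

By this arithmetic the small writes come to roughly 400KB/sec, so any serious heap pressure
would more plausibly come from buffered copies of the large objects than from the steady
stream of small ones. That is an assumption, not something measured here.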

Given jgray's import problems as well, I wonder if there is some issue with writes in general,
or at least in my case, some interaction between the write side of things and the read side
(caching, etc.). One thing I notice every so often is that if I stop the write load on the
cluster, then a few moments later a number of compactions and sometimes also splits start running,
as if they were being deferred.

For a while I was doing funky things with store files but I have since reinitialized and am
running with defaults for everything but blockcache (I use blocks of 8192). 

Any thoughts as to what I can do to help the situation?

   - Andy

