hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Regionservers crash with an OutOfMemoryException after a data-intensive map reduce job..
Date Thu, 13 May 2010 18:05:00 GMT
Hello Vidhyashankar:

How many regionservers?  What version of HBase and Hadoop?  How much
RAM on these machines in total?  Can you give HBase more RAM?
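For reference, regionserver heap is set in conf/hbase-env.sh; the value
below is only a placeholder -- and you'd need a 64-bit JVM to go much
past the ~3GB ceiling of 32-bit Java:

```sh
# conf/hbase-env.sh -- maximum heap for the HBase daemons, in MB.
# 4000 is a placeholder; use whatever the boxes can spare.  Note a
# 32-bit JVM tops out around 3 GB, so going higher needs 64-bit Java.
export HBASE_HEAPSIZE=4000
```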

Also check that you don't have an exceptional cell in your input --
one that is very much larger than the 14KB you note below.
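If it helps, here is a rough sketch of the kind of check I mean: scan
your input for any cell value far above the expected ~14KB.  The
tab-separated rowkey/value layout below is made up -- adapt it to your
loader's actual record format:

```python
# Hypothetical input layout: one record per line, "rowkey<TAB>value".
# Flag any record whose value is far above the expected ~14KB cell.
THRESHOLD = 10 * 14 * 1024  # 10x the expected cell size, in bytes

def oversized_cells(lines, threshold=THRESHOLD):
    """Yield (rowkey, value_size) for every cell value above threshold."""
    for line in lines:
        rowkey, _, value = line.rstrip("\n").partition("\t")
        if len(value.encode("utf-8")) > threshold:
            yield rowkey, len(value)

sample = ["row1\t" + "x" * 100, "row2\t" + "y" * (200 * 1024)]
print(list(oversized_cells(sample)))  # → [('row2', 204800)]
```

Run something like this over the same file you feed the MR job; a
single runaway cell can be enough to blow a 3GB heap.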

12 column families is at the extreme end of what we've played with,
just FYI.  You might try a schema with fewer: e.g. one CF for the big
cell value and a second CF for all the others.
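For example, in the hbase shell a two-family version might look
something like this (the family names here are just illustrative):

```
create 'DocData',
  {NAME => 'CONTENT'},   # the single big ~14KB cell
  {NAME => 'small'}      # all the 8-byte values together
```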

There may also be corruption in one of the storefiles, given that the
OOME below seems to happen when we try to open a region (though the
opening itself may have no relation to why the OOME occurred).
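Not conclusive, but the "Dump of metrics" line at the bottom of your
log allows a rough back-of-the-envelope.  The 100KB-per-open-storefile
overhead below is purely an assumed figure, not a measured one:

```python
# Figures taken from the "Dump of metrics" line in the RS log.
regions, stores, storefiles = 942, 9411, 19887
max_heap_mb = 2999

print(round(stores / regions, 1))     # stores (column families) per region
print(round(storefiles / stores, 1))  # store files per store
# Assuming ~100 KB of fixed per-open-file overhead (buffers, trailer, index):
print(round(storefiles * 100 / 1024), "MB vs", max_heap_mb, "MB heap")
```

With ~20k store files open on one regionserver, even a modest fixed
cost per open file eats a large fraction of a 3GB heap before any
cache or memstore gets involved.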

St.Ack


On Thu, May 13, 2010 at 10:35 AM, Vidhyashankar Venkataraman
<vidhyash@yahoo-inc.com> wrote:
> This is similar to a mail sent by another user to the group a couple of
> months back.. I am quite new to HBase and I’ve been trying to conduct a
> basic experiment with it..
>
> I am trying to load 200 million records, each around 15 KB: one
> column value is around 14KB and the remaining column values are 8 bytes
> each.. The columns are grouped as 10 qualifiers X 12 families (120 columns
> in all): hope I got my jargon right.. Note that only one value is quite
> large for each doc (when compared to the other values)...
> The data is uncompressed.. And each value is uniformly randomly selected..
> I used a map-reduce job to load a data file on HDFS into the database.. Soon
> after the job finished, the region servers crashed with an OutOfMemoryException..
> Below is part of the trace from the logs on one of the RS’s:
>
> I have attached the conf along with the email: Can you guys point out any
> anomaly in my settings? I have set a heap size of 3 gigs.. Anything
> significantly more and 32-bit Java doesn’t run..
>
>
> 2010-05-12 19:22:45,068 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=8.43782MB (8847696), Free=1791.2247MB (1878235312), Max=1799.6626MB (1887083008), Counts: Blocks=1, Access=16947, Hit=52, Miss=16895, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.3068389603868127%, Miss Ratio=99.69316124916077%, Evicted/Run=NaN
> 2010-05-12 19:22:45,069 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col5/7617863559659933969, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:45,075 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col6/1328113038200437659, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
> 2010-05-12 19:22:45,078 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col6/6484804359703635950, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:45,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col7/1673569837212457160, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
> 2010-05-12 19:22:45,085 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col7/4737399093829085995, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:47,238 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col8/8446828932792437464, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
> 2010-05-12 19:22:47,241 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col8/974386128174268353, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:48,804 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col9/2096232603557969237, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:48,807 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col9/7088206045660348092, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
> 2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegion: region DocData,4824176,1273625075099/1651418343 available; sequence id is 2960732841
> 2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: DocData,40682172,1273607630618
> 2010-05-12 19:22:48,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region DocData,40682172,1273607630618, encoded=271889952
> 2010-05-12 19:22:50,924 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/CONTENT/4859380626868896307, isReference=false, sequence id=2959849236, length=337563, majorCompaction=false
> 2010-05-12 19:22:53,037 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/CONTENT/952776139755887312, isReference=false, sequence id=2082553088, length=110460013, majorCompaction=false
> 2010-05-12 19:22:57,404 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/col1/66449684560689857, isReference=false, sequence id=2959849236, length=12648, majorCompaction=false
> 2010-05-12 19:23:16,165 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocData,40682172,1273607630618
> java.lang.OutOfMemoryError: Java heap space
>         at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
>         at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1369)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1626)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1372)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:848)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:793)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:273)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:129)
>         at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:410)
>         at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
>         at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1549)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:312)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1564)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1531)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1451)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-05-12 19:23:18,246 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> java.lang.OutOfMemoryError: Java heap space
>         at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
>         at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1369)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1626)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1372)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:848)
>         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:793)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:273)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:129)
>         at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:410)
>         at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
>         at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1549)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:312)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1564)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1531)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1451)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-05-12 19:23:18,246 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=942, stores=9411, storefiles=19887, storefileIndexSize=182, memstoreSize=0, compactionQueueSize=0, usedHeap=2999, maxHeap=2999, blockCacheSize=8847696, blockCacheFree=1878235312, blockCacheCount=1, blockCacheHitRatio=0, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> 2010-05-12 19:23:18,247 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> 2010-05-12 19:23:18,254 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> 2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
> 2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
> 2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
> 2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting
> And so on (The region server has a total of 100 handlers)..
>
>
>
