hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk
Date Wed, 08 May 2013 12:46:58 GMT
On further debugging, we found that this issue happens with ProtobufLogWriter
and not with SequenceFileLogWriter (at least, we could not reproduce it with
the latter over several runs).

We can see that the HLog has more data in it, but this error happens while
reading one of the entries in the HLog, so we are fairly sure that it is not
an EOF.
We also checked the DFS logs but could not find any exceptions there either.
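
For readers following the traces quoted below, here is a minimal Java sketch of how a reader can tell a clean end of stream from a short read of a 4-byte length prefix. It is not the HBase code: the class and method names (ShortReadSketch, readLengthBytes, classify) are hypothetical, and it assumes that the number in "Failed read of int length 2" is the count of bytes obtained before the stream ended, which would be consistent with "started reading at 589822 and read up to 589824" in the log.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration only; not taken from the HBase source tree.
final class ShortReadSketch {
    static final int INT_BYTES = 4;

    /** Reads up to 4 bytes into buf and returns how many were obtained (0..4). */
    static int readLengthBytes(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < INT_BYTES) {
            int n = in.read(buf, total, INT_BYTES - total);
            if (n < 0) {
                break; // stream ended before the full int was available
            }
            total += n;
        }
        return total;
    }

    /** Classifies the next length prefix as clean EOF, a partial read, or a value. */
    static String classify(InputStream in) throws IOException {
        byte[] buf = new byte[INT_BYTES];
        int got = readLengthBytes(in, buf);
        if (got == 0) {
            return "clean EOF"; // nothing left: a replay could stop here quietly
        }
        if (got < INT_BYTES) {
            return "partial read of int length " + got; // truncated or misaligned
        }
        return "length=" + ByteBuffer.wrap(buf).getInt(); // big-endian int
    }

    public static void main(String[] args) throws IOException {
        System.out.println(classify(new ByteArrayInputStream(new byte[0])));
        System.out.println(classify(new ByteArrayInputStream(new byte[] {0, 0})));
        System.out.println(classify(new ByteArrayInputStream(new byte[] {0, 0, 0, 7})));
    }
}
```

Under that assumption, a partial read in the middle of a file that still has data after the failing offset points away from a plain EOF and toward the reader's view of the stream ending early or being misaligned.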

We will try to figure out more on this tomorrow.

Regards
Ram


On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> OK, so I tried this out with Hadoop 2.0.4 and also with Sergey's patch.
> The issue is reproducible with all versions of Hadoop, but not always.
> I am able to get errors like this:
>
> 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com,60020,1367961009182]
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of .logs/
> ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com%2C60020%2C1367960957620.1367960993389
> failed, returning error
> java.io.IOException: Error  while reading 1 WAL KVs; started reading at 589822 and read up to 589824
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Failed read of int length 2
>     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
>     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
>     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
>     ... 10 more
>
>
> and sometimes
> java.io.IOException: Failed read of int length 1
>     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
>     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
>     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>
> I will work on this today and find out the root cause of it.
>
> Regards
> Ram
>
>
> On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
>> I too think it may be EOF, but I did not debug it fully.  Today let me
>> check and try applying your patch.
>>
>> Regards
>> Ram
>>
>>
>> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
>>
>>> Please take a look at the patch in
>>> HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
>>> this should make it possible to get more details.
>>>
>>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <
>>> sergey@hortonworks.com> wrote:
>>>
>>> > 1) Is there a cause stack?
>>> > 2) Can you ascertain whether the WAL is truncated at that place? The
>>> > exception type might have changed, or the exception might have expanded,
>>> > between Hadoop 1 and 2. WAL replay should ignore EOF, so if this is an EOF
>>> > problem it would be easy to correct; if it's something more serious, then
>>> > it's bad.
>>> > I will add some logging/catching around to add the cause (if missing) and
>>> > useful logs.
>>> >
>>> >
>>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
>>> > ramkrishna.s.vasudevan@gmail.com> wrote:
>>> >
>>> >> Hi All
>>> >>
>>> >> I am getting the following error when I run trunk with hadoop-2.0.3:
>>> >> java.io.IOException: Failed read of int length 2
>>> >>     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
>>> >>     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>>> >>     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
>>> >>     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>>> >>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
>>> >>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>>> >>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
>>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
>>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
>>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
>>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
>>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
>>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>>> >>
>>> >> I am able to reproduce this on the cluster but not with the test cases,
>>> >> even when I run them with 2.0.3.
>>> >>
>>> >> Regards
>>> >> Ram
>>> >>
>>> >
>>> >
>>>
>>
>>
>
