hbase-dev mailing list archives

From Sergey Shelukhin <ser...@hortonworks.com>
Subject Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk
Date Wed, 08 May 2013 18:18:07 GMT
if (length != intBytes.length) throw new IOException("Failed read of int length " + length);

The length comes from the read call. This looks pretty suspicious: if the stream
is not at EOF, why would it return fewer bytes? I will try to repro today.
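
For reference, java.io.InputStream.read(byte[]) is only documented to read "some number of bytes", up to the buffer length, so a short read without EOF is legal for a stream in general. Whether that is what happens here is not confirmed. The sketch below is not the actual KeyValue.iscreate code: ChunkedStream, readFully and toInt are hypothetical names made up for illustration. It shows a single read call coming back short, and a loop that keeps reading until the buffer is full or EOF is really hit.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadSketch {

    /** Hypothetical stream that returns at most `chunk` bytes per read call. */
    static final class ChunkedStream extends InputStream {
        private final InputStream in;
        private final int chunk;

        ChunkedStream(byte[] data, int chunk) {
            this.in = new ByteArrayInputStream(data);
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately hand back fewer bytes than requested.
            return in.read(b, off, Math.min(len, chunk));
        }
    }

    /** Loops until buf is full; throws EOFException only on a real EOF. */
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("EOF after " + off + " of " + buf.length + " bytes");
            }
            off += n;
        }
    }

    /** Decodes a 4-byte big-endian int. */
    static int toInt(byte[] b) {
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16) | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 1, 44}; // big-endian encoding of 300
        byte[] intBytes = new byte[4];

        // A single read call may come back short even though more data is available.
        int length = new ChunkedStream(data, 2).read(intBytes);
        System.out.println("single read returned " + length); // prints "single read returned 2"

        // Looping until the buffer is full recovers the whole int.
        readFully(new ChunkedStream(data, 2), intBytes);
        System.out.println("decoded int: " + toInt(intBytes)); // prints "decoded int: 300"
    }
}
```

If the underlying stream behaves like this, a check of the form length != intBytes.length would fire in the middle of the file, which would match "Failed read of int length 2" and "length 1" without any truncation.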

On Wed, May 8, 2013 at 5:46 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> On further debugging, we found that this issue happens with ProtoBufWriter and
> not with sequenceFileWriter (at least we could not reproduce it in
> different runs).
>
> We can see that the HLog has more data in it, but this error happens while
> reading one of the lines in the HLog, so we are pretty sure that it is not
> EOF.
> We checked the DFS logs but could not find any exceptions there either.
>
> We will try to figure out more on this tomorrow.
>
> Regards
> Ram
>
>
> On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > OK, so I tried this out with Hadoop 2.0.4 and also with Sergey's patch.
> > The issue is reproducible in all versions of Hadoop, but not always.
> > I am able to get errors like this:
> >
> > 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com,60020,1367961009182] org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of .logs/ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com%2C60020%2C1367960957620.1367960993389 failed, returning error
> > java.io.IOException: Error  while reading 1 WAL KVs; started reading at 589822 and read up to 589824
> >     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
> >     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> >     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> >     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> >     at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> >     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> >     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> >     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> >     at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.IOException: Failed read of int length 2
> >     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> >     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
> >     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
> >     ... 10 more
> >
> >
> > and sometimes
> > java.io.IOException: Failed read of int length 1
> >     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> >     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> >     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> >     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
> >     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
> >     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
> >     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >
> >
> > I will work on this today and find out the root cause of it.
> >
> > Regards
> > Ram
> >
> >
> > On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> >> I too think it may be EOF, but I did not debug it fully. Let me check
> >> today and try applying your patch.
> >>
> >> Regards
> >> Ram
> >>
> >>
> >> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
> >>
> >>> Please take a look at the patch in
> >>> HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
> >>> this should make it possible to get more details.
> >>>
> >>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
> >>>
> >>> > 1) Is there a cause stack?
> >>> > 2) Can you ascertain whether the WAL is truncated at that place? The
> >>> > exception type might have changed, or the exception might have expanded,
> >>> > between Hadoop 1 and 2. WAL replay should ignore EOF, so if this is an
> >>> > EOF problem then it would be easy to correct; if it's something more
> >>> > serious then it's bad.
> >>> > I will add some logging/catching around it to add the cause (if missing)
> >>> > and useful logs.
> >>> >
> >>> >
> >>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> >>> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >>> >
> >>> >> Hi All
> >>> >>
> >>> >> I am getting the following error when I run trunk with hadoop-2.0.3.
> >>> >> java.io.IOException: Failed read of int length 2
> >>> >>     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> >>> >>     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >>> >>     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> >>> >>     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >>> >>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> >>> >>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >>> >>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> >>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> >>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> >>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> >>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> >>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> >>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> >>> >>
> >>> >> I am able to reproduce this on the cluster but not with the test cases,
> >>> >> even when I run with 2.0.3.
> >>> >>
> >>> >> Regards
> >>> >> Ram
> >>> >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>
