hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk
Date Wed, 08 May 2013 18:23:11 GMT
>The length is from the read call. This looks pretty suspicious; if the stream
>is not at EOF, why would it return fewer bytes?

In fact, this happened for an entry in the middle of the HLog. Write some
data using LoadTestTool, then try to use the HLog tool to read the HLog file;
you will hit this issue most of the time.
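As a side note on the "why would read() return fewer bytes" question: the general `InputStream.read(byte[], int, int)` contract allows a short read even when the stream is nowhere near EOF, so a single read is not enough to fill a length buffer. Here is a minimal self-contained sketch (plain Java, not HBase code; `ChunkyStream` is a made-up stream for illustration) showing a short read and the read-fully loop that copes with it:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class PartialReadDemo {
    // A stream that hands out at most 2 bytes per read() call, mimicking how
    // a buffered/chunked DFS stream can return fewer bytes than requested
    // without being at EOF.
    static class ChunkyStream extends InputStream {
        private final InputStream in;
        ChunkyStream(byte[] data) { this.in = new ByteArrayInputStream(data); }
        @Override public int read() throws IOException { return in.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, 2));
        }
    }

    // Loop until the buffer is full, or fail with EOFException only when the
    // stream genuinely ends early.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int pos = 0;
        while (pos < buf.length) {
            int n = in.read(buf, pos, buf.length - pos);
            if (n < 0) throw new EOFException("EOF after " + pos + " bytes");
            pos += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};
        byte[] intBytes = new byte[4];

        // A single read() may legally return fewer than 4 bytes here.
        int length = new ChunkyStream(data).read(intBytes, 0, intBytes.length);
        System.out.println("single read returned " + length + " bytes");

        // The loop recovers all 4 bytes from the same kind of stream.
        readFully(new ChunkyStream(data), intBytes);
        System.out.println("readFully filled " + intBytes.length + " bytes");
    }
}
```

So a check of the form `if (length != intBytes.length) throw ...` after a single read can fire even on a healthy, non-truncated stream, depending on what the underlying read returns.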

Regards
Ram



On Wed, May 8, 2013 at 11:48 PM, Sergey Shelukhin <sergey@hortonworks.com> wrote:

> if (length != intBytes.length) throw new IOException("Failed read of int
> length " + length);
> The length is from the read call. This looks pretty suspicious; if the stream
> is not at EOF, why would it return fewer bytes? I will try to repro today.
>
> On Wed, May 8, 2013 at 5:46 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > On further debugging, we found that this issue happens with ProtoBufWriter
> > and not with SequenceFileWriter (at least we could not reproduce it with
> > the latter across different runs).
> >
> > We can see that the HLog has more data in it, but this error happens while
> > reading one of the entries in the HLog. So we are pretty sure that it is
> > not EOF.
> > Checked the DFS logs but could not find any exceptions there either.
> >
> > We will try to figure out more on this tomorrow.
> >
> > Regards
> > Ram
> >
> >
> > On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > Ok, so I tried this out with hadoop 2.0.4 and also with Sergey's patch.
> > > The issue is reproducible in all versions of hadoop, but not always.
> > > I am able to get errors like this:
> > >
> > > 2013-05-07 17:11:08,827 WARN [SplitLogWorker-ram.sh.intel.com,60020,1367961009182] org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of .logs/ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com%2C60020%2C1367960957620.1367960993389 failed, returning error
> > > java.io.IOException: Error while reading 1 WAL KVs; started reading at 589822 and read up to 589824
> > >     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
> > >     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > >     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > >     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> > >     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> > >     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> > >     at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> > >     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> > >     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> > >     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> > >     at java.lang.Thread.run(Thread.java:662)
> > > Caused by: java.io.IOException: Failed read of int length 2
> > >     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> > >     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > >     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
> > >     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > >     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
> > >     ... 10 more
> > >
> > >
> > > and sometimes:
> > > java.io.IOException: Failed read of int length 1
> > >     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> > >     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > >     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> > >     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > >     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> > >     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > >     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
> > >     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
> > >     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
> > >     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >
> > >
> > > I will work on this today and find out the root cause of it.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > >> I too think it may be EOF, but I did not debug it fully. Today let me
> > >> check and try applying your patch.
> > >>
> > >> Regards
> > >> Ram
> > >>
> > >>
> > >> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
> > >>
> > >>> Please take a look at the patch in HBASE-8498
> > >>> <https://issues.apache.org/jira/browse/HBASE-8498>;
> > >>> this should make it possible to get more details.
> > >>>
> > >>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
> > >>>
> > >>> > 1) Is there a cause stack?
> > >>> > 2) Can you ascertain whether the WAL is truncated at that place? The
> > >>> > exception type might have changed, or the exception might have expanded,
> > >>> > between Hadoop 1 and 2; WAL replay should ignore EOF, so if this is an
> > >>> > EOF problem then it would be easy to correct; if it's something more
> > >>> > serious then it's bad.
> > >>> > I will add some logging/catching around to add the cause (if missing)
> > >>> > and useful logs.
> > >>> >
> > >>> >
> > >>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> > >>> > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >>> >
> > >>> >> Hi All
> > >>> >>
> > >>> >> I am getting the following error when I run trunk with hadoop-2.0.3:
> > >>> >> java.io.IOException: Failed read of int length 2
> > >>> >>     at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> > >>> >>     at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > >>> >>     at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> > >>> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> > >>> >>
> > >>> >> I am able to reproduce this with the cluster, but not with the test
> > >>> >> cases, even when I run with 2.0.3.
> > >>> >>
> > >>> >> Regards
> > >>> >> Ram
> > >>> >>
> > >>> >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
