hbase-dev mailing list archives

From Jimmy Xiang <jxi...@cloudera.com>
Subject Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk
Date Wed, 08 May 2013 19:46:30 GMT
I got the same error today and filed HBASE-8511.  Log splitting gets
stuck there no matter how many retries.

Thanks,
Jimmy


On Wed, May 8, 2013 at 12:17 PM, Anoop John <anoop.hbase@gmail.com> wrote:

> I see we have FSDataInputStream in ProtobufLogReader.  We will test with
> the patch tomorrow and report back.
>
> -Anoop-
>
> On Thu, May 9, 2013 at 12:30 AM, Anoop John <anoop.hbase@gmail.com> wrote:
>
> > >However this doesn't explain why sequencefile has no such problems
> >
> > What I can see is that we use FSDataInputStream in the case of
> > SequenceFileLogReader (not FSInputStream), and we use FSDataInputStream
> > #readInt() (WALEdit#read...).  This in turn reads the int from the stream
> > by calling in.read() 4 times, making sure we read the full int (if it is
> > not an EOF issue).
> >
> > Maybe we can try out this way and test tomorrow, Ram.
> >
> > BTW, it looks like IOUtils.readFully() does not return the actual # of
> > bytes read.
> >
> > -Anoop-
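Anoop's point about FSDataInputStream#readInt() can be shown outside HBase. The sketch below is illustrative only (not HBase or Hadoop code, and `TrickleStream`/`ShortReadDemo` are made-up names): it simulates a stream that legally returns fewer bytes than requested from a single read() call, the way a socket or DFS stream can. A bare read() hands back a short count, while DataInputStream#readInt() keeps pulling bytes until it has all four.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// A stream that returns at most one byte per read() call, simulating the
// short reads a network-backed stream is allowed to produce.
class TrickleStream extends ByteArrayInputStream {
    TrickleStream(byte[] buf) { super(buf); }

    @Override
    public int read(byte[] b, int off, int len) {
        // Short read: hand back at most 1 byte even when more is buffered.
        return super.read(b, off, Math.min(len, 1));
    }
}

public class ShortReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 0, 42};  // a 4-byte big-endian int

        // A single read() may legally return fewer than 4 bytes without EOF...
        byte[] intBytes = new byte[4];
        int n = new TrickleStream(data).read(intBytes, 0, 4);
        System.out.println("single read() returned " + n);   // 1, not 4

        // ...while DataInputStream#readInt() loops internally until it has
        // all 4 bytes (or hits a real end of stream).
        int value = new DataInputStream(new TrickleStream(data)).readInt();
        System.out.println("readInt() = " + value);          // 42
    }
}
```

A length check like `length != intBytes.length` after a single read() will fire spuriously against such a stream, even though no data is missing.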
> >
> >  On Thu, May 9, 2013 at 12:07 AM, Sergey Shelukhin <
> sergey@hortonworks.com
> > > wrote:
> >
> >> 589824 is divisible by 4096, the default buffer size. Cue groundless
> >> suspicions. However, this doesn't explain why sequencefile has no such
> >> problems. Do you have any more numbers with this?
> >> I am in meetings now; let me try readFully after I come back from lunch.
> >>
> >> On Wed, May 8, 2013 at 11:23 AM, ramkrishna vasudevan <
> >>  ramkrishna.s.vasudevan@gmail.com> wrote:
> >>
> >> > >The length is from read call. This looks pretty suspicious, if the
> >> > stream is not EOF why would it return less bytes?
> >> >
> >> > In fact this happened for an entry in the middle of the HLog.  Write
> >> > some data using LoadTestTool, then try to use the HLog tool to read the
> >> > HLog file; you will end up in this issue (most of the time).
> >> >
> >> > Regards
> >> > Ram
> >> >
> >> >
> >> >
> >> > On Wed, May 8, 2013 at 11:48 PM, Sergey Shelukhin <
> >> sergey@hortonworks.com
> >> > >wrote:
> >> >
> >> > > if (length != intBytes.length) throw new IOException("Failed read of int length " + length);
> >> > >
> >> > > The length is from the read call. This looks pretty suspicious: if the
> >> > > stream is not at EOF, why would it return fewer bytes? I will try to
> >> > > repro today.
> >> > >
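The single-read length check quoted above can be made robust with the standard read-fully loop; Hadoop's org.apache.hadoop.io.IOUtils.readFully follows the same pattern (and returns void rather than a byte count, which matches Anoop's observation). A minimal sketch of that pattern, not the actual HBase fix (`ReadFullyUtil` is a hypothetical name):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Standard read-fully loop: a short read from InputStream#read is not an
// error; only a return value of -1 signals a true end of stream.
public class ReadFullyUtil {
    public static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                // Real EOF: the stream ended mid-record.
                throw new EOFException("EOF after " + off + " of "
                        + buf.length + " bytes");
            }
            off += n;  // short read: just ask again for the rest
        }
    }
}
```

With this, a read that delivers only 2 of 4 bytes simply loops for the remainder instead of surfacing as "Failed read of int length 2".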
> >> > > On Wed, May 8, 2013 at 5:46 AM, ramkrishna vasudevan <
> >> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > >
> >> > > > On further debugging we found that this issue happens with
> >> > > > ProtoBufWriter and not with SequenceFileWriter (at least we could
> >> > > > not reproduce it with different runs).
> >> > > >
> >> > > > We can see that the HLog has more data in it, but while reading one
> >> > > > of the lines in the HLog this error happens.  So we are pretty sure
> >> > > > it is not EOF.
> >> > > > Verified the DFS logs but could not find any exceptions there either.
> >> > > >
> >> > > > Regards
> >> > > > Ram
> >> > > >
> >> > > >
> >> > > > On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
> >> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > >
> >> > > > > Ok, so I tried this out with hadoop 2.0.4 and also with Sergey's
> >> > > > > patch.  The issue is reproducible in all versions of hadoop, but
> >> > > > > not always.  I am able to get errors like this:
> >> > > > >
> >> > > > > 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com,60020,1367961009182]
> >> > > > > org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
> >> > > > > .logs/ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com%2C60020%2C1367960957620.1367960993389
> >> > > > > failed, returning error
> >> > > > > java.io.IOException: Error while reading 1 WAL KVs; started reading at 589822 and read up to 589824
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> >> > > > >   at java.lang.Thread.run(Thread.java:662)
> >> > > > > Caused by: java.io.IOException: Failed read of int length 2
> >> > > > >   at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> >> > > > >   at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >> > > > >   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
> >> > > > >   ... 10 more
> >> > > > >
> >> > > > >
> >> > > > > and sometimes:
> >> > > > >
> >> > > > > java.io.IOException: Failed read of int length 1
> >> > > > >   at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> >> > > > >   at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >> > > > >   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
> >> > > > >   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
> >> > > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> > > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> > > > >
> >> > > > >
> >> > > > > I will work on this today and find out the root cause of it.
> >> > > > >
> >> > > > > Regards
> >> > > > > Ram
> >> > > > >
> >> > > > >
> >> > > > > On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> >> > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > > >
> >> > > > >> I too think it may be EOF, but I did not debug it fully.  Today
> >> > > > >> let me check and try applying your patch.
> >> > > > >>
> >> > > > >> Regards
> >> > > > >> Ram
> >> > > > >>
> >> > > > >>
> >> > > > >> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <
> >> > > > sergey@hortonworks.com>wrote:
> >> > > > >>
> >> > > > >>> Please take a look at the patch in
> >> > > > >>> HBASE-8498 <https://issues.apache.org/jira/browse/HBASE-8498>;
> >> > > > >>> this should make it possible to get more details.
> >> > > > >>>
> >> > > > >>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <
> >> > > > >>> sergey@hortonworks.com> wrote:
> >> > > > >>>
> >> > > > >>> > 1) Is there a cause stack?
> >> > > > >>> > 2) Can you ascertain whether the WAL is truncated at that
> >> > > > >>> > place? The exception type might have changed, or the exception
> >> > > > >>> > might have expanded, between Hadoop 1 and 2; WAL replay should
> >> > > > >>> > ignore EOF, so if this is an EOF problem then it would be easy
> >> > > > >>> > to correct; if it's something more serious then it's bad.
> >> > > > >>> > I will add some logging/catching around to add the cause (if
> >> > > > >>> > missing) and useful logs.
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> >> > > > >>> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > > >>> >
> >> > > > >>> >> Hi All
> >> > > > >>> >>
> >> > > > >>> >> I am getting the following error when I run trunk with
> >> > > > >>> >> hadoop-2.0.3:
> >> > > > >>> >>
> >> > > > >>> >> java.io.IOException: Failed read of int length 2
> >> > > > >>> >>   at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> >> > > > >>> >>   at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >> > > > >>> >>   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> >> > > > >>> >>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> >> > > > >>> >>
> >> > > > >>> >> I am able to reproduce this with the cluster but not with the
> >> > > > >>> >> testcases, even when I run with 2.0.3.
> >> > > > >>> >>
> >> > > > >>> >> Regards
> >> > > > >>> >> Ram
> >> > > > >>> >>
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>>
> >> > > > >>
> >> > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
