hbase-dev mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk
Date Wed, 08 May 2013 18:36:58 GMT
Ram,
  For you, the issue was not happening with a smaller amount of data, right?

-Anoop-

On Thu, May 9, 2013 at 12:04 AM, Anoop John <anoop.hbase@gmail.com> wrote:

> The issue was not coming for the same test with the same amount of data
> with SequenceFileLogWriter & SequenceFileLogReader.
>
> Can FSInputStream#read(byte[]) read fewer bytes than requested even when
> the stream is not at EOF?  I can see we use IOUtils.readFully(in, byte[],
> int, int).  What is the difference between this and a plain read?  Is
> there a difference between the two when we read at a file block boundary?
>
> -Anoop-
>
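[Editor's note: the short-read question above is exactly the InputStream.read(byte[]) contract — a single read may return fewer bytes than requested even when more data remains, and DFSInputStream behaves this way near block boundaries. IOUtils.readFully loops until the buffer is full. A minimal illustrative sketch (plain java.io, not HDFS; ChunkyStream is a made-up stand-in for a stream that delivers short reads):]

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadDemo {

    // A stream that never returns more than 2 bytes per read(byte[]) call,
    // mimicking a short read at a block boundary.
    static class ChunkyStream extends InputStream {
        private final InputStream in;
        ChunkyStream(byte[] data) { this.in = new ByteArrayInputStream(data); }
        @Override public int read() throws IOException { return in.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, 2)); // short read on purpose
        }
    }

    // Equivalent of org.apache.hadoop.io.IOUtils.readFully: loop until the
    // requested number of bytes has been read, or throw on premature EOF.
    static void readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) throw new EOFException("Premature EOF");
            off += n;
            len -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};

        // Single read(): returns only 2 bytes although 4 are available
        // and the stream is nowhere near EOF.
        int n = new ChunkyStream(data).read(new byte[4], 0, 4);
        System.out.println("single read returned " + n);

        // readFully(): loops over the short reads until the buffer is full.
        byte[] buf = new byte[4];
        readFully(new ChunkyStream(data), buf, 0, 4);
        System.out.println("last byte after readFully: " + buf[3]);
    }
}
```

So a single read() returning a short count is legal and not an error; only a readFully-style loop guarantees an exact byte count.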
>  On Wed, May 8, 2013 at 11:53 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
>
>> >The length is from the read call. This looks pretty suspicious; if the
>> >stream is not EOF, why would it return fewer bytes?
>>
>> In fact, this happened for an entry in the middle of the HLog.  Write some
>> data using LoadTestTool, then try to use the HLog tool to read the HLog
>> file; you will end up with this issue (most of the time).
>>
>> Regards
>> Ram
>>
>>
>>
>> On Wed, May 8, 2013 at 11:48 PM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
>>
>> > if (length != intBytes.length) throw new IOException("Failed read of int
>> > length " + length);
>> > The length is from the read call. This looks pretty suspicious; if the
>> > stream is not EOF, why would it return fewer bytes? I will try to repro
>> > today.
>> >
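[Editor's note: the quoted check fails whenever read() returns a short count, which explains the "Failed read of int length 2" message — only 2 of the 4 length-prefix bytes arrived in one call. A minimal sketch of a robust alternative (illustrative only, not the actual HBase fix; readLengthPrefix is a hypothetical helper), using java.io.DataInputStream.readFully, which loops internally over short reads:]

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LengthPrefixRead {

    // Reads a 4-byte big-endian length prefix. Unlike a bare
    // in.read(intBytes) — which may return after only 1 or 2 bytes —
    // DataInputStream.readFully keeps reading until all 4 bytes arrive,
    // throwing EOFException only on a genuine premature end of stream.
    static int readLengthPrefix(InputStream in) throws IOException {
        byte[] intBytes = new byte[4];
        new DataInputStream(in).readFully(intBytes); // loops over short reads
        return ((intBytes[0] & 0xff) << 24)
             | ((intBytes[1] & 0xff) << 16)
             | ((intBytes[2] & 0xff) << 8)
             |  (intBytes[3] & 0xff);
    }

    public static void main(String[] args) throws IOException {
        // Wire format: 4-byte length (0x00000010 == 16) then payload bytes.
        byte[] wire = {0, 0, 0, 16, 42};
        System.out.println(readLengthPrefix(new ByteArrayInputStream(wire)));
    }
}
```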
>> > On Wed, May 8, 2013 at 5:46 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
>> >
>> > > On further debugging, we found that this issue happens with
>> > > ProtobufLogWriter and not with SequenceFileLogWriter (at least we
>> > > could not reproduce it across different runs).
>> > >
>> > > We can see that the HLog has more data in it, but this error happens
>> > > while reading one of the lines in the middle of the HLog.  So we are
>> > > pretty sure that it is not EOF.
>> > > We verified the DFS logs but could not find any exceptions there
>> > > either.
>> > >
>> > > We will try to figure out more on this tomorrow.
>> > >
>> > > Regards
>> > > Ram
>> > >
>> > >
>> > > On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
>> > >
>> > > > Ok so I tried this out with hadoop 2.0.4 and also with Sergey's
>> > > > patch.  The issue is reproducible in all versions of hadoop, but
>> > > > not always.  I am able to get errors like this:
>> > > >
>> > > > 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com,60020,1367961009182] org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of .logs/ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com%2C60020%2C1367960957620.1367960993389 failed, returning error
>> > > > java.io.IOException: Error while reading 1 WAL KVs; started reading at 589822 and read up to 589824
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
>> > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
>> > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
>> > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
>> > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>> > > >   at java.lang.Thread.run(Thread.java:662)
>> > > > Caused by: java.io.IOException: Failed read of int length 2
>> > > >   at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
>> > > >   at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>> > > >   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
>> > > >   ... 10 more
>> > > >
>> > > > and sometimes:
>> > > > java.io.IOException: Failed read of int length 1
>> > > >   at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
>> > > >   at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>> > > >   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>> > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
>> > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
>> > > >   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
>> > > >   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
>> > > >   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
>> > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > > >
>> > > >
>> > > > I will work on this today and find out the root cause of it.
>> > > >
>> > > > Regards
>> > > > Ram
>> > > >
>> > > >
>> > > > On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
>> > > >
>> > > >> I too think it may be EOF, but I did not debug it fully.  Today let
>> > > >> me check and try applying your patch.
>> > > >>
>> > > >> Regards
>> > > >> Ram
>> > > >>
>> > > >>
>> > > >> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
>> > > >>
>> > > >>> Please take a look at the patch in HBASE-8498
>> > > >>> <https://issues.apache.org/jira/browse/HBASE-8498>;
>> > > >>> this should make it possible to get more details.
>> > > >>>
>> > > >>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
>> > > >>>
>> > > >>> > 1) Is there a cause stack?
>> > > >>> > 2) Can you ascertain whether the WAL is truncated at that place?
>> > > >>> > The exception type might have changed, or the exception message
>> > > >>> > might have expanded, between Hadoop 1 and Hadoop 2; WAL replay
>> > > >>> > should ignore EOF, so if this is an EOF problem then it would be
>> > > >>> > easy to correct; if it's something more serious then it's bad.
>> > > >>> > I will add some logging/catching around it to add the cause (if
>> > > >>> > missing) and useful logs.
>> > > >>> >
>> > > >>> >
>> > > >>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
>> > > >>> >
>> > > >>> >> Hi All
>> > > >>> >>
>> > > >>> >> I am getting the following error when I run trunk with
>> > > >>> >> hadoop-2.0.3:
>> > > >>> >> java.io.IOException: Failed read of int length 2
>> > > >>> >>   at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
>> > > >>> >>   at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>> > > >>> >>   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
>> > > >>> >>   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>> > > >>> >>
>> > > >>> >> I am able to reproduce this with the cluster, but not with the
>> > > >>> >> testcases, even when I run with 2.0.3.
>> > > >>> >>
>> > > >>> >> Regards
>> > > >>> >> Ram
>> > > >>> >>
>> > > >>> >>
>> > > >>> >
>> > > >>> >
>> > > >>>
>> > > >>
>> > > >>
>> > > >
>> > >
>> >
>>
>
>
