hbase-issues mailing list archives

From "Kiran Kumar M R, Huawei (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11620) Propagate decoder exception to HLogSplitter so that loss of data is avoided
Date Thu, 31 Jul 2014 16:10:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081043#comment-14081043 ]

Kiran Kumar M R, Huawei commented on HBASE-11620:
-------------------------------------------------

I tested the patch submitted by Ted Yu, and it does not work. Even though an IOException is now thrown instead of an EOFException, the log is still not considered corrupted.

Here are the logs. Refer to the line containing *Throwing ioEx instead of eofEx*.

{code}
2014-07-31 21:19:11,923 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] wal.HLogSplitter: Splitting hlog: hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406821527620-splitting/HOST-16%2C15264%2C1406821527620.1406821561362, length=174
2014-07-31 21:19:11,923 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] wal.HLogSplitter: DistributedLogReplay = false
2014-07-31 21:19:11,994 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] util.FSHDFSUtils: Recovering lease on dfs file hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406821527620-splitting/HOST-16%2C15264%2C1406821527620.1406821561362
2014-07-31 21:19:11,996 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] util.FSHDFSUtils: recoverLease=true, attempt=0 on file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406821527620-splitting/HOST-16%2C15264%2C1406821527620.1406821561362 after 2ms
2014-07-31 21:19:12,009 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-0] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-0,5,main]: starting
2014-07-31 21:19:12,009 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-2] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-2,5,main]: starting
2014-07-31 21:19:12,009 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-1] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-1,5,main]: starting
2014-07-31 21:19:12,170 ERROR [RS_LOG_REPLAY_OPS-HOST-16:15264-0] codec.BaseDecoder: Partial cell read caused by EOF - Throwing ioEx instead of eofEx : java.io.IOException: Premature EOF from inputStream
2014-07-31 21:19:12,170 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] wal.HLogSplitter: Finishing writing output logs and closing down.
2014-07-31 21:19:12,170 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] wal.HLogSplitter: Waiting for split writer threads to finish
2014-07-31 21:19:12,170 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] wal.HLogSplitter: Split writers finished
2014-07-31 21:19:12,171 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] wal.HLogSplitter: Processed 0 edits across 0 regions; log file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406821527620-splitting/HOST-16%2C15264%2C1406821527620.1406821561362 is corrupted = false progress failed = false
2014-07-31 21:19:12,202 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] handler.HLogSplitterHandler: successfully transitioned task /hbase/splitWAL/WALs%2FHOST-10-18-40-16%2C15264%2C1406821527620-splitting%2FHOST-10-18-40-16%252C15264%252C1406821527620.1406821561362 to final state DONE HOST-16,15264,1406821739918
2014-07-31 21:19:12,202 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-0] handler.HLogSplitterHandler: worker HOST-16,15264,1406821739918 done with task /hbase/splitWAL/WALs%2FHOST-10-18-40-16%2C15264%2C1406821527620-splitting%2FHOST-10-18-40-16%252C15264%252C1406821527620.1406821561362 in 316ms
{code}
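The log above shows BaseDecoder throwing the IOException, yet HLogSplitter still reports "is corrupted = false". One way this can happen, stated purely as an assumption and not verified against the HBase source, is an intermediate reader layer that re-wraps every decode failure as an EOFException. The sketch below uses hypothetical names (EofMaskingSketch, decodeCell, readNextEntryMasking, isCorrupted) that are not HBase code; it only shows how such a layer would hide the decoder's IOException from a splitter that treats EOFException as a clean end of log.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch only: none of these classes or methods exist in HBase.
public class EofMaskingSketch {

    // Stands in for a decoder that, with the proposed patch applied, throws a
    // plain IOException (not an EOFException) on a partial cell read.
    static void decodeCell() throws IOException {
        throw new IOException("Premature EOF from inputStream");
    }

    // ASSUMPTION: an intermediate reader layer that wraps *every* decode
    // failure in an EOFException, keeping the original only as the cause.
    static void readNextEntryMasking() throws IOException {
        try {
            decodeCell();
        } catch (IOException ex) {
            EOFException eof = new EOFException("EOF while reading WAL cells");
            eof.initCause(ex);
            throw eof;
        }
    }

    // Mimics the splitter's classification: EOFException means a clean end
    // of log, any other IOException means the log is corrupted.
    static boolean isCorrupted(boolean withMaskingLayer) {
        try {
            if (withMaskingLayer) {
                readNextEntryMasking();
            } else {
                decodeCell();
            }
            return false;
        } catch (EOFException eof) {
            return false; // treated as a normal end of file
        } catch (IOException ioe) {
            return true;  // surfaced as corruption
        }
    }

    public static void main(String[] args) {
        System.out.println("masking layer: is corrupted = " + isCorrupted(true));
        System.out.println("no masking:    is corrupted = " + isCorrupted(false));
    }
}
```

Running main prints false for the masking path and true when the IOException reaches the splitter unchanged, which is why, under this assumption, changing only the decoder would not alter the outcome seen in the log.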

> Propagate decoder exception to HLogSplitter so that loss of data is avoided
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-11620
>                 URL: https://issues.apache.org/jira/browse/HBASE-11620
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.4
>            Reporter: Ted Yu
>            Priority: Critical
>         Attachments: 11620-v1.txt
>
>
> Reported by Kiran in this thread: "HBase file encryption, inconsistencies observed and data loss"
> After step 4 (i.e. disabling WAL encryption, removing SecureProtobufReader/Writer, and restarting), reading the encrypted WAL fails, mainly due to an EOFException in BaseDecoder. This is not considered an error, and these WALs are moved to /oldWALs.
> The following is observed in the log files:
> {code}
> 2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Splitting hlog: hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017, length=172
> 2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: DistributedLogReplay = false
> 2014-07-30 19:44:29,313 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] util.FSHDFSUtils: Recovering lease on dfs file hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
> 2014-07-30 19:44:29,315 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] util.FSHDFSUtils: recoverLease=true, attempt=0 on file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017 after 1ms
> 2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0,5,main]: starting
> 2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1,5,main]: starting
> 2014-07-30 19:44:29,430 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2,5,main]: starting
> 2014-07-30 19:44:29,591 ERROR [RS_LOG_REPLAY_OPS-HOST-16:15264-1] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Finishing writing output logs and closing down.
> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Waiting for split writer threads to finish
> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Split writers finished
> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Processed 0 edits across 0 regions; log file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017 is corrupted = false progress failed = false
> {code}
> To fix this, we need to propagate the EOF exception to HLogSplitter. Any suggestions for the fix?
> -------- (end of quote from Kiran)
> In BaseDecoder#rethrowEofException():
> {code}
>     if (!isEof) throw ioEx;
>     LOG.error("Partial cell read caused by EOF: " + ioEx);
>     EOFException eofEx = new EOFException("Partial cell read");
>     eofEx.initCause(ioEx);
>     throw eofEx;
> {code}
> Throwing EOFException does not propagate the "Partial cell read" condition to HLogSplitter, which doesn't treat EOFException as an error.
> I think an IOException should be thrown above; HLogSplitter#getNextLogLine() would then translate the IOException into a CorruptedLogFileException.
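A minimal sketch of the change proposed in the issue, assuming (as the issue describes) that the splitter treats EOFException as a clean end of log and any other IOException as corruption. PartialCellSketch, rethrowAsEof, rethrowAsIo, nextEntry and the local CorruptedLogFileException class are hypothetical stand-ins that only mirror the names in the discussion; they are not HBase's API. Note that the test reported at the top of this message found that changing the decoder alone did not get the log marked as corrupted, so this models only the intended decoder-to-splitter contract.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch only: names mirror the discussion, not HBase's API.
public class PartialCellSketch {

    // Local stand-in for the splitter's corrupted-log exception.
    static class CorruptedLogFileException extends Exception {
        CorruptedLogFileException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Current behaviour: a partial cell read at EOF surfaces as EOFException.
    static void rethrowAsEof(IOException ioEx) throws IOException {
        EOFException eofEx = new EOFException("Partial cell read");
        eofEx.initCause(ioEx);
        throw eofEx;
    }

    // Proposed behaviour: surface the partial read as a plain IOException.
    static void rethrowAsIo(IOException ioEx) throws IOException {
        throw new IOException("Partial cell read", ioEx);
    }

    // Mimics the splitter reading the next entry: EOFException ends the log
    // (returns null), any other IOException becomes CorruptedLogFileException.
    static String nextEntry(boolean proposed) throws CorruptedLogFileException {
        IOException premature = new IOException("Premature EOF from inputStream");
        try {
            if (proposed) {
                rethrowAsIo(premature);
            } else {
                rethrowAsEof(premature);
            }
            return "entry"; // not reached here: both helpers always throw
        } catch (EOFException eof) {
            return null; // log treated as fully read; no corruption reported
        } catch (IOException ioe) {
            throw new CorruptedLogFileException("Partial cell read", ioe);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println("current: " + nextEntry(false));
            nextEntry(true);
        } catch (CorruptedLogFileException e) {
            System.out.println("proposed: corrupted (" + e.getCause().getMessage() + ")");
        }
    }
}
```

Running main prints "current: null" followed by "proposed: corrupted (Partial cell read)": with the current behaviour the split silently ends, while the proposed behaviour surfaces the partial read as corruption instead of letting the WAL be moved to /oldWALs.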



--
This message was sent by Atlassian JIRA
(v6.2#6252)
