hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-9373) [replication] data loss because replication doesn't expect partial reads
Date Fri, 30 Aug 2013 00:53:52 GMT

     [ https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-9373:
--------------------------------------

    Description: 
When I see this in the logs, it often means we got a partial read and then have the wrong offset when reading the rest of the file.

{noformat}
2013-08-28 23:16:07,182 ERROR [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-5,60020,1377730319617] org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Invalid PB while reading WAL, probably an unexpected EOF, ignoring
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
        at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
        at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
        at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:686)
        at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:644)
        at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:771)
        at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:766)
        at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1444)
        at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1218)
        at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:220)
        at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
        at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
        at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:290)
        at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:926)
        at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:296)
        at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:918)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:197)
        at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:390)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:298)
{noformat}
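To make the "wrong offset after a partial read" failure concrete, here is a minimal Java sketch. It assumes a simplified file of records, each prefixed by a 4-byte big-endian length (the real WAL uses protobuf's varint-delimited framing), and the class `TailingRecordReader` and its methods are hypothetical: this is not HBase's `ProtobufLogReader` and not the attached 9373.txt patch. It only illustrates that a reader tailing a file that is still being written should commit its position after a whole record has been read, and otherwise retry from the start of that record.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical reader for a still-growing file of length-prefixed records.
 * Illustrates why the committed offset must not move past a partially
 * written record; it is not HBase code.
 */
public class TailingRecordReader {
  private long position = 0; // offset of the first record not yet returned

  public long getPosition() {
    return position;
  }

  /**
   * Tries to read one record from the bytes visible so far.
   * Returns null, leaving position unchanged, if the record is incomplete.
   */
  public byte[] readNext(byte[] visible) {
    int start = (int) position;
    if (visible.length - start < Integer.BYTES) {
      return null; // length prefix not fully written yet
    }
    int len = ByteBuffer.wrap(visible, start, Integer.BYTES).getInt();
    if (len < 0) {
      throw new IllegalStateException("corrupt length prefix at " + start);
    }
    int bodyStart = start + Integer.BYTES;
    if (visible.length - bodyStart < len) {
      // Partial read: do NOT advance position. Advancing here would make the
      // next attempt start in the middle of this record and misparse it.
      return null;
    }
    position = bodyStart + len; // commit only after the whole record is read
    return Arrays.copyOfRange(visible, bodyStart, bodyStart + len);
  }

  /** Encodes one record as a 4-byte big-endian length followed by the body. */
  public static byte[] encode(byte[] body) {
    return ByteBuffer.allocate(Integer.BYTES + body.length)
        .putInt(body.length)
        .put(body)
        .array();
  }

  public static void main(String[] args) {
    byte[] full = encode("edit-1".getBytes(StandardCharsets.UTF_8));
    TailingRecordReader reader = new TailingRecordReader();
    // Simulate the writer having flushed only part of the record.
    byte[] partial = Arrays.copyOf(full, full.length - 2);
    System.out.println(reader.readNext(partial) == null); // true
    System.out.println(reader.getPosition());              // 0
    // Once the writer finishes, a retry from the same offset succeeds.
    System.out.println(new String(reader.readNext(full), StandardCharsets.UTF_8)); // edit-1
    System.out.println(reader.getPosition());              // 10
  }
}
```

Had the sketch instead advanced `position` by however many bytes it managed to read, the retry would start mid-record and interpret body bytes as a length prefix, which is plausibly the kind of misparse an "invalid wire type" error points to.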

  was:
Two things are bugging me.

First, this message: we now try to be more responsive and only sleep 1 second if we didn't get any data, so it gets logged every second while the source is idle. Let's lower it to TRACE.

bq. 2013-08-28 23:17:47,421 DEBUG [regionserver60020.replicationSource,1] org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Nothing to replicate, sleeping 1000 times 1
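The idle back-off behind that message can be sketched as follows, assuming "sleeping 1000 times 1" means a 1000 ms base interval multiplied by a back-off multiplier of 1. The `IdleBackoff` class and its `sleepForRetries` method are hypothetical stand-ins, not HBase's code, and `java.util.logging`'s `FINEST` level stands in for TRACE. The sketch only shows the proposal: keep the per-iteration message below the default log level so an idle source no longer spams the logs.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical idle back-off for a replication loop. The message that fires
 * on every idle pass is logged at FINEST (closest to TRACE in
 * java.util.logging), so it is hidden at the default INFO level.
 */
public class IdleBackoff {
  private static final Logger LOG = Logger.getLogger(IdleBackoff.class.getName());

  private final long baseSleepMs; // assumed base interval, e.g. 1000 ms

  public IdleBackoff(long baseSleepMs) {
    this.baseSleepMs = baseSleepMs;
  }

  /** Logs at FINEST, sleeps baseSleepMs * multiplier, and returns that duration. */
  public long sleepForRetries(String msg, int multiplier) {
    long sleepMs = baseSleepMs * multiplier;
    if (LOG.isLoggable(Level.FINEST)) {
      LOG.finest(msg + ", sleeping " + baseSleepMs + " times " + multiplier);
    }
    try {
      Thread.sleep(sleepMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    return sleepMs;
  }

  public static void main(String[] args) {
    IdleBackoff backoff = new IdleBackoff(10);
    // Nothing is printed by the logger at the default INFO level.
    System.out.println(backoff.sleepForRetries("Nothing to replicate", 1)); // 10
  }
}
```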

Then, I've seen cases where we hit an EOF and, instead of staying silent, we log this:

(The same ERROR log line and stack trace quoted in the new description above.)

The problem here is that it shows up as an ERROR, so is the intention that there really could be a problem? Or, if we silence this exception, would a real problem manifest itself in some other way anyway? [~stack]? FWIW, I verified that I had all my data.

       Priority: Blocker  (was: Major)
        Summary: [replication] data loss because replication doesn't expect partial reads (was: Fix more log spam in replication for 0.96.0)
    
> [replication] data loss because replication doesn't expect partial reads
> ------------------------------------------------------------------------
>
>                 Key: HBASE-9373
>                 URL: https://issues.apache.org/jira/browse/HBASE-9373
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.95.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: 9373.txt
>

