hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14501) NPE in replication with TDE
Date Tue, 13 Oct 2015 03:44:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954319#comment-14954319 ]

Enis Soztutar commented on HBASE-14501:
---------------------------------------

bq. Green lights from that manual test Enis Soztutar?
Yes, I just got around to testing this on the TDE cluster. The NPEs, which were quite reproducible before, are gone.
I'll commit the patch shortly. The test failures are not related.

> NPE in replication with TDE
> ---------------------------
>
>                 Key: HBASE-14501
>                 URL: https://issues.apache.org/jira/browse/HBASE-14501
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
>         Attachments: hbase-14501_v1.patch
>
>
> We are seeing an NPE when replication (or, in this case, async WAL replay for region replicas) is run on top of an HDFS cluster with TDE configured.
> This is the stack trace:
> {code}
> java.lang.NullPointerException
>         at org.apache.hadoop.hbase.CellUtil.matchingRow(CellUtil.java:370)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.countDistinctRowKeys(ReplicationSource.java:649)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:450)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:346)
> {code}
> This stack trace can only happen if WALEdit.getCells() returns an array containing null entries. I believe this happens because {{KeyValueCodec.parseCell()}} uses {{KeyValueUtil.iscreate()}}, which returns null in case of EOF at the beginning. However, the contract for Decoder.parseCell() does not make clear whether returning null is acceptable. The other Decoders (CompressedKvDecoder, CellCodec, etc.) do not return null, while KeyValueCodec does.
> BaseDecoder has this code: 
> {code}
>   public boolean advance() throws IOException {
>     if (!this.hasNext) return this.hasNext;
>     if (this.in.available() == 0) {
>       this.hasNext = false;
>       return this.hasNext;
>     }
>     try {
>       this.current = parseCell();
>     } catch (IOException ioEx) {
>       rethrowEofException(ioEx);
>     }
>     return this.hasNext;
>   }
> {code}
> which is not correct, since it uses {{InputStream.available()}} in a way the javadoc does not support (https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()): the javadoc defines the return value only as an estimate of the bytes that can be read without blocking, so 0 does not mean end of stream. DFSInputStream implements {{available()}} as the number of bytes remaining in the stream, so we do not see the issue there. {{CryptoInputStream.available()}} does a similar thing, but there we do see the issue.
> So two questions: 
>  - What should be the interface for Decoder.parseCell()? Can it return null? 
>  - How should BaseDecoder.advance() be fixed so that it does not rely on the {{available()}} call?
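
For the second question, one possible approach is to detect end of stream by reading a single byte and pushing it back, instead of calling {{available()}}. The following is a minimal, self-contained sketch of that idea and is not taken from the attached patch. All names in it (PeekingDecoderSketch, ZeroAvailableStream, ByteDecoder, countCells) are hypothetical, and the one-byte "cell" format is a stand-in for real cell parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch; none of these class or method names come from HBase. */
class PeekingDecoderSketch {

  /** A stream that reports available() == 0 even though data remains. */
  static final class ZeroAvailableStream extends FilterInputStream {
    ZeroAvailableStream(InputStream in) { super(in); }
    @Override public int available() { return 0; }
  }

  /** Decodes one-byte "cells"; stands in for a real cell decoder. */
  static final class ByteDecoder {
    private final PushbackInputStream in;
    private boolean hasNext = true;
    private int current = -1;

    ByteDecoder(InputStream in) {
      // A pushback buffer of one byte is enough to peek at the next record.
      this.in = new PushbackInputStream(in, 1);
    }

    /** Detects end of stream by reading one byte, not by calling available(). */
    boolean advance() throws IOException {
      if (!hasNext) return false;
      int firstByte = in.read();
      if (firstByte == -1) {      // read() returns -1 only at end of stream
        hasNext = false;
        return false;
      }
      in.unread(firstByte);       // put it back so parsing sees the whole record
      current = parseCell();
      return true;
    }

    int current() { return current; }

    private int parseCell() throws IOException {
      return in.read();
    }
  }

  /** Counts the cells a decoder yields from the given stream. */
  static int countCells(InputStream in) {
    try {
      ByteDecoder decoder = new ByteDecoder(in);
      int n = 0;
      while (decoder.advance()) n++;
      return n;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = {1, 2, 3};
    InputStream in = new ZeroAvailableStream(new ByteArrayInputStream(data));
    System.out.println(countCells(in)); // prints 3
  }
}
```

A loop guarded by {{available() == 0}} would stop immediately on ZeroAvailableStream and yield no cells, while the peek-based loop reads all three. With this shape, advance() never calls parseCell() at a clean EOF, which also sidesteps the first question for that case. Whether parseCell() may return null in other situations would still need to be settled in the Decoder contract.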



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
