hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-16766) Do not rely on InputStream.available()
Date Tue, 04 Oct 2016 20:33:20 GMT

     [ https://issues.apache.org/jira/browse/HBASE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-16766:
    Status: Patch Available  (was: Open)

> Do not rely on InputStream.available() 
> ---------------------------------------
>                 Key: HBASE-16766
>                 URL: https://issues.apache.org/jira/browse/HBASE-16766
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.4.0
>         Attachments: hbase-16766_v1.patch
> ProtobufLogReader relies on InputStream.available() to determine whether it has exhausted
> the file. However, the InputStream.available() javadoc states:
> {code}
>      * <p> Note that while some implementations of {@code InputStream} will return
>      * the total number of bytes in the stream, many will not.  It is
>      * never correct to use the return value of this method to allocate
>      * a buffer intended to hold all data in this stream.
> {code}
> HDFS and many other Hadoop file systems, as well as classes such as ByteBufferInputStream,
> return the number of remaining bytes from {{available()}}, so the code works on top of HDFS.
> On other file systems, however, InputStream.available() may or may not return the remaining
> bytes. In one specific case, the ADLS wrapper FS used to implement the {{available()}} call
> with the correct (javadoc) semantics, which ended up causing data loss during WAL recovery.
> We have since changed ADLS to implement the HDFS semantics, but we should fix HBase itself
> so that it does not rely on the available() call.
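
The failure mode described above can be illustrated with a small, self-contained Java sketch. This is not the HBASE-16766 patch and does not touch ProtobufLogReader; the class AvailableDemo, the wrapper ZeroAvailableStream, and both counting methods are hypothetical names invented for this illustration. The wrapper stands in for a file system stream whose available() returns 0 while data remains, which the InputStream contract permits (the base class implementation always returns 0).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableDemo {

    /**
     * Hypothetical stream (an assumption for illustration) that mimics a file
     * system whose available() reports 0 even though bytes remain.
     */
    static class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0;
        }
    }

    /** Fragile: treats available() == 0 as end of file. */
    static int countWithAvailable(InputStream in) throws IOException {
        int count = 0;
        while (in.available() > 0) {
            if (in.read() == -1) {
                break;
            }
            count++;
        }
        return count;
    }

    /** Robust: read() returning -1 is the only end-of-stream signal used. */
    static int countWithRead(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};
        // The available()-based loop stops immediately and sees no data.
        System.out.println(countWithAvailable(
            new ZeroAvailableStream(new ByteArrayInputStream(data)))); // prints 0
        // The read()-based loop consumes every byte.
        System.out.println(countWithRead(
            new ZeroAvailableStream(new ByteArrayInputStream(data)))); // prints 5
    }
}
```

On a stream that returns the remaining byte count, as HDFS does, both loops agree, which is presumably why the problem stayed hidden. A reader that stops on available() == 0 would silently skip trailing records on any stream with the other behavior, which matches the kind of data loss reported here.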

This message was sent by Atlassian JIRA
