hive-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11945) ORC with non-local reads may not be reusing connection to DN
Date Fri, 25 Sep 2015 05:53:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-11945:
------------------------------------
    Attachment: HIVE-11945.3.patch
                HIVE-11945.2.patch

Uploading the patch with the zero-copy read (zcr) changes.
[~prasanth_j] - S3AInputStream should really have a lazy seek() implementation, which would
work well with readFully(). PrestoS3FileSystem already follows this approach. Will create a
separate bug in HDFS to track this.
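The lazy seek idea can be illustrated with a toy Java stream. This is not the S3AInputStream or PrestoS3FileSystem code: LazySeekStream and its methods are hypothetical, and an in-memory byte array stands in for the remote object. seek() only records the requested position; the expensive reopen of the remote stream is deferred to the next read, and skipped entirely when the open stream is already at that position.

```java
import java.io.EOFException;
import java.io.IOException;

/**
 * Toy model of a remote input stream with a lazy seek (hypothetical class,
 * not S3AInputStream). Reopening the remote stream is the expensive step,
 * so the model counts how often it happens.
 */
class LazySeekStream {
    private final byte[] object;   // stands in for the remote object
    private long streamPos = -1;   // position of the currently open stream; -1 = nothing open yet
    private long nextReadPos = 0;  // position requested by the caller via seek()
    private int opens = 0;         // number of times the remote stream was (re)opened

    LazySeekStream(byte[] object) { this.object = object; }

    /** Lazy seek: only record the target position; no I/O happens here. */
    void seek(long pos) throws IOException {
        if (pos < 0 || pos > object.length) throw new EOFException("seek out of range: " + pos);
        nextReadPos = pos;
    }

    /** Reopen only if the next read is not where the open stream already is. */
    private void lazySeek() {
        if (streamPos != nextReadPos) {
            opens++;               // a real client would close the old stream and issue a new ranged request here
            streamPos = nextReadPos;
        }
    }

    void readFully(byte[] buf, int off, int len) throws IOException {
        if (nextReadPos + len > object.length) throw new EOFException();
        lazySeek();
        System.arraycopy(object, (int) streamPos, buf, off, len);
        streamPos += len;
        nextReadPos = streamPos;
    }

    int opens() { return opens; }

    public static void main(String[] args) throws IOException {
        LazySeekStream in = new LazySeekStream(new byte[1024]);
        byte[] buf = new byte[64];
        in.seek(100);               // no I/O
        in.seek(200);               // no I/O: the earlier seek is simply overwritten
        in.readFully(buf, 0, 64);   // first real open, at 200
        in.seek(264);               // stream is already at 264, so no reopen
        in.readFully(buf, 0, 64);
        in.seek(0);
        in.readFully(buf, 0, 64);   // backwards jump forces a reopen
        System.out.println("opens = " + in.opens()); // opens = 2
    }
}
```

An eager seek() that reopened the stream on every call would also pay for seek(100), which the caller overwrites before reading anything.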

> ORC with non-local reads may not be reusing connection to DN
> ------------------------------------------------------------
>
>                 Key: HIVE-11945
>                 URL: https://issues.apache.org/jira/browse/HIVE-11945
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: HIVE-11945.1.patch, HIVE-11945.2.patch, HIVE-11945.3.patch
>
>
> When “seek + readFully(buffer, offset, length)” is used, DFSInputStream ends up
> going via “readWithStrategy()”. This sets up the BlockReader with a length equal to
> the block size, so until that position is reached, RemoteBlockReader2.peer is not
> added to the PeerCache (please refer to RemoteBlockReader2.close() in HDFS). As a
> result, the next call to the same DN ends up opening a new socket. In ORC, when the
> read is not data-local, this can open and close many connections to the DN.
> In random reads, it would be better to set this length to the amount of data that is
> actually to be read, as the pread call in DFSInputStream does: it sets up the
> BlockReader’s length correctly, and that code path returns the Peer to the peer cache
> properly. “readFully(position, buffer, offset, length)” follows this code path and
> reuses connections properly.
> Creating this JIRA to fix this issue.
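To make the mechanism concrete, here is a toy Java model of the two call patterns. It is not HDFS code: PeerReuseModel, BlockReader, seekThenReadFully and preadFully are hypothetical names, the 256 MB block size and 1 MB read size are assumptions, and the model simplifies DFSInputStream by tearing down the previous reader on every seek. It keeps only the rule described above: a reader that asked the DN for bytes up to the end of the block and is closed early loses its connection, while a reader that asked for exactly len bytes is fully drained and hands its peer back to the cache.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model (hypothetical names, not the HDFS classes) of how a block
 * reader's requested length decides whether its DataNode connection can be
 * reused: the peer goes back to the cache only if the reader consumed every
 * byte it asked for.
 */
class PeerReuseModel {
    static final long BLOCK_SIZE = 256L * 1024 * 1024; // assumed block size

    private final Deque<Integer> peerCache = new ArrayDeque<>(); // idle connections to the DN
    private int socketsOpened = 0;
    private int nextSocketId = 0;
    private BlockReader current; // reader kept open across stateful reads

    /** Hypothetical block reader that asks the DN for `length` bytes. */
    private final class BlockReader {
        final int peer;
        long remaining;

        BlockReader(long length) {
            Integer cached = peerCache.poll();  // reuse an idle connection if there is one
            if (cached != null) {
                peer = cached;
            } else {
                socketsOpened++;                // otherwise open a new socket
                peer = nextSocketId++;
            }
            remaining = length;
        }

        void read(long n) { remaining -= Math.min(n, remaining); }

        void close() {
            // Only a fully drained reader leaves the connection reusable;
            // otherwise the socket is discarded.
            if (remaining == 0) peerCache.push(peer);
        }
    }

    /** Stateful path: seek(pos) followed by readFully(buf, off, len). */
    void seekThenReadFully(long pos, int len) {
        if (current != null) current.close();  // simplification: the old reader is torn down
        long offsetInBlock = pos % BLOCK_SIZE;
        current = new BlockReader(BLOCK_SIZE - offsetInBlock); // length = rest of the block
        current.read(len);
    }

    /** Positional path: readFully(position, buf, off, len), i.e. pread. */
    void preadFully(long pos, int len) {
        BlockReader r = new BlockReader(len);  // length = exactly what is needed
        r.read(len);
        r.close();                             // fully drained, so the peer is cached
    }

    void closeStream() { if (current != null) current.close(); current = null; }

    int socketsOpened() { return socketsOpened; }

    public static void main(String[] args) {
        long[] offsets = {3_000_000L, 90_000_000L, 40_000_000L, 150_000_000L}; // random reads in one block
        int len = 1 << 20; // assumed 1 MB per read

        PeerReuseModel stateful = new PeerReuseModel();
        for (long off : offsets) stateful.seekThenReadFully(off, len);
        stateful.closeStream();

        PeerReuseModel positional = new PeerReuseModel();
        for (long off : offsets) positional.preadFully(off, len);

        System.out.println("seek + readFully: sockets opened = " + stateful.socketsOpened());   // 4
        System.out.println("pread readFully:  sockets opened = " + positional.socketsOpened()); // 1
    }
}
```

In terms of the Hadoop API, the two paths correspond to calling seek(pos) and then readFully(buf, off, len) on an FSDataInputStream, versus the single positional call readFully(pos, buf, off, len).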



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
