hive-dev mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-11945) ORC with non-local reads may not be reusing connection to DN
Date Thu, 24 Sep 2015 10:16:04 GMT
Rajesh Balamohan created HIVE-11945:
---------------------------------------

             Summary: ORC with non-local reads may not be reusing connection to DN
                 Key: HIVE-11945
                 URL: https://issues.apache.org/jira/browse/HIVE-11945
             Project: Hive
          Issue Type: Bug
            Reporter: Rajesh Balamohan
            Assignee: Rajesh Balamohan


When “seek + readFully(buffer, offset, length)” is used, DFSInputStream ends up going
via “readWithStrategy()”. This sets up the BlockReader with a length derived from the
block size, so it is asked to read up to the end of the block. Until that position is reached,
RemoteBlockReader2.peer is not added back to the PeerCache (see RemoteBlockReader2.close()
in HDFS). As a result, the next call to the same DN ends up opening a new socket. In ORC,
when a read is not data-local, this can open and close many connections to the DN.
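
To make the length mismatch concrete, here is a small sketch that computes how many bytes a stateful reader is set up to fetch versus how many the caller actually consumes. It assumes a 256 MB block and that the reader's length is the block size minus the offset into the block; the class and method names (ReaderLengths, statefulLength, unreadOnClose) are hypothetical and not part of HDFS.

```java
/** Hypothetical arithmetic sketch; not HDFS code. */
public class ReaderLengths {
    // Assumed block size for illustration: 256 MB.
    static final long BLOCK_SIZE = 256L * 1024 * 1024;

    /** Length requested by the stateful path: from the seek offset to the end of the block. */
    static long statefulLength(long offsetIntoBlock) {
        return BLOCK_SIZE - offsetIntoBlock;
    }

    /** Bytes left unread when the caller consumes only `len` bytes and then moves elsewhere. */
    static long unreadOnClose(long offsetIntoBlock, long len) {
        long requested = statefulLength(offsetIntoBlock);
        return requested - Math.min(len, requested);
    }

    public static void main(String[] args) {
        long offset = 10L * 1024 * 1024; // read starts 10 MB into the block
        long len = 256L * 1024;          // caller only wants 256 KB
        System.out.println("reader length:   " + statefulLength(offset));
        System.out.println("unread on close: " + unreadOnClose(offset, len));
    }
}
```

With these assumed numbers almost the entire remainder of the block is still outstanding when the reader is abandoned, which is the condition under which the peer is not returned to the cache.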

For random reads, it would be better to set this length to the amount of data that is
actually to be read, as the pread call in DFSInputStream does: it sets up the BlockReader's
length correctly, and that code path returns the Peer to the PeerCache properly.
“readFully(position, buffer, offset, length)” follows this code path and reuses connections
properly. Creating this JIRA to fix this issue.
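
The difference between the two paths can be illustrated with a toy model. This is a minimal, hypothetical Java sketch, not HDFS code: Peer, PeerCache, BlockReader, statefulRead and positionalRead are illustrative stand-ins with simplified semantics. It assumes a peer is reusable only when its reader has consumed every byte it requested, and it closes each reader right after the read to stand in for the reader being discarded on the next seek to a different offset.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of peer reuse; all types here are hypothetical stand-ins. */
public class PeerReuseModel {

    /** Stand-in for a connection to a DataNode. */
    static final class Peer { }

    /** Reuses an idle peer if one exists, otherwise "opens a socket". */
    static final class PeerCache {
        private final Deque<Peer> idle = new ArrayDeque<>();
        int socketsOpened = 0;

        Peer get() {
            Peer p = idle.poll();
            if (p != null) return p;
            socketsOpened++;
            return new Peer();
        }

        void put(Peer p) { idle.push(p); }
    }

    /** Simplified reader: its peer goes back to the cache only if all requested bytes were read. */
    static final class BlockReader {
        private final PeerCache cache;
        private final Peer peer;
        private long remaining;

        BlockReader(PeerCache cache, long length) {
            this.cache = cache;
            this.peer = cache.get();
            this.remaining = length;
        }

        void read(long n) { remaining -= Math.min(n, remaining); }

        void close() {
            if (remaining == 0) cache.put(peer); // fully consumed: reusable
            // otherwise the peer is dropped and a later read needs a new socket
        }
    }

    // Assumed block size for illustration: 256 MB.
    static final long BLOCK_SIZE = 256L * 1024 * 1024;

    /** Models seek + readFully(buffer, offset, length): reader length runs to the block end. */
    static void statefulRead(PeerCache cache, long pos, int len) {
        BlockReader r = new BlockReader(cache, BLOCK_SIZE - pos);
        r.read(len);
        r.close();
    }

    /** Models readFully(position, buffer, offset, length): reader length equals the request. */
    static void positionalRead(PeerCache cache, long pos, int len) {
        BlockReader r = new BlockReader(cache, len);
        r.read(len);
        r.close();
    }

    public static void main(String[] args) {
        long[] positions = {1_000_000L, 50_000_000L, 3_000L, 120_000_000L, 9_000_000L};
        int len = 64 * 1024;
        PeerCache stateful = new PeerCache();
        PeerCache positional = new PeerCache();
        for (long pos : positions) {
            statefulRead(stateful, pos, len);
            positionalRead(positional, pos, len);
        }
        System.out.println("stateful sockets opened:   " + stateful.socketsOpened);   // 5
        System.out.println("positional sockets opened: " + positional.socketsOpened); // 1
    }
}
```

In this model every stateful random read opens a fresh socket, while the positional reads share a single one. Real HDFS behaviour involves more state (TCP window skipping, cache expiry, checksums), so the numbers only illustrate the mechanism described above.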




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
