hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8797) WebHdfsFileSystem creates too many connections for pread
Date Wed, 22 Jul 2015 08:14:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636479#comment-14636479
] 

Yi Liu commented on HDFS-8797:
------------------------------

{quote}
readFully call read repeated so that it is a problem. read itself seems fine.
Yeah, looks like the main issue is with readFully here. So currently I keep the original read
unchanged.
{quote}
Sorry, I was confused here. {{readFully}} is the main issue, but would it also be a bit more
efficient to use the same approach for normal pread?
# The new approach here is to open a separate connection for pread and close it once the read
finishes. When the client does a stateful read again, the original connection is not affected.
# {{seek}} + {{read}} + {{seek}}: closes the original connection held by the stateful read
and opens a new connection for pread. But when the client does a stateful read again, the
connection has to be closed and opened again.

So #2 ({{seek}} + {{read}} + {{seek}}) requires one additional connection close/open
for a normal pread?
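
To make the comparison concrete, here is a minimal sketch that only counts connection opens under the two approaches. It is a simplified model and not the actual WebHdfsFileSystem code: the class {{PreadConnectionSketch}}, the nested {{CountingStream}}, and all method names are hypothetical, and it assumes every {{seek}} to a different position drops the current connection.

```java
final class PreadConnectionSketch {

    /** Hypothetical stand-in for a WebHDFS input stream; it only counts connection opens. */
    static final class CountingStream {
        int opens = 0;
        boolean statefulConnectionOpen = false;

        /** Stateful read: reuses the existing connection if there is one. */
        void statefulRead() {
            if (!statefulConnectionOpen) {
                opens++;
                statefulConnectionOpen = true;
            }
        }

        /** Approach #1: a dedicated connection for pread, closed when the read finishes. */
        void preadWithSeparateConnection() {
            opens++; // the stateful connection is left untouched
        }

        /** Approach #2: seek + read + seek on the shared stream state. */
        void preadWithSeekReadSeek() {
            statefulConnectionOpen = false; // seek(position) drops the stateful connection
            opens++;                        // read(...) opens a connection at the new position
            // seek(oldPos) drops that connection again, so nothing stays open
        }
    }

    /** Each round is one stateful read followed by one pread; a final stateful read ends the run. */
    static int opensWithSeparateConnection(int rounds) {
        CountingStream in = new CountingStream();
        for (int i = 0; i < rounds; i++) {
            in.statefulRead();
            in.preadWithSeparateConnection();
        }
        in.statefulRead();
        return in.opens;
    }

    static int opensWithSeekReadSeek(int rounds) {
        CountingStream in = new CountingStream();
        for (int i = 0; i < rounds; i++) {
            in.statefulRead();
            in.preadWithSeekReadSeek();
        }
        in.statefulRead();
        return in.opens;
    }

    public static void main(String[] args) {
        int rounds = 3;
        System.out.println("separate connection: " + opensWithSeparateConnection(rounds));
        System.out.println("seek + read + seek:  " + opensWithSeekReadSeek(rounds));
    }
}
```

In this model, interleaving stateful reads with preads costs one extra reopen per pread under #2, which is the overhead the question above is about.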

> WebHdfsFileSystem creates too many connections for pread
> --------------------------------------------------------
>
>                 Key: HDFS-8797
>                 URL: https://issues.apache.org/jira/browse/HDFS-8797
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-8797.000.patch, HDFS-8797.001.patch, HDFS-8797.002.patch
>
>
> While running a test we found that WebHdfsFileSystem can create several thousand connections
when doing a position read of a 200MB file. For each connection the client will connect to
the DataNode again and the DataNode will create a new DFSClient instance to handle the read
request. This also leads to several thousand {{getBlockLocations}} calls to the NameNode.
> The cause of the issue is that in {{FSInputStream#read(long, byte[], int, int)}}, each
time the inputstream reads some data, it seeks back to the old position and resets its state
to SEEK. Thus the next read will regenerate the connection.
> {code}
>   public int read(long position, byte[] buffer, int offset, int length)
>     throws IOException {
>     synchronized (this) {
>       long oldPos = getPos();
>       int nread = -1;
>       try {
>         seek(position);
>         nread = read(buffer, offset, length);
>       } finally {
>         seek(oldPos);
>       }
>       return nread;
>     }
>   }
> {code}
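
As a rough illustration of how the quoted {{read}} adds up to thousands of connections, the sketch below counts one simulated connection per positional read inside a {{readFully}}-style loop. The class and method names are hypothetical, and the 64 KB returned per read is an assumed value chosen for illustration only; the issue does not state how many bytes each read returned.

```java
final class ReadFullyConnectionSketch {

    /**
     * Counts simulated connections for one readFully-style loop over `length` bytes,
     * assuming each positional read returns at most `maxPerRead` bytes and that every
     * positional read regenerates the connection (seek, read, seek back).
     */
    static long connectionsForReadFully(long length, int maxPerRead) {
        long connections = 0;
        long nread = 0;
        while (nread < length) {
            connections++; // one fresh connection per positional read
            nread += Math.min(maxPerRead, length - nread);
        }
        return connections;
    }

    public static void main(String[] args) {
        long fileSize = 200L * 1024 * 1024; // 200MB, as in the issue description
        int assumedBytesPerRead = 64 * 1024; // hypothetical value, not taken from the issue
        System.out.println(connectionsForReadFully(fileSize, assumedBytesPerRead));
    }
}
```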



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
