Message-ID: <13898549.20391272578403835.JavaMail.jira@thor>
Date: Thu, 29 Apr 2010 18:00:03 -0400 (EDT)
From: "Eli Collins (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
In-Reply-To: <1502456391.35441265148139236.JavaMail.jira@brutus.apache.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862439#action_12862439 ]

Eli Collins commented on HDFS-941:
----------------------------------

Hey bc,

Nice change! Do you have any results from a non-random workload? Please collect:
# before/after TestDFSIO runs, so we can see whether sequential throughput is affected
# hadoop fs -put of a 1g file from n clients in parallel. I suspect this will improve, since socket reuse should limit slow start, but it's good to check.

How did you choose DEFAULT_CACHE_SIZE?

In the exception handler in sendReadResult, can we be more specific about when it's OK not to be able to send the result, and throw an exception in the cases where it's not OK, rather than swallowing all IOExceptions?

In DataXceiver#opReadBlock you throw an IOException in a try block that catches IOException. I think that should LOG.error and close the output stream instead. You can also chain the following if statements that check stat.

How about asserting sock != null in putCachedSocket? A null socket should never happen if the code is correct, and log messages are easy to ignore.

File a jira for ERROR_CHECKSUM?

Please add a comment to the head of ReaderSocketCache explaining why we cache BlockReader/socket pairs, as opposed to just caching sockets (because we don't multiplex BlockReaders over a single socket between hosts).

Nits:
* Nice comment in the BlockReader header; please define "packet" as well. Is the RPC specification in DataNode outdated? If so, fix it or file a jira instead of warning readers that it may be outdated.
* Maybe a better name for DN_KEEPALIVE_TIMEOUT, since there is no explicit keepalive? TRANSFER_TIMEOUT?
* Would rename workDone to something specific like opsProcessed, or make it a boolean
* Add an "a" in "with checksum"
* if statements need braces, e.g. in BlockReader#read

Thanks,
Eli

> Datanode xceiver protocol should allow reuse of a connection
> ------------------------------------------------------------
>
>                 Key: HDFS-941
>                 URL: https://issues.apache.org/jira/browse/HDFS-941
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: bc Wong
>         Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch
>
> Right now each connection into the datanode xceiver only processes one operation.
> In the case that an operation leaves the stream in a well-defined state (e.g. a client reads to the end of a block successfully), the same connection could be reused for a second operation. This should improve random read performance significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
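
For reference, the sendReadResult suggestion above might look something like the following. This is a minimal sketch with hypothetical names (StatusWriter, clientClosedEarly), not the actual patch code: the idea is to swallow the IOException only when the client has legitimately hung up after reading its data, and rethrow otherwise.

```java
import java.io.IOException;

public class SendReadResultSketch {
    // Stands in for the stream the datanode writes the read status to.
    interface StatusWriter {
        void writeStatus(int status) throws IOException;
    }

    // Assumed heuristic: a reset or broken pipe means the client went away,
    // which is an acceptable reason to miss delivering the read result.
    static boolean clientClosedEarly(IOException e) {
        String m = e.getMessage();
        return m != null
            && (m.contains("Connection reset") || m.contains("Broken pipe"));
    }

    static void sendReadResult(StatusWriter w, int status) throws IOException {
        try {
            w.writeStatus(status);
        } catch (IOException e) {
            if (clientClosedEarly(e)) {
                return; // OK: client disconnected after reading its data
            }
            throw e; // anything else is a real error, do not swallow it
        }
    }

    public static void main(String[] args) throws IOException {
        // Client hung up early: swallowed.
        sendReadResult(s -> { throw new IOException("Connection reset by peer"); }, 0);

        // Any other failure: rethrown.
        boolean rethrown = false;
        try {
            sendReadResult(s -> { throw new IOException("disk full"); }, 0);
        } catch (IOException e) {
            rethrown = true;
        }
        System.out.println("rethrown=" + rethrown);
    }
}
```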
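
Similarly, the putCachedSocket suggestion is sketched below with an assumed bounded-cache shape (the real ReaderSocketCache caches BlockReader/socket pairs and its internals may differ): assert on a null socket instead of logging, since a null here indicates a caller bug, and close the socket when the cache is full.

```java
import java.net.Socket;
import java.util.ArrayDeque;
import java.util.Deque;

public class SocketCacheSketch {
    private final Deque<Socket> cache = new ArrayDeque<>();
    private final int maxSize;

    SocketCacheSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    void putCachedSocket(Socket sock) {
        // A null socket means a caller bug; fail fast rather than log.
        assert sock != null : "caller must not cache a null socket";
        if (cache.size() < maxSize) {
            cache.addLast(sock); // keep for reuse instead of reconnecting
        } else {
            try {
                sock.close(); // cache full: drop the extra connection
            } catch (Exception ignored) {
            }
        }
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        SocketCacheSketch c = new SocketCacheSketch(2);
        c.putCachedSocket(new Socket());
        c.putCachedSocket(new Socket());
        c.putCachedSocket(new Socket()); // third one is closed, not cached
        System.out.println("size=" + c.size());
    }
}
```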