hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3856) Asynchronous IO Handling in Hadoop and HDFS
Date Mon, 11 Aug 2008 15:21:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621475#action_12621475 ]

rangadi edited comment on HADOOP-3856 at 8/11/08 8:21 AM:
---------------------------------------------------------------

> I guess would be to cook up a patch replacing read/write in datanode with Async I/O using Grizzly.
I agree. Before we replace all of the datanode transfers, we could do one or both of the following:

* Replace reads. This will help many users, including HBase, and will be a good test case that does not affect the more critical write path: bugs in the read pipeline don't corrupt HDFS data.

* Replace the "responder thread" used while writing. For this, I think we might need to find out how to keep Grizzly from closing the socket.
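For comparison with Grizzly, plain java.nio already lets an existing, already-connected socket be handed to a selector without closing it: the channel is switched to non-blocking mode and registered, and closing the Selector afterwards only deregisters the channel. The sketch below illustrates that baseline for a responder that reads acks; it is not Grizzly's API or DataNode code, and the class and method names (`ExistingSocketResponder`, `readAcks`, `roundTrip`) are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

class ExistingSocketResponder {

    /**
     * Takes an already-connected channel, switches it to non-blocking mode,
     * and reads exactly expectedBytes through a Selector. The channel is never
     * closed here; closing the Selector only deregisters it.
     */
    static String readAcks(SocketChannel existing, int expectedBytes) throws IOException {
        existing.configureBlocking(false); // register() rejects channels in blocking mode
        try (Selector selector = Selector.open()) {
            existing.register(selector, SelectionKey.OP_READ);
            ByteBuffer buf = ByteBuffer.allocate(expectedBytes);
            while (buf.hasRemaining()) {
                selector.select(); // wait until the socket has data
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable() && ((SocketChannel) key.channel()).read(buf) < 0) {
                        throw new IOException("peer closed before all acks arrived");
                    }
                }
                selector.selectedKeys().clear();
            }
            buf.flip();
            return StandardCharsets.US_ASCII.decode(buf).toString();
        }
    }

    /** Demo: an ordinary blocking connection is created first, then handed to the selector. */
    static String roundTrip(String acks) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                peer.write(ByteBuffer.wrap(acks.getBytes(StandardCharsets.US_ASCII)));
                String got = readAcks(client, acks.length());
                if (!client.isOpen()) throw new IllegalStateException("socket was closed");
                return got;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("ACK1ACK2")); // prints ACK1ACK2
    }
}
```

Whether Grizzly exposes an equivalent hook for sockets it did not create is exactly the open question above; this only shows that the underlying NIO layer does not force the socket closed.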


  
> Asynchronous IO Handling in Hadoop and HDFS
> -------------------------------------------
>
>                 Key: HADOOP-3856
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3856
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, io
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: GrizzlyEchoServer.patch, MinaEchoServer.patch
>
>
> I think Hadoop needs utilities or a framework to make it simpler to deal with generic asynchronous IO in Hadoop.
> Example use case:
> It's been a long-standing problem that the DataNode takes too many threads for data transfers. Each write operation takes up 2 threads at each of the datanodes, and each read operation takes one, irrespective of how much activity there is on the sockets. The kinds of load that HDFS serves have been expanding quite fast, and HDFS should handle these varied loads better. If there were a framework for non-blocking IO, the read and write pipeline state machines could be implemented with async events on a fixed number of threads.
> A generic utility is better since it could be used in other places, like DFSClient. DFSClient currently creates 2 extra threads for each file it has open for writing.
> Initially I started writing a primitive "selector", then checked whether such a facility already exists. [Apache MINA|http://mina.apache.org] seemed to do exactly this. My impression after looking at the interface and examples is that it does not give the kind of control we might prefer or need. The first use case I was thinking of implementing with MINA was replacing the "response handlers" in DataNode. The response handlers are simpler since they don't involve disk I/O. I [asked on the MINA user list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html], but it looks like it cannot be done, I think mainly because the sockets are already created.
> Essentially what I have in mind is similar to MINA, except that reading from and writing to the sockets is done by the event handlers. The lowest layer essentially invokes selectors and invokes event handlers on a single thread or on multiple threads. Each event handler is expected to do some non-blocking work. We would of course have utility handler implementations that do read, write, accept, etc., which are useful for simple processing.
> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more flexible. It is under GPL.
> Are there other such implementations we should look at?
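The layering described in the issue (a lowest layer that runs the selector and dispatches readiness events, and handlers that do their own non-blocking reads and writes) can be illustrated with plain java.nio and no framework. The sketch below is a minimal single-threaded reactor with a utility accept handler and an echo handler; `MiniReactor`, `EventHandler`, and `echoOnce` are hypothetical names, not MINA, Grizzly, xSocket, or Hadoop APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

class MiniReactor implements Runnable {
    /** Hypothetical handler contract: called when its key is ready; must never block. */
    interface EventHandler {
        void onReady(SelectionKey key) throws IOException;
    }

    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    MiniReactor() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT, (EventHandler) k -> accept());
    }

    void stop() { running = false; selector.wakeup(); }

    /** Utility accept handler: gives each new connection its own echo handler and buffer. */
    private void accept() throws IOException {
        SocketChannel ch = server.accept();
        if (ch == null) return; // nothing was actually pending
        ch.configureBlocking(false);
        ByteBuffer buf = ByteBuffer.allocate(4096);
        ch.register(selector, SelectionKey.OP_READ, (EventHandler) k -> echo(k, buf));
    }

    /** Utility echo handler: the handler itself performs the non-blocking read and write. */
    private static void echo(SelectionKey key, ByteBuffer buf) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        if (key.isReadable() && ch.read(buf) < 0) { ch.close(); return; }
        buf.flip();
        ch.write(buf);   // may send only part of the pending bytes
        buf.compact();
        // Wait for OP_WRITE only while unsent bytes remain; otherwise go back to reading.
        key.interestOps(buf.position() > 0 ? SelectionKey.OP_WRITE : SelectionKey.OP_READ);
    }

    /** Lowest layer: one thread runs the selector and dispatches ready keys to handlers. */
    @Override public void run() {
        try {
            while (running) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (!key.isValid()) continue;
                    try {
                        ((EventHandler) key.attachment()).onReady(key);
                    } catch (IOException e) {
                        key.channel().close(); // drop only the failing connection
                    }
                }
                selector.selectedKeys().clear();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (SelectionKey k : selector.keys()) {
                try { k.channel().close(); } catch (IOException ignored) { }
            }
            try { selector.close(); } catch (IOException ignored) { }
        }
    }

    /** Demo: start the reactor thread, echo one message through it, then shut it down. */
    static String echoOnce(String msg) throws Exception {
        MiniReactor reactor = new MiniReactor();
        Thread loop = new Thread(reactor, "reactor");
        loop.start();
        try (SocketChannel client = SocketChannel.open(reactor.server.getLocalAddress())) {
            byte[] out = msg.getBytes(StandardCharsets.UTF_8);
            client.write(ByteBuffer.wrap(out));
            ByteBuffer in = ByteBuffer.allocate(out.length);
            while (in.hasRemaining() && client.read(in) >= 0) { /* blocking read */ }
            in.flip();
            return StandardCharsets.UTF_8.decode(in).toString();
        } finally {
            reactor.stop();
            loop.join();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("hello datanode"));
    }
}
```

Dispatching handlers onto a fixed pool of worker threads instead of the selector thread is possible, but changes to interest ops then need extra coordination with the selector (for example via `wakeup()`); a single thread keeps the sketch simple.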

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

