hadoop-mapreduce-user mailing list archives

From Tenghuan He <tenghua...@gmail.com>
Subject Re: Directly reading from datanode using JAVA API got socketTimeoutException
Date Fri, 01 Jan 2016 01:17:51 GMT
The following is what I want to do.
When reading a big file that spans multiple blocks, I want to read the
different blocks from different nodes in parallel, and thus make reading the
big file faster.
Is that possible?
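For concreteness, here is a minimal sketch of how this might look through the
public FileSystem API (the class name, pool size, and reassembly step are
illustrative; getFileBlockLocations plus positioned reads do the actual work):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBlockRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]);
        FileStatus status = fs.getFileStatus(path);
        // One entry per block: its offset, length, and hosting datanodes
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        ExecutorService pool = Executors.newFixedThreadPool(8);  // illustrative pool size
        List<Future<byte[]>> parts = new ArrayList<>();
        for (BlockLocation block : blocks) {
            final long offset = block.getOffset();
            final int length = (int) block.getLength();
            parts.add(pool.submit(() -> {
                // Each task opens its own stream and issues a positioned read,
                // so different block ranges are fetched concurrently.
                try (FSDataInputStream in = fs.open(path)) {
                    byte[] buf = new byte[length];
                    in.readFully(offset, buf);  // positioned read of exactly this block's range
                    return buf;
                }
            }));
        }
        for (Future<byte[]> part : parts) {
            byte[] blockBytes = part.get();  // reassemble or process each block's bytes here
        }
        pool.shutdown();
    }
}

Whether each read actually lands on a different datanode is up to HDFS replica
selection, but issuing the per-block reads concurrently gives the parallelism
without touching private DFSClient internals.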

Thanks

On Thu, Dec 31, 2015 at 2:34 AM, Chris Nauroth <cnauroth@hortonworks.com>
wrote:

> Your code has connected to a DataNode's TCP port, and the DataNode server
> side is likely blocked expecting the client to send some kind of request
> defined in the Data Transfer Protocol.  The client code here does not write
> a request, so the DataNode server doesn't know what to do.  Instead, the
> client immediately goes into a blocking read.  Since the DataNode server
> side doesn't know what to do, it's never going to write any bytes back to
> the socket connection, and therefore the client eventually times out on the
> read.
>
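> For illustration, the same failure mode can be reproduced with plain
> java.net sockets, independent of HDFS (a minimal self-contained sketch; the
> class name and timings are made up for the demo):
>
> import java.io.InputStream;
> import java.net.ServerSocket;
> import java.net.Socket;
>
> public class SilentServerDemo {
>     public static void main(String[] args) throws Exception {
>         ServerSocket server = new ServerSocket(0);  // any free port
>         new Thread(() -> {
>             try {
>                 server.accept();       // accept the connection, then stay silent,
>                 Thread.sleep(60_000);  // like a DataNode that never received a valid request
>             } catch (Exception ignored) { }
>         }).start();
>
>         try (Socket client = new Socket("localhost", server.getLocalPort())) {
>             client.setSoTimeout(11_000);  // same 11000 ms budget as in the reported trace
>             InputStream in = client.getInputStream();
>             in.read();  // no reply ever arrives, so this throws SocketTimeoutException
>         }
>     }
> }
>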
> Stepping back, please be aware that what you are trying to do is
> unsupported.  Relying on private implementation details like this is likely
> to be brittle and buggy.  As the HDFS code evolves in the future, there is
> no guarantee that what you do here will work the same way in future
> versions.  There might not even be a connectToDN method in future versions
> if we decide to do some internal refactoring.
>
> If you can give a high-level description of what you want to achieve, then
> perhaps we can suggest a way to do it through the public API.
>
> --Chris Nauroth
>
> From: Tenghuan He <tenghuanhe@gmail.com>
> Date: Wednesday, December 30, 2015 at 9:29 AM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Directly reading from datanode using JAVA API got
> socketTimeoutException
>
> Hello,
>
> I want to read directly from datanode blocks using the Java API, as in the
> following code, but I get a SocketTimeoutException.
>
> I use reflection to call the private DFSClient method connectToDN(...) and
> get an IOStreamPair of in and out streams, where in is used to read bytes
> from the datanode.
> The workhorse code is
>
> try {
>     // Look up the private DFSClient#connectToDN(DatanodeInfo, int, LocatedBlock) via reflection
>     Class<?>[] paraList = {DatanodeInfo.class, int.class, LocatedBlock.class};
>     Method connectToDN = dfsClient.getClass().getDeclaredMethod("connectToDN", paraList);
>     connectToDN.setAccessible(true);
>     IOStreamPair pair = (IOStreamPair) connectToDN.invoke(dfsClient, datanode, timeout, lb);
>     in = new DataInputStream(pair.in);
>     System.out.println(in.getClass());
>     byte[] b = new byte[10000];
>     in.readFully(b);  // blocks here: no request was ever sent, so the datanode writes nothing back
> } catch (Exception e) {
>     e.printStackTrace();
> }
>
> and the exception is
>
> java.net.SocketTimeoutException: 11000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.179.1:53765 remote=/192.168.179.135:50010]
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>     at java.io.DataInputStream.readFully(DataInputStream.java:195)
>     at java.io.DataInputStream.readFully(DataInputStream.java:169)
>     at BlocksList.main(BlocksList.java:69)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
>
> Could anyone tell me where the problem is?
>
> Thanks & Regards
>
> Tenghuan He
>
