hadoop-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: HDFS short-circuit reads
Date Thu, 19 Dec 2013 14:34:41 GMT
Ah, I see - thanks for clarifying.
john

From: Chris Nauroth [mailto:cnauroth@hortonworks.com]
Sent: Tuesday, December 17, 2013 4:32 PM
To: user@hadoop.apache.org
Subject: Re: HDFS short-circuit reads

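Both of these methods ultimately give you the same data type that you're interested in: the
BlockLocation object, which contains the hosts that have a replica of the block.
getFileBlockLocations returns an array of BlockLocation directly, while listLocatedStatus returns
LocatedFileStatus objects that each carry their own BlockLocation array.  Depending on your usage
pattern, one of these methods might be more convenient than the other.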

If your application's input is a single file, then you'll likely find that getFileBlockLocations
is a good fit.  This will give you the BlockLocation information for that one file, and you
won't need to write extra code to pull it out of the RemoteIterator (which you know is only
going to contain one result anyway).
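
A minimal sketch of the single-file case, assuming a Hadoop 2.x client (hadoop-common, plus hadoop-hdfs for hdfs:// paths) on the classpath; the class name SingleFileBlocks and the method blocksOf are hypothetical. It asks for the locations of every block by passing offset 0 and the file's full length to getFileBlockLocations.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: lists the replica hosts for each block of one file.
public class SingleFileBlocks {

    /** Returns the BlockLocation of every block in the file (offset 0 through its full length). */
    static BlockLocation[] blocksOf(FileSystem fs, Path file) throws IOException {
        FileStatus status = fs.getFileStatus(file);
        return fs.getFileBlockLocations(status, 0, status.getLen());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SingleFileBlocks <file>");
            System.exit(1);
        }
        Path file = new Path(args[0]);
        FileSystem fs = file.getFileSystem(new Configuration());
        for (BlockLocation block : blocksOf(fs, file)) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
        }
    }
}
```

Against HDFS each entry names the DataNodes holding that block; against the local file system the result is a single placeholder block on "localhost".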

If your application's input is a whole directory, and you then process all files within that
directory, then you'll likely find listLocatedStatus to be more convenient.  You'll be able
to make a single RPC call to get all of the BlockLocation information for all files.  (Like
you said, one call instead of many.)
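
For the directory case, a comparable sketch under the same classpath assumption (DirectoryBlocks and blocksUnder are hypothetical names). Each LocatedFileStatus returned by listLocatedStatus already carries its block locations, so there is no per-file lookup; for very large directories the RemoteIterator may still fetch the listing from the NameNode in batches. This version skips subdirectories rather than recursing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical helper: gathers block locations for every file directly under a directory.
public class DirectoryBlocks {

    static List<BlockLocation> blocksUnder(FileSystem fs, Path dir) throws IOException {
        List<BlockLocation> result = new ArrayList<>();
        // The listing itself carries the BlockLocations, so there is no separate
        // getFileBlockLocations call per file.
        RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
        while (it.hasNext()) {
            LocatedFileStatus status = it.next();
            if (!status.isFile()) {
                continue; // this sketch does not descend into subdirectories
            }
            result.addAll(Arrays.asList(status.getBlockLocations()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DirectoryBlocks <directory>");
            System.exit(1);
        }
        Path dir = new Path(args[0]);
        FileSystem fs = dir.getFileSystem(new Configuration());
        for (BlockLocation block : blocksUnder(fs, dir)) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
        }
    }
}
```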

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Tue, Dec 17, 2013 at 6:39 AM, John Lilley <john.lilley@redpoint.net> wrote:
Thanks!  I currently call FileSystem.getFileBlockLocations() to map tasks to local data blocks;
is there any advantage to using listLocatedStatus() instead?  I guess one call instead of
two...
John


From: Chris Nauroth [mailto:cnauroth@hortonworks.com]
Sent: Monday, December 16, 2013 6:07 PM
To: user@hadoop.apache.org
Subject: Re: HDFS short-circuit reads

Hello John,

Short-circuit reads are not on by default.  The documentation page you linked to at hadoop.apache.org
contains all of the information you need to enable them, though.

Regarding checking the status of short-circuit reads programmatically, here are a few thoughts:

Your application could check Configuration for the dfs.client.read.shortcircuit key.  This
will tell you at a high level if the feature is enabled.  However, note that the feature needs
to be turned on in configuration for both the DataNode and the HDFS client process.  Depending
on the details of the deployment, the DataNode and the client might be using different configuration
files.
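
A minimal sketch of that client-side check, assuming hadoop-common on the classpath (ShortCircuitConfigCheck and clientRequestsShortCircuit are hypothetical names). Following the ShortCircuitLocalReads page linked in the original question, it treats the feature as requested only when dfs.client.read.shortcircuit is true and dfs.domain.socket.path is non-empty. It only sees the client's configuration, so it says nothing about the DataNode's settings or about whether a local replica exists.

```java
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper: reports whether this client's configuration asks for short-circuit reads.
public class ShortCircuitConfigCheck {

    static final String SHORT_CIRCUIT_KEY = "dfs.client.read.shortcircuit";
    static final String DOMAIN_SOCKET_KEY = "dfs.domain.socket.path";

    /** True if the client config enables short-circuit reads and names a domain socket path. */
    static boolean clientRequestsShortCircuit(Configuration conf) {
        boolean enabled = conf.getBoolean(SHORT_CIRCUIT_KEY, false);
        String socketPath = conf.get(DOMAIN_SOCKET_KEY, "").trim();
        return enabled && !socketPath.isEmpty();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // A plain Configuration loads core-default.xml and core-site.xml; add
        // hdfs-site.xml from the classpath so the dfs.* keys are visible.
        conf.addResource("hdfs-site.xml");
        System.out.println("client requests short-circuit reads: "
                + clientRequestsShortCircuit(conf));
    }
}
```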

This tells you if the feature is enabled, but it doesn't necessarily tell you if you're really
going to get short-circuit reads when you open the file.  There might not be a local replica
for the block, in which case the read would fall back to the typical remote read behavior
anyway.
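
One rough way to tell whether a local replica exists is to compare this machine's host name against BlockLocation.getHosts(). This is a sketch under stated assumptions, not an HDFS API for detecting short-circuit reads: LocalReplicaCheck and hasReplicaOn are hypothetical names, the dn1/dn2/dn3 hosts are made-up example data, and a plain string comparison can miss a match if the DataNode registered under a different form of the name (short name, FQDN, or IP address).

```java
import java.io.IOException;
import java.net.InetAddress;

import org.apache.hadoop.fs.BlockLocation;

// Hypothetical helper: approximate test for a replica of a block on a given host.
public class LocalReplicaCheck {

    /** True if any replica host of the block equals hostName (case-insensitive). */
    static boolean hasReplicaOn(BlockLocation block, String hostName) throws IOException {
        for (String host : block.getHosts()) {
            if (host.equalsIgnoreCase(hostName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String localHost = InetAddress.getLocalHost().getHostName();
        // Hypothetical block with replicas on dn1/dn2/dn3; in a real job the
        // BlockLocation would come from getFileBlockLocations or listLocatedStatus.
        BlockLocation block = new BlockLocation(
                new String[] {"dn1:50010", "dn2:50010", "dn3:50010"},
                new String[] {"dn1", "dn2", "dn3"},
                0L, 128L * 1024 * 1024);
        System.out.println(localHost + " has a replica: " + hasReplicaOn(block, localHost));
    }
}
```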

Depending on what your application wants to achieve, you might also be interested in looking
at the FileSystem.listLocatedStatus API to query information about blocks and the corresponding
locations of replicas.  Applications like MapReduce use this information to try to schedule
their work for optimal locality.  Short-circuit reads then become a further optimization on
top of the gains already achieved by locality.
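
To illustrate the locality idea, here is a sketch that inverts block locations into a host-to-blocks index an application could consult when placing work. It is a simplified illustration, not MapReduce's actual scheduling logic; LocalityIndex and blocksByHost are hypothetical names, and the example hosts are made up.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.fs.BlockLocation;

// Hypothetical helper: maps each replica host to the blocks it holds.
public class LocalityIndex {

    static Map<String, List<BlockLocation>> blocksByHost(List<BlockLocation> blocks)
            throws IOException {
        Map<String, List<BlockLocation>> index = new TreeMap<>();
        for (BlockLocation block : blocks) {
            // A block with three replicas is listed under all three hosts.
            for (String host : block.getHosts()) {
                index.computeIfAbsent(host, h -> new ArrayList<>()).add(block);
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical blocks; in practice these come from listLocatedStatus.
        List<BlockLocation> blocks = Arrays.asList(
                new BlockLocation(new String[] {"a:50010", "b:50010"}, new String[] {"a", "b"}, 0L, 100L),
                new BlockLocation(new String[] {"b:50010", "c:50010"}, new String[] {"b", "c"}, 100L, 100L));
        for (Map.Entry<String, List<BlockLocation>> e : blocksByHost(blocks).entrySet()) {
            System.out.println(e.getKey() + " holds " + e.getValue().size() + " block(s)");
        }
    }
}
```

A scheduler could then prefer running the task for a block on one of the hosts listed for it, and short-circuit reads would only help for tasks that actually land on such a host.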

Hope this helps,

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Mon, Dec 16, 2013 at 4:21 PM, John Lilley <john.lilley@redpoint.net> wrote:
Our YARN application would benefit from maximal bandwidth on HDFS reads.
But I'm unclear on how short-circuit reads are enabled.
Are they on by default?
Can our application check programmatically to see if the short-circuit read is enabled?
Thanks,
john

RE:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
https://issues.apache.org/jira/browse/HDFS-347



CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed
and may contain information that is confidential, privileged and exempt from disclosure under
applicable law. If the reader of this message is not the intended recipient, you are hereby
notified that any printing, copying, dissemination, distribution, disclosure or forwarding
of this communication is strictly prohibited. If you have received this communication in error,
please contact the sender immediately and delete it from your system. Thank You.