hadoop-common-user mailing list archives

From Ted Xu <...@gopivotal.com>
Subject Re: efficiency of LocalResources and archives
Date Fri, 07 Jun 2013 02:53:08 GMT
Hi John,

If the resources are located in HDFS and you specify each resource by its
HDFS URI, then the answer is yes. The node managers cache resources and
automatically refresh them based on the modification time of the HDFS file.
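
To illustrate the idea, here is a toy model of a per-node cache keyed by URI and modification time. It is only a sketch under my own assumptions, not the node manager's actual code: the class NodeCacheSketch, the method localize, and the example path are all hypothetical names made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a per-node resource cache; NOT YARN's actual implementation. */
class NodeCacheSketch {
    // Maps a resource URI to the modification time of the locally held copy.
    private final Map<String, Long> cachedModTime = new HashMap<>();

    /**
     * Returns true if the resource had to be fetched and expanded,
     * false if the already-expanded local copy could be reused.
     */
    boolean localize(String uri, long hdfsModTime) {
        Long cached = cachedModTime.get(uri);
        if (cached != null && cached == hdfsModTime) {
            return false; // same file version: reuse the expanded copy
        }
        cachedModTime.put(uri, hdfsModTime); // new or changed: fetch again
        return true;
    }

    public static void main(String[] args) {
        NodeCacheSketch node = new NodeCacheSketch();
        String uri = "hdfs:///apps/archive.zip"; // hypothetical path
        System.out.println(node.localize(uri, 1000L)); // first application run
        System.out.println(node.localize(uri, 1000L)); // later run, archive unchanged
        System.out.println(node.localize(uri, 2000L)); // archive replaced in HDFS
    }
}
```

In this model only the first run and the run after the archive changes pay the fetch-and-expand cost; runs in between reuse the local copy.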

It is recommended to increase the resources' replication factor. If the
resources are uploaded from the client machine, the MapReduce framework
automatically sets the replication factor to 10.

On Fri, Jun 7, 2013 at 4:10 AM, John Lilley <john.lilley@redpoint.net> wrote:

>  Suppose that I have a large archive in HDFS, say, containing 500 files
> and 4GB.  I want to make this available via YARN LocalResource.  The
> archive doesn’t change very often (maybe once per month).  Will YARN
> optimize for this?  Does the expanded per-node cache persist across
> application runs (using something like modification time to know if
> re-expansion is needed)?
>
> If the archive is re-expanded on each node every time the app is launched,
> should I set the replication factor higher to reduce rack bandwidth?
>
> Thanks
> John

Ted Xu
