hadoop-hdfs-user mailing list archives

From praveenesh kumar <praveen...@gmail.com>
Subject Re: DistributedCache deprecated
Date Thu, 30 Jan 2014 12:05:34 GMT
Hi Amit,

I am not sure how these are linked with DistributedCache. The job
configuration does not upload any data into memory. As far as I know,
DistributedCache does not load anything into memory either. It just
copies the files to the slave nodes so that they are accessible to
mappers/reducers. Usually the location is
${hadoop.tmp.dir}/${mapred.local.dir}/tasktracker/archive (it varies from
distribution to distribution). You always have to read the files yourself
in your mapper or reducer whenever you want to use them.

What has happened is that the methods of the DistributedCache class have
been added to the Job class. I am assuming they have not changed how the
distributed cache methods work; otherwise there would have been some
articles about it, and I don't see any reason to change it anyway. So
everything still works the same way. It's just that you now use the new
Job class to access the distributed cache features.
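
To illustrate the "read the files yourself" part, here is a minimal
sketch in plain Java. It assumes the driver has registered a side file
with Job.addCacheFile(URI) and that the task has found its localized copy
(the URIs are available from context.getCacheFiles()). The class name
SideFileLoader and the tab-separated "key<TAB>value" format are
hypothetical, chosen only for this example; the sketch covers just the
parsing step you would call from your mapper's or reducer's setup().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: parses a side file that the
// distributed cache has already copied to the task's local disk.
class SideFileLoader {

    // Reads "key<TAB>value" lines into a map; lines without a tab are skipped.
    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // assumed format: ignore malformed lines
                }
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }

    // Stand-alone check outside Hadoop: pass the path of a local file.
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SideFileLoader <local-file>");
            return;
        }
        Map<String, String> lookup = load(Paths.get(args[0]));
        System.out.println("loaded " + lookup.size() + " entries");
    }
}
```

Nothing is held in memory until load() is called, so the task decides
when and whether to read the file.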

I am not sure which entries exactly you are pointing to. Am I missing
anything here?


On Thu, Jan 30, 2014 at 6:12 AM, Amit Mittal <amitmittal5@gmail.com> wrote:

> Hi Mike & Prav,
> Although I am new to Hadoop, I would like to add my 2 cents if that
> helps.
> There are 2 ways to distribute shared data: one is the Job
> configuration and the other is DistributedCache.
> The job configuration is read by the JT, TT and child JVMs, and each time
> the configuration is read, all of its entries are read into memory, even
> if they are not used. So using the job configuration is not advised if the
> data is more than a few kilobytes. It is therefore not an alternative to
> DistributedCache unless some modifications are made to the Job
> configuration to address this limitation.
> So I am also curious to know the alternative to the DistributedCache class.
> Thanks
> Amit
> On Thu, Jan 30, 2014 at 2:43 AM, Giordano, Michael <
> Michael.Giordano@vistronix.com> wrote:
>>  I noticed that in Hadoop 2.2.0
>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>> Is there a class that provides equivalent functionality? My application
>> relies heavily on DistributedCache.
>> Thanks,
>> Mike G.
