hadoop-hdfs-user mailing list archives

From Amit Mittal <amitmitt...@gmail.com>
Subject Re: DistributedCache deprecated
Date Thu, 30 Jan 2014 12:27:09 GMT
Hi Prav,

Yes, you are correct that DistributedCache does not load files into
memory. Also, using the job configuration and using DistributedCache are two
different approaches. I was going by "Hadoop: The Definitive Guide",
Chapter 8 > Side Data Distribution (pages 288-295).
Since you say the DistributedCache methods have now moved to Job, could
you please share an article or document on that? It would be a great help
to my understanding.


On Thu, Jan 30, 2014 at 5:35 PM, praveenesh kumar <praveenesh@gmail.com> wrote:

> Hi Amit,
> I am not sure how these are linked with DistributedCache. The job
> configuration does not upload any data into memory. As far as I am aware of
> how DistributedCache works, nothing gets loaded into memory. The distributed
> cache just copies the files to the slave nodes so that they are accessible to
> mappers/reducers. Usually the location is
> ${hadoop.tmp.dir}/${mapred.local.dir}/tasktracker/archive (it varies from
> distribution to distribution). You always have to read the files yourself in
> your mapper or reducer whenever you want to use them.
> What has happened is that the methods of the DistributedCache class have now
> been added to the Job class. I am assuming the behaviour of the distributed
> cache methods has not changed; otherwise there would have been some
> articles about it, and I don't see any reason to change it either. So
> everything still works the same way. It's just that you now use the Job
> class to access the distributed cache features.
> I am not sure exactly which entries you are pointing to. Am I missing
> anything here?
> Regards
> Prav
> On Thu, Jan 30, 2014 at 6:12 AM, Amit Mittal <amitmittal5@gmail.com> wrote:
>> Hi Mike & Prav,
>> Although I am new to Hadoop, I would like to add my 2 cents in case it
>> helps.
>> There are two ways to distribute shared data: one is the job
>> configuration and the other is DistributedCache.
>> The job configuration is read by the JT, the TTs and the child JVMs, and
>> each time it is read, all of its entries are read into memory, even if
>> they are not used. So using the job configuration is not advised if the
>> data is more than a few kilobytes. It is therefore not an alternative to
>> DistributedCache unless the job configuration is modified to address this
>> limitation.
>> So I am also curious to know the alternative to the DistributedCache class.
>> Thanks
>> Amit
>> On Thu, Jan 30, 2014 at 2:43 AM, Giordano, Michael <
>> Michael.Giordano@vistronix.com> wrote:
>>>  I noticed that in Hadoop 2.2.0
>>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>> Is there a class that provides equivalent functionality? My application
>>> relies heavily on DistributedCache.
>>> Thanks,
>>> Mike G.
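
For reference, in Hadoop 2.x the non-deprecated equivalents are Job.addCacheFile(URI) on the driver side and JobContext.getCacheFiles() inside the task. As noted in the thread, the cache only copies files to the worker nodes; the task still has to open and parse them itself. Below is a minimal sketch of that task-side read, using only the JDK so it runs without a cluster. The SideDataLoader class name and the tab-separated file format are assumptions made for illustration; neither comes from Hadoop or from this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): parses a side-data file that
// has already been copied to the local disk of the node running the task.
final class SideDataLoader {

    // Reads "key<TAB>value" lines into a map.
    // Blank lines and lines without a tab are skipped.
    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // not a key/value line
                }
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }

    // Stand-alone demo: a temporary file stands in for the localized cache file.
    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("side-data", ".tsv");
        Files.write(sample, Arrays.asList("US\tUnited States", "IN\tIndia"));
        System.out.println(load(sample).get("IN")); // prints "India"
        Files.delete(sample);
    }
}
```

In a real job, the load call would typically sit in the mapper's or reducer's setup() method and be pointed at the localized copy of the file registered with Job.addCacheFile, so the file is parsed once per task rather than once per record.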
