hadoop-common-dev mailing list archives

From Hemanth Yamijala <yhema...@gmail.com>
Subject Re: where distributed cache start working
Date Fri, 27 Aug 2010 14:04:46 GMT
Hi,
> Thanks Arun. Changing the mTime is a good idea. However, given a file (the path is
> A/B/C/D/file) distributed to all the nodes, if I just change the mTime of the file
> to an earlier timestamp, it will not be replaced next time. Should I also change
> the mTime for all the directories along the path (A, B, C and D)? Whose
> timestamp is used by DistributedCache?

It is the timestamp of the file on DFS. So, if you modify the file's
timestamp on DFS, it should be re-distributed to all the nodes.
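
As a rough illustration of why touching the file itself is enough, here is a toy Java model of a cache keyed on the source file's modification time. It uses local files and java.nio as a stand-in for DFS. The class MtimeCacheSketch, its localize method, and the "timestamp differs" check are assumptions made for illustration, not DistributedCache's actual code. On HDFS, the timestamp itself can be changed with FileSystem.setTimes(path, mtime, atime).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;

/** Toy model of an mtime-keyed cache. Hypothetical; not Hadoop's implementation. */
class MtimeCacheSketch {
    // Source mtime (in ms) recorded when each file was last copied locally.
    private final Map<Path, Long> localizedMtime = new HashMap<>();
    private final Path localDir;
    int copies = 0; // number of times a file was actually copied

    MtimeCacheSketch(Path localDir) {
        this.localDir = localDir;
    }

    /** Returns the local copy, re-copying only when the source file's mtime changed. */
    Path localize(Path source) throws IOException {
        long mtime = Files.getLastModifiedTime(source).toMillis();
        Path local = localDir.resolve(source.getFileName());
        Long seen = localizedMtime.get(source);
        // Assumption: any difference in the file's own mtime invalidates the copy.
        // Parent directories are never consulted.
        if (seen == null || seen != mtime) {
            Files.copy(source, local, StandardCopyOption.REPLACE_EXISTING);
            localizedMtime.put(source, mtime);
            copies++;
        }
        return local;
    }

    public static void main(String[] args) throws IOException {
        Path srcDir = Files.createTempDirectory("dfs-stand-in");
        Path cacheDir = Files.createTempDirectory("local-cache");
        Path file = Files.write(srcDir.resolve("file"),
                "v1".getBytes(StandardCharsets.UTF_8));
        MtimeCacheSketch cache = new MtimeCacheSketch(cacheDir);

        // Whole-second timestamps avoid file system rounding of sub-second values.
        Files.setLastModifiedTime(file, FileTime.fromMillis(1_000_000_000_000L));
        cache.localize(file); // first use: copied
        cache.localize(file); // same mtime: local copy reused
        Files.setLastModifiedTime(file, FileTime.fromMillis(1_000_000_060_000L));
        cache.localize(file); // mtime changed: copied again
        System.out.println("copies made: " + cache.copies);
    }
}
```

In this model only the file's own timestamp is consulted, which matches the answer above: the directories A, B, C and D along the path play no part.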

Thanks
Hemanth
>
> Thanks.
> -Gang
>
>
>
>
> ----- Original Message ----
> From: Arun C Murthy <acm@yahoo-inc.com>
> To: mapreduce-user@hadoop.apache.org
> Sent: 2010/8/22 (Sun) 9:38:02 PM
> Subject: Re: where distributed cache start working
>
> Moving to mapreduce-user@, bcc common-dev@. Please use the project-specific
> lists.
>
> DistributedCache.purgeCache isn't a public API. You shouldn't be calling it from
> the task.
>
> A simple way of doing what you want is to change the mtime of the cache files on
> HDFS.
>
> Arun
>
> On Aug 22, 2010, at 9:48 AM, Gang Luo wrote:
>
>> Thanks Jeff.
>>
>> However, are you sure TaskRunner.run() is also used in the new API? I used btrace
>> to trace the function calls but didn't find this function being called
>> anywhere.
>>
>>
>> One more question about distributed cache. After I call
>> DistributedCache.purgeCache, I think the local cached files should be deleted or
>> invalidated. However, when I run the same job with the purge operation at the
>> end multiple times, I find the local files have never been deleted and the
>> modification time is from when the first job ran. How can I ask my job to
>> re-distribute the cache again anyway?
>>
>> Thanks,
>> -Gang
>>
>>
>>
>>
>> ----- Original Message ----
>> From: Jeff Zhang <zjffdu@gmail.com>
>> To: common-dev@hadoop.apache.org
>> Sent: 2010/8/20 (Fri) 11:22:49 AM
>> Subject: Re: where distributed cache start working
>>
>> Hi Gang,
>>
>> In the TaskRunner's run() method, Hadoop downloads the cache files
>> that you set on the client side to the local node, so the forked child JVM
>> can use these cache files locally.
>>
>>
>>
>> On Fri, Aug 20, 2010 at 8:08 AM, Gang Luo <lgpublic@yahoo.com.cn> wrote:
>>> Hi all,
>>> I went through the code, but couldn't find the place where distributed cache
>>> starts working. I want to know, between DistributedCache.addCacheFile at the
>>> master node and DistributedCache.getLocalCacheFiles at the client side, when
>>> and where the files get distributed.
>>>
>>>
>>> Thanks,
>>> -Gang
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>>
>
>
>
>
>
