hadoop-common-user mailing list archives

From lohit <lohit...@yahoo.com>
Subject Re: Caching data selectively on slaves
Date Tue, 11 Nov 2008 20:33:35 GMT
DistributedCache would copy the cached data to all nodes. If you know the mapping of R* to D*,
how about having each reducer read the D it expects directly from DFS? DistributedCache only
helps when the same data is used by multiple tasks on the same node, so that you avoid
accessing DFS multiple times. If you know that each 'D' is read by exactly one 'R', then
you are not buying much with DistributedCache. Also keep in mind that if your read takes
a long time, your reducers might time out by failing to report status.
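Since the question mentions HashPartitioner.getPartition, a reducer can recompute that mapping itself to find the one D file it owns on DFS, instead of shipping all D files everywhere via DistributedCache. Below is a minimal sketch: the partition arithmetic matches Hadoop's default HashPartitioner, but the class name `PartitionLocator` and the `/cache/D<i>` path layout are hypothetical, purely for illustration.

```java
// Hypothetical helper: lets reducer R_i work out which partition file D_i
// it should read directly from DFS, rather than caching all of them.
public class PartitionLocator {

    // Same arithmetic as Hadoop's default HashPartitioner.getPartition:
    // clear the sign bit, then take the hash modulo the number of reducers.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assumed layout (not part of Hadoop): partition i's data lives at /cache/D<i>.
    public static String dfsPathFor(int partition) {
        return "/cache/D" + partition;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        int p = partitionFor("someKey", numReduceTasks);
        // Reducer p would open only this one path from DFS in its configure() step.
        System.out.println("key 'someKey' -> partition " + p + " -> " + dfsPathFor(p));
    }
}
```

In a real job, the reducer already knows its own partition number from the task context, so it can open just that one path in configure() and skip the DistributedCache entirely.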


----- Original Message ----
From: Tarandeep Singh <tarandeep@gmail.com>
To: core-user@hadoop.apache.org
Sent: Tuesday, November 11, 2008 10:56:41 AM
Subject: Caching data selectively on slaves


Is it possible to cache data selectively on slave machines?

Let's say I have data partitioned as D1, D2, and so on. D1 is required by
reducer R1, D2 by R2, and so on. I know this beforehand because
HashPartitioner.getPartition was used to partition the data.

If I put D1, D2, ... in the distributed cache, then the data is copied to all
machines. Is it possible to cache data selectively on machines?

