hadoop-common-user mailing list archives

From "Tarandeep Singh" <tarand...@gmail.com>
Subject Re: Caching data selectively on slaves
Date Tue, 11 Nov 2008 21:15:02 GMT
Hi Lohit,

I thought of keeping the data on DFS and reading it from there, but storing
the data on DFS will turn out to be expensive:

1) The data is replicated across the cluster.
2) While reading, reducer Ri may not have data Di on the same machine,
so a remote DFS read will occur.

That is why I wondered whether I could selectively cache the data on the
respective machines.
And thanks for the tip that I should keep my read time to a minimum, or else
the reducers might time out. I will keep this in mind.
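
The timeout tip can be illustrated with a small sketch. It reads a stream in chunks and signals progress at intervals, so that a long side-data read does not look like a hung task. The Progress interface, the class name, and the 4096-byte buffer are assumptions made for illustration; in a real job the callback would presumably be the progress() method of Hadoop's Reporter, and the stream would come from DFS rather than from memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ProgressReportingReader {

    /** Hypothetical stand-in for Hadoop's Reporter; only progress() is modelled. */
    interface Progress {
        void progress();
    }

    /**
     * Drains the stream in chunks and calls reporter.progress() each time at
     * least reportEveryBytes bytes have been read since the previous call.
     * Returns the total number of bytes read.
     */
    static long readWithProgress(InputStream in, int reportEveryBytes, Progress reporter) {
        byte[] buffer = new byte[4096];
        long total = 0;
        long sinceLastReport = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
                sinceLastReport += n;
                if (sinceLastReport >= reportEveryBytes) {
                    reporter.progress();   // tell the framework the task is alive
                    sinceLastReport = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // In-memory data stands in for the side data a reducer would read.
        InputStream in = new ByteArrayInputStream(new byte[10_000]);
        int[] calls = {0};
        long total = readWithProgress(in, 4096, () -> calls[0]++);
        System.out.println(total + " bytes read, " + calls[0] + " progress calls");
    }
}
```

Reporting by bytes read rather than by elapsed time keeps the sketch simple; a time-based interval would work just as well.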


On Tue, Nov 11, 2008 at 12:33 PM, lohit <lohit_bv@yahoo.com> wrote:

> DistributedCache would copy the cache data to all nodes. If you know the
> mapping of R* to D*, how about having each reducer read the D it expects
> directly from DFS? DistributedCache only helps if the data is used by
> multiple tasks on the same node, since you then avoid accessing DFS
> multiple times. If you know that each 'D' is read by only one 'R', then you
> are not buying much with DistributedCache. You should also keep in mind
> that if your read takes a long time, your reducers might time out for
> failing to report status.
> Thanks,
> Lohit
> ----- Original Message ----
> From: Tarandeep Singh <tarandeep@gmail.com>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, November 11, 2008 10:56:41 AM
> Subject: Caching data selectively on slaves
> Hi,
> Is it possible to cache data selectively on slave machines?
> Let's say I have data partitioned as D1, D2... and so on. D1 is required by
> Reducer R1, D2 by R2 and so on. I know this beforehand because
> HashPartitioner.getPartition was used to partition the data.
> If I put D1, D2.. in the distributed cache, then the data is copied to all
> machines. Is it possible to cache data selectively on machines?
> Thanks,
> Taran
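
As a rough illustration of the R-to-D mapping discussed in this thread, the sketch below reproduces the formula used by HashPartitioner.getPartition (clear the sign bit of the key's hash code, then take it modulo the number of reduce tasks) and turns the resulting partition number into a side-data path. The class name, the base directory, and the convention that partition 0 corresponds to D1 are assumptions made for the example, not something stated in the thread.

```java
final class SideDataLookup {

    /** Same arithmetic as Hadoop's HashPartitioner.getPartition. */
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /**
     * Hypothetical naming scheme: partition i is stored at baseDir/D(i+1),
     * so partition 0 maps to D1, partition 1 to D2, and so on.
     */
    static String sideDataPath(String baseDir, int partition) {
        return baseDir + "/D" + (partition + 1);
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        for (String key : new String[] {"a", "b", "hadoop"}) {
            int p = partitionFor(key, numReduceTasks);
            // "/data/side" is a placeholder directory, not a real location.
            System.out.println(key + " -> partition " + p
                    + " -> " + sideDataPath("/data/side", p));
        }
    }
}
```

With a mapping like this, each reducer only needs to open the one file that matches its own partition number, which is the situation in which DistributedCache buys little.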
