hadoop-common-user mailing list archives

From John Meza <j_meza...@hotmail.com>
Subject Distributed cache: how big is too big?
Date Tue, 09 Apr 2013 04:58:45 GMT
I am researching a Hadoop solution for an existing application that requires a directory structure
full of data for processing.
To make the Hadoop solution work I need to deploy the data directory to each DataNode when the job
is executed. I know this isn't new and is commonly done with a Distributed Cache.
Based on experience, what are the common file sizes deployed in a Distributed Cache? I know
smaller is better, but how big is too big? I have read that the larger the cache deployed, the
greater the job startup latency. I also assume there are other factors that play into this.
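For context, this is roughly how I'm registering the data with the Distributed Cache — a minimal sketch using the Hadoop 1.x-era API (the `DistributedCache` class and `TaskTracker` current at the time); the paths and class name here are hypothetical placeholders, and this only runs against a real cluster:

```java
// Sketch only: assumes a Hadoop 1.x classpath; file path and class
// name are hypothetical, not taken from the actual application.
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSetup.class);
        // Register a file already in HDFS. It is localized to each
        // TaskTracker's disk once per node, then shared by all tasks
        // of the job on that node ("#lookup.dat" sets a symlink name).
        DistributedCache.addCacheFile(
            new URI("/app/data/lookup.dat#lookup.dat"), conf);
        // Inside a task, the localized copies are retrievable via:
        //   Path[] local = DistributedCache.getLocalCacheFiles(conf);
    }
}
```

The localization cost is paid once per node per job, which is why I expect startup latency to scale with cache size.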
I know that:
- Default local.cache.size = 10 GB
- Range of desirable sizes for a Distributed Cache = 10 KB - 1 GB??
- Distributed Cache is normally not used if larger than ____?
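For reference, the 10 GB default above comes from the `local.cache.size` property (a per-TaskTracker limit in bytes). A sketch of overriding it in mapred-site.xml — the 20 GB value is only an example, not a recommendation:

```xml
<!-- mapred-site.xml: per-TaskTracker Distributed Cache limit, in bytes -->
<property>
  <name>local.cache.size</name>
  <value>21474836480</value> <!-- example: 20 GB; default is 10737418240 -->
</property>
```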
Another option: put the data directories on each DataNode directly and provide the location to the TaskTracker?