crunch-user mailing list archives

From David Ortiz <>
Subject MRPipeline.cache()
Date Fri, 13 Nov 2015 15:34:41 GMT

If I have an input data set that is very expensive to read (think hundreds of
GB of data on S3, for example), would I be able to use cache() to make sure I
only do the read once and then hand the result out to the jobs that need it,
as opposed to what Crunch does by default, which is to read it once for each
parallel stage that needs the data?
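For context, a minimal sketch of the pattern being asked about, using Crunch's PCollection.cache() method. The class name, input path, output paths, and the filter function are all hypothetical; calling cache() asks Crunch to materialize the collection so downstream consumers can read the materialized copy rather than recomputing from the original source:

```java
import org.apache.crunch.FilterFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.mr.MRPipeline;

public class CacheExample {

    // Hypothetical filter used to create two downstream consumers.
    static class StartsWithFilter extends FilterFn<String> {
        private final String prefix;
        StartsWithFilter(String prefix) { this.prefix = prefix; }
        @Override
        public boolean accept(String input) { return input.startsWith(prefix); }
    }

    public static void main(String[] args) {
        MRPipeline pipeline = new MRPipeline(CacheExample.class);

        // Hypothetical S3 input path; hundreds of GB in practice.
        PCollection<String> lines = pipeline.readTextFile("s3://bucket/huge-input/");

        // Mark the collection for caching so that the two outputs below
        // can consume the materialized data instead of each triggering
        // a separate read of the expensive S3 source.
        PCollection<String> cached = lines.cache();

        // Two downstream jobs that both depend on the same input.
        pipeline.writeTextFile(cached.filter(new StartsWithFilter("a")), "/out/a");
        pipeline.writeTextFile(cached.filter(new StartsWithFilter("b")), "/out/b");

        PipelineResult result = pipeline.done();
    }
}
```

This sketch assumes a configured Hadoop environment; it is illustrative only, not a tested program.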

