hadoop-common-user mailing list archives

From tim robertson <timrobertson...@gmail.com>
Subject Re: Generating many small PNGs to Amazon S3 with MapReduce
Date Thu, 16 Apr 2009 08:14:50 GMT
Thanks Kevin,

"... well, you're doing it wrong." This is what I'm afraid of :o)

I know that for the maps, for example, multiple TaskTrackers can end up
running over the same part of the input file, but I'm not so sure about
the reduce.  In the reduce, will the same keys be run on multiple
machines in competition?

On Thu, Apr 16, 2009 at 2:21 AM, Kevin Peterson <kpeterson@biz360.com> wrote:
> On Tue, Apr 14, 2009 at 2:35 AM, tim robertson <timrobertson100@gmail.com>wrote:
>> I am considering (for better throughput as maps generate huge request
>> volumes) pregenerating all my tiles (PNG) and storing them in S3 with
>> cloudfront.  There will be billions of PNGs produced, at 1-3KB each.
> Storing billions of PNGs at 1-3KB each into S3 will be perfectly fine;
> there is no need to generate them all and then push them at once, if you
> are storing each one in its own S3 object (which they must be, if you
> intend to fetch them using cloudfront). Each S3 object key is unique and
> can be written fully in parallel. If you are writing to the same S3
> object twice, ... well, you're doing it wrong.
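The parallel-write point above rests on every tile mapping to its own unique key. A minimal sketch of what that could look like, assuming a zoom/x/y tile-addressing scheme (the key layout here is mine, not from the thread):

```python
# Sketch: derive a deterministic, unique S3 key per map tile, so each PNG
# lands in its own object and parallel writers never collide on a key.
# The "tiles/{zoom}/{x}/{y}.png" layout is an assumption for illustration.
def tile_key(zoom: int, x: int, y: int) -> str:
    return f"tiles/{zoom}/{x}/{y}.png"

# Each map or reduce task could then PUT its tile independently with any
# S3 client, e.g. (not runnable here, shown as a comment):
#   s3.put_object(Bucket=bucket, Key=tile_key(z, x, y), Body=png_bytes)
print(tile_key(3, 5, 7))
```

Because the key is a pure function of the tile coordinates, reruns of the same task (e.g. under speculative execution) overwrite the same object with the same bytes rather than conflicting.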
> However, do the math on the costs for S3. We were doing something
> similar, and found that we were spending a fortune on our PUT requests at
> $0.01 per 1,000, and next to nothing on storage. I've since moved to a
> more complicated model where I pack many small items into each object and
> store an index in SimpleDB. You'll need to partition your SimpleDB
> domains if you do this.
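To see why the PUT requests dominate, here is the back-of-envelope math Kevin suggests, using the $0.01 per 1,000 PUT price he quotes. The object count and average size are illustrative round numbers based on the thread's "billions" of PNGs at "1-3KB" each:

```python
# Rough S3 cost check: PUT requests vs. storage for many tiny objects.
# Assumed figures (not from the thread): 2 billion objects, 2 KB average.
num_objects = 2_000_000_000
avg_size_kb = 2

# PUT cost at $0.01 per 1,000 requests (price quoted in the email above).
put_cost_dollars = num_objects / 1000 * 0.01

# Total storage in GB (1 GB = 1,048,576 KB).
storage_gb = num_objects * avg_size_kb / 1_048_576

print(f"PUT requests: ${put_cost_dollars:,.0f}")
print(f"Storage: {storage_gb:,.0f} GB")
```

Two billion PUTs come to roughly $20,000, while the data itself is only a few terabytes, so at per-GB storage rates the one-time upload cost dwarfs the monthly storage bill. That asymmetry is what makes packing many small items into fewer, larger objects attractive.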
