hadoop-common-user mailing list archives

From elton sky <eltonsky9...@gmail.com>
Subject Re: Applications creates bigger output than input?
Date Thu, 19 May 2011 08:06:50 GMT
Hello,
I am picking this topic up again because what I am looking for is something
that is not CPU bound. Augmenting data for ETL and generating an index are good
examples. Neither of them requires much CPU time on the map side. The main
bottleneck for them is the shuffle and merge.

Market basket analysis is CPU intensive in the map phase, since it has to go
through all possible combinations of items.
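
To make the combination blow-up concrete, here is a minimal sketch of such a
map step in plain Python (not Hadoop code). The function name `map_basket`,
the whitespace-separated basket format, and the sample baskets are all
assumptions made up for illustration; a real job may enumerate larger item
sets, not only pairs.

```python
from itertools import combinations

def map_basket(line):
    """Hypothetical map step: emit one (pair, 1) record per unordered
    pair of distinct items in a whitespace-separated basket."""
    items = sorted(set(line.split()))
    return [(pair, 1) for pair in combinations(items, 2)]

# Made-up sample input: one basket per line.
baskets = [
    "bread milk eggs beer",
    "milk eggs",
    "bread beer chips salsa milk",
]

# Run the map step over every basket and collect the emitted records.
records = [rec for line in baskets for rec in map_basket(line)]

# Compare the number of distinct input items with the number of map records.
input_items = sum(len(set(line.split())) for line in baskets)
print(input_items, len(records))
```

A basket with n distinct items yields n(n-1)/2 pair records, so in this sketch
the map output grows quadratically with basket size, and enumerating the pairs
is also where the CPU time goes.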

I am still looking for more applications that create bigger output than input
and are not CPU bound.
Any further ideas? I would appreciate them.


On Tue, May 3, 2011 at 3:10 AM, Steve Loughran <stevel@apache.org> wrote:

> On 30/04/2011 05:31, elton sky wrote:
>
>> Thank you for suggestions:
>>
>> Weblog analysis, market basket analysis and generating search index.
>>
>> I guess for these applications we need more reduces than maps, for handling
>> large intermediate output, isn't it. Besides, the input split for map should
>> be smaller than usual, because the workload for spill and merge on map's
>> local disk is heavy.
>>
>
> any form of rendering can generate very large images
>
> see: http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
>
>
>
