crunch-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Exception due to same iterator returned back by PGroupedTableType
Date Thu, 28 Jun 2012 08:47:34 GMT
On Thu, Jun 28, 2012 at 9:29 AM, Rahul <rsharma@xebia.com> wrote:
> Yes, indeed, this is a small PoC to get familiar with Crunch in relation to my
> problem. Basically, I have the following algorithm at play:
> 1. Read data rows
> 2. Create custom keys for each of them, built using various attributes of
> data (this time it is just a simple hash code, but I would like to emit
> multiple key-value pairs)
> 3. Group similar data based on created Keys
> 4. Iterate over individual items in the group and do extensive comparison
> between all of them
>
> I just built an outline in the test case to see what can be done and how.
> Can you advise something better?
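The four steps above can be approximated in plain Java as below. This is a single-machine sketch, not Crunch code: the in-memory map stands in for grouping by key, and the key function (`keysFor`), the sample rows, and the case-insensitive comparison are all hypothetical placeholders for the real data and logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupAndCompareSketch {

    // Step 2 (hypothetical key function): emit several keys per row, here the
    // row's lowercased first character and its length.
    static List<String> keysFor(String row) {
        List<String> keys = new ArrayList<>();
        if (!row.isEmpty()) {
            keys.add("first:" + Character.toLowerCase(row.charAt(0)));
        }
        keys.add("len:" + row.length());
        return keys;
    }

    // Step 3: group rows under every key they emit (the role a group-by-key
    // plays in a real pipeline). A row emitting two keys lands in two groups.
    static Map<String, List<String>> group(List<String> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            for (String key : keysFor(row)) {
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
            }
        }
        return groups;
    }

    // Step 4: compare every pair of items in one group. The comparison here
    // (equal ignoring case) is a placeholder for the real, expensive one.
    static List<String> comparePairs(List<String> items) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (items.get(i).equalsIgnoreCase(items.get(j))) {
                    matches.add(items.get(i) + "~" + items.get(j));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Step 1: hard-coded rows stand in for reading input data.
        List<String> rows = List.of("apple", "Apple", "avocado", "berry");
        for (Map.Entry<String, List<String>> e : group(rows).entrySet()) {
            System.out.println(e.getKey() + " -> " + comparePairs(e.getValue()));
        }
    }
}
```

Because every pair within a group is compared, a group of n items needs n(n-1)/2 comparisons, so the cost per group grows quadratically with its size.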


Thanks for the outline. In this case, your approach (putting the
contents of the incoming Iterable into a collection) should work fine,
as long as the number of elements per group is relatively small (i.e.
small enough to fit easily in the memory available to each reducer in
your Hadoop cluster).
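A plausible reading of the exception in the subject line is that the values for a group arrive as a single-pass Iterable: asking it for a second iterator does not restart iteration, and Hadoop-based frameworks may also reuse one value object across calls to next(). The plain-Java sketch below simulates that behaviour with a hypothetical `OneShotIterable` (it throws on a second `iterator()` call and reuses one `StringBuilder`), then buffers the group by copying each value into a list. It illustrates the idea under those assumptions; it is not Crunch or Hadoop code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class BufferGroupSketch {

    // Hypothetical stand-in for a reducer's values: it hands out an iterator
    // only once and reuses one mutable StringBuilder for every value.
    static final class OneShotIterable implements Iterable<StringBuilder> {
        private final List<String> data;
        private boolean used = false;

        OneShotIterable(List<String> data) {
            this.data = data;
        }

        @Override
        public Iterator<StringBuilder> iterator() {
            if (used) {
                throw new IllegalStateException("values can only be iterated once");
            }
            used = true;
            StringBuilder reused = new StringBuilder();
            Iterator<String> it = data.iterator();
            return new Iterator<StringBuilder>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public StringBuilder next() {
                    if (!it.hasNext()) {
                        throw new NoSuchElementException();
                    }
                    reused.setLength(0);
                    reused.append(it.next());
                    return reused; // the same object on every call
                }
            };
        }
    }

    // Correct buffering: toString() makes an independent copy of each value.
    static List<String> buffer(Iterable<StringBuilder> values) {
        List<String> copy = new ArrayList<>();
        for (StringBuilder v : values) {
            copy.add(v.toString());
        }
        return copy;
    }

    // Naive buffering: stores references, so every entry is the reused object.
    static List<StringBuilder> naiveBuffer(Iterable<StringBuilder> values) {
        List<StringBuilder> copy = new ArrayList<>();
        for (StringBuilder v : values) {
            copy.add(v);
        }
        return copy;
    }

    // Once buffered, the group can be traversed as often as needed,
    // e.g. to visit every pair for the comparison step.
    static List<String> comparePairs(List<String> items) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(items.get(i) + "|" + items.get(j));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> group = buffer(new OneShotIterable(List.of("a", "b", "c")));
        System.out.println(group);
        System.out.println(comparePairs(group));
        System.out.println(naiveBuffer(new OneShotIterable(List.of("a", "b", "c"))));
    }
}
```

Here `naiveBuffer` ends up holding three references to the same reused object, so every entry reads as the last value ("c"). Copying fixes that, but it also means the whole group sits in memory at once, which is why the group-size caveat above matters.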

- Gabriel
