Edward J. Yoon wrote:
> Yes, but then, as i grows, the task-to-workload ratio gets larger
> and larger. Is that right?
>
I hope you have seen the corrected version in the latest email. What do
you mean by *i*? If you mean the index in the key ordering, then the
smaller the index, the more keys are associated with it, and that is what
we want. If you mean the total number of keys, then yes: the larger the
number of keys, the more combinations/associations the smaller keys have
to make. Since there will be mC2 combinations (m: number of keys), one can
optimize it to have mC2 / N values per reducer (N: number of reducers).
Something like
partition(index i, key key_j, int N) { // N is the number of reducers
  // find the data per reducer
  int dataPerRed = mC2 / N; // assuming m is known
  int prev_sum = 0;
  // calculate the total combinations contributed by previous indexes
  for (k = 1; k < i; k++) {
    prev_sum += m - k + 1; // combinations contributed by the kth index
  }
  prev_sum += j - i + 1; // self contribution
  return prev_sum / dataPerRed; // integer division: each run of dataPerRed consecutive combinations goes to one reducer
}
I think this might work.
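
For anyone who wants to try the idea outside Hadoop, here is a minimal
plain-Python sketch of the same range-partitioning scheme. It is not the
Hadoop Partitioner API, and it makes a few assumptions of its own: keys
are identified by 1-based indexes, only pairs (i, j) with i < j are
emitted (so index k contributes m - k pairs and the total is exactly mC2),
and the per-reducer quota is rounded up so the result always falls in
[0, N). The names partition, n_reducers and pairs_per_reducer are
illustrative only.

```python
from collections import Counter
from itertools import combinations
from math import comb


def partition(i: int, j: int, m: int, n_reducers: int) -> int:
    """Return a reducer index in [0, n_reducers) for the pair (key_i, key_j).

    Assumes 1-based indexes with 1 <= i < j <= m.
    """
    total_pairs = comb(m, 2)
    # Ceiling division, so n_reducers buckets are enough to cover every pair.
    pairs_per_reducer = max(1, -(-total_pairs // n_reducers))
    # Pairs contributed by all earlier first indexes k = 1 .. i-1.
    prev_sum = sum(m - k for k in range(1, i))
    # 0-based rank of (i, j) among all pairs in lexicographic order.
    rank = prev_sum + (j - i - 1)
    return rank // pairs_per_reducer


if __name__ == "__main__":
    m, n_reducers = 4, 3
    load = Counter(
        partition(i, j, m, n_reducers)
        for i, j in combinations(range(1, m + 1), 2)
    )
    print(dict(load))  # {0: 2, 1: 2, 2: 2}
```

With 4 keys and 3 reducers, each reducer receives 2 of the 6 pairs. When
mC2 is not a multiple of N, the last reducer simply gets fewer pairs; that
is a consequence of the rounding choice made in this sketch.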
Amar
> Edward
>
> On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <amarrk@yahooinc.com> wrote:
>
>> Edward J. Yoon wrote:
>>
>>> Hi communities,
>>>
>>> Do you have any idea how to get the pairs of all row key combinations
>>> w/o repetition on Map/Reduce as described below?
>>>
>>> Input : (MapFile or Hbase Table)
>>>
>>> <Key1, Value or RowResult>
>>> <Key2, Value or RowResult>
>>> <Key3, Value or RowResult>
>>> <Key4, Value or RowResult>
>>>
>>> Output :
>>>
>>> <Key1, Key2>
>>> <Key1, Key3>
>>> <Key1, Key4>
>>> <Key2, Key3>
>>> <Key2, Key4>
>>> <Key3, Key4>
>>>
>>>
>> One way to do it would be as follows
>> For every key with index i,
>> for (k=0; k < i; k++) {
>> emit(i,key_i)
>> }
>> So the above input becomes
>> 1,key1
>> 1,key1
>> 1,key1
>>
>>> It would be nice if someone can review my pseudo code of traditional
>>> CF using cosine similarity.
>>> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>>>
>>> Thanks.
>>>
>>>
>>
>
>
>
>
