---------- Forwarded message ----------
From: Edward J. Yoon <edwardyoon@apache.org>
Date: Wed, Aug 13, 2008 at 11:34 PM
Subject: Re: Get the pairs of all row key combinations w/o repetition
To: core-user@hadoop.apache.org
Hmm, yes! I'll try it as you described; it looks good.
Thanks, Ed
On Wed, Aug 13, 2008 at 10:52 PM, Amar Kamat <amarrk@yahoo-inc.com> wrote:
> Edward J. Yoon wrote:
>>
>> Yes, but then, as i grows, the task-to-workload ratio gets larger
>> and larger. Is that right?
>>
>
> I hope you have seen the corrected version in the latest email. What do you
> mean by *i*? If you mean the index in the key ordering, then the smaller the
> index, the more keys are associated with it, and that is what we want. If you
> mean the total number of keys, then yes: the larger the number of keys, the
> more combinations the smaller keys have to make. Since there will be
> mC2 = m(m-1)/2 combinations (m: number of keys), one can balance the load so
> that each reducer gets about mC2 / N of them (N: number of reducers). Something like
>
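> int partition(int i, int j, int m, int N) { // record (index i, key_j), i <= j; N is num reducers
>   // find the data per reducer: index k receives m - k + 1 records (its own key
>   // plus the later keys), so m(m+1)/2 = mC2 + m records in total; m is assumed known
>   long total = (long) m * (m + 1) / 2;
>   long dataPerRed = (total + N - 1) / N; // round up so the result is always < N
>   long prev_sum = 0;
>   // calculate the total combinations contributed by previous indexes
>   for (int k = 1; k < i; k++) {
>     prev_sum += m - k + 1; // number of records contributed by the kth index
>   }
>   prev_sum += j - i + 1; // self contribution: 1-based position of key_j within index i
>   return (int) ((prev_sum - 1) / dataPerRed); // consecutive ranks share a reducer
> }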
> I think this might work.
> Amar
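A minimal Java sketch of the balanced partition idea above, under assumptions the thread does not spell out: keys are numbered 1..m, the map output for index i carries key_j with i <= j <= m (so index k contributes m - k + 1 records, its own key included), and the loop over previous indexes is replaced by its closed form (i - 1)(m + 1) - (i - 1)i/2. The class name and signature are hypothetical, and this is plain Java rather than Hadoop's Partitioner interface.

```java
public class BalancedPairPartition {

    // Hypothetical helper mirroring the partition() pseudocode in the thread.
    // Record (index i, key_j) with 1 <= i <= j <= m; returns a reducer in [0, numReducers).
    static int partition(int i, int j, int m, int numReducers) {
        // Index k receives m - k + 1 records (its own key plus the later ones),
        // so there are m(m+1)/2 records in total.
        long total = (long) m * (m + 1) / 2;
        long dataPerRed = (total + numReducers - 1) / numReducers; // ceiling division
        // Closed form of sum_{k=1}^{i-1} (m - k + 1): records from previous indexes.
        long prevSum = (long) (i - 1) * (m + 1) - (long) (i - 1) * i / 2;
        prevSum += j - i + 1; // 1-based position of key_j within index i
        return (int) ((prevSum - 1) / dataPerRed);
    }

    public static void main(String[] args) {
        int m = 4, numReducers = 3;
        for (int i = 1; i <= m; i++) {
            StringBuilder line = new StringBuilder("index " + i + ":");
            for (int j = i; j <= m; j++) {
                line.append(" key").append(j).append("->r").append(partition(i, j, m, numReducers));
            }
            System.out.println(line);
        }
    }
}
```

For m = 4 and 3 reducers this puts all of index 1 on r0 and all of index 2 on r1, sends index 3's key3 to r1 and key4 to r2, and puts index 4 on r2. Because the split is by record rank, one index's records can land on two reducers (index 3 here), so a reducer may receive key_j for index i without key_i itself; keeping whole indexes together would require the reducer boundaries to fall between indexes.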
>>
>> Edward
>>
>> On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <amarrk@yahoo-inc.com> wrote:
>>
>>>
>>> Edward J. Yoon wrote:
>>>
>>>>
>>>> Hi communities,
>>>>
>>>> Do you have any idea how to get the pairs of all row key combinations
>>>> w/o repetition with Map/Reduce, as described below?
>>>>
>>>> Input: (MapFile or HBase table)
>>>>
>>>> <Key1, Value or RowResult>
>>>> <Key2, Value or RowResult>
>>>> <Key3, Value or RowResult>
>>>> <Key4, Value or RowResult>
>>>>
>>>> Output:
>>>>
>>>> <Key1, Key2>
>>>> <Key1, Key3>
>>>> <Key1, Key4>
>>>> <Key2, Key3>
>>>> <Key2, Key4>
>>>> <Key3, Key4>
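For reference, a single-machine Java sketch (Java 9 or later) of the requested output: a nested loop over key indexes with j > i yields each unordered pair exactly once, m(m-1)/2 pairs for m keys. It assumes all keys fit in one in-memory list, which is exactly what a Map/Reduce version is meant to avoid; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class AllPairs {

    // Every unordered pair of distinct keys, each emitted once in index order (i < j).
    static List<String[]> allPairs(List<String> keys) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            for (int j = i + 1; j < keys.size(); j++) {
                pairs.add(new String[] {keys.get(i), keys.get(j)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("Key1", "Key2", "Key3", "Key4");
        List<String[]> pairs = allPairs(keys);
        for (String[] p : pairs) {
            System.out.println("<" + p[0] + ", " + p[1] + ">");
        }
        // m keys give m(m-1)/2 pairs: 4 * 3 / 2 = 6 here.
        System.out.println(pairs.size());
    }
}
```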
>>>>
>>>>
>>>
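>>> One way to do it would be as follows.
>>> For every key with index i (1-based),
>>> for (k=1; k <= i; k++) {
>>> emit(k, key_i)
>>> }
>>> So the above input becomes
>>> 1,key1  1,key2  1,key3  1,key4
>>> 2,key2  2,key3  2,key4
>>> 3,key3  3,key4
>>> 4,key4
>>> and the reducer for index k pairs key_k with each later key it receives.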
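A minimal in-memory Java sketch (Java 9 or later) of the emit-by-index scheme in this message, assuming each key_i is emitted under every index k = 1..i, so the group for index k receives key_k and all later keys. It also assumes each map record carries its source index so a reducer can tell its own key apart from the later ones; the thread does not say how the reducer identifies key_k. The shuffle is simulated with a TreeMap, nothing here uses the Hadoop API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PairsBySimulatedMapReduce {

    // Simulated map + shuffle: key_i (1-based index i) is emitted under every
    // index k = 1..i, tagged with its own index i; values are grouped by k.
    static Map<Integer, List<Map.Entry<Integer, String>>> mapAndShuffle(List<String> keys) {
        Map<Integer, List<Map.Entry<Integer, String>>> groups = new TreeMap<>();
        for (int i = 1; i <= keys.size(); i++) {
            for (int k = 1; k <= i; k++) {
                groups.computeIfAbsent(k, x -> new ArrayList<>()).add(Map.entry(i, keys.get(i - 1)));
            }
        }
        return groups;
    }

    // Simulated reduce for index k: find key_k among the values, then pair it
    // with every value whose index is greater than k.
    static List<String> reduce(int k, List<Map.Entry<Integer, String>> values) {
        String self = null;
        for (Map.Entry<Integer, String> v : values) {
            if (v.getKey() == k) {
                self = v.getValue();
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> v : values) {
            if (v.getKey() > k) {
                out.add("<" + self + ", " + v.getValue() + ">");
            }
        }
        return out;
    }

    // Runs every group through reduce in index order and collects the pairs.
    static List<String> pairs(List<String> keys) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Map.Entry<Integer, String>>> g : mapAndShuffle(keys).entrySet()) {
            out.addAll(reduce(g.getKey(), g.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        pairs(List.of("Key1", "Key2", "Key3", "Key4")).forEach(System.out::println);
    }
}
```

The cost of this scheme shows up in the map output: key_i is emitted i times, m(m+1)/2 records in all, and the group for index 1 holds all m keys. That skew toward small indexes is what the rest of the thread tries to balance across reducers.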
>>>
>>>>
>>>> It would be nice if someone could review my pseudocode for traditional
>>>> CF (collaborative filtering) using cosine similarity.
>>>> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>>>>
>>>> Thanks.
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>

Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
