incubator-hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Fwd: Get the pairs of all row key combinations w/o repetition
Date Thu, 14 Aug 2008 02:33:53 GMT
---------- Forwarded message ----------
From: Edward J. Yoon <edwardyoon@apache.org>
Date: Wed, Aug 13, 2008 at 11:34 PM
Subject: Re: Get the pairs of all row key combinations w/o repetition
To: core-user@hadoop.apache.org


Hmm, yes! I'll try it as described above. It looks good.

Thanks, Ed

On Wed, Aug 13, 2008 at 10:52 PM, Amar Kamat <amarrk@yahoo-inc.com> wrote:
> Edward J. Yoon wrote:
>>
>> Yes, but then, as i grows, the task-to-workload ratio gets larger
>> and larger. Is that right?
>>
>
> I hope you have seen the corrected version in the latest email. What do you
> mean by *i*? If you mean the index in the key ordering, then the smaller the
> index, the more keys are associated with it, and that is what we want. If you
> mean the total number of keys, then yes: the larger the number of keys, the
> more combinations/associations the smaller key has to make. Since there will be
> mC2 combinations (m: number of keys), one can optimize it to have mC2 / N values
> per reducer (N: number of reducers). Something like
>
> partition(index i, key key_j, int N) { // N is num reducers
>  // find the data per reducer
>  int dataPerRed = mC2 / N; // assuming m is known
>  int prev_sum = 0;
>  // calculate the total combinations contributed by previous indexes
>  for (k=1; k < i; k++) {
>   // add the number of combinations contributed by the kth index
>   prev_sum += m - k + 1;
>  }
>  prev_sum += j - i + 1; // self contribution
>  return prev_sum % dataPerRed;
> }
> I think this might work.
> Amar
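
For readers who want something runnable, here is a minimal Java sketch of the idea in the quoted pseudocode, under a few assumptions that are not in the original mail: keys carry 1-based indexes, a pair (i, j) is only formed for i < j (so index k contributes m - k pairs rather than m - k + 1), and the reducer is picked by integer division of the pair's rank by the per-reducer share instead of the modulo used above, so the result always falls in [0, N). The class and method names (PairPartitioner, pairRank, partition) are hypothetical and this is not a Hadoop Partitioner implementation.

```java
public class PairPartitioner {

    /**
     * 0-based rank of the pair (i, j), with 1 <= i < j <= m, in the order
     * (1,2), (1,3), ..., (1,m), (2,3), ..., (m-1,m).
     */
    public static long pairRank(int i, int j, int m) {
        if (i < 1 || j <= i || j > m) {
            throw new IllegalArgumentException("need 1 <= i < j <= m");
        }
        // Pairs whose first index is smaller than i: sum over k = 1..i-1 of (m - k).
        long before = (long) (i - 1) * m - (long) (i - 1) * i / 2;
        // Position of (i, j) among the pairs that start with i.
        return before + (j - i - 1);
    }

    /** Reducer number in [0, numReducers) for the pair (i, j). */
    public static int partition(int i, int j, int m, int numReducers) {
        long total = (long) m * (m - 1) / 2;                        // mC2
        long perReducer = (total + numReducers - 1) / numReducers;  // ceil(mC2 / N)
        return (int) (pairRank(i, j, m) / perReducer);
    }

    public static void main(String[] args) {
        int m = 4;
        int numReducers = 3;
        for (int i = 1; i < m; i++) {
            for (int j = i + 1; j <= m; j++) {
                System.out.println("(" + i + "," + j + ") -> reducer "
                        + partition(i, j, m, numReducers));
            }
        }
    }
}
```

With m = 4 and 3 reducers, each reducer receives two of the six pairs. The closed form for the pairs before index i replaces the loop in the pseudocode; both compute the same running total under the assumptions stated above.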
>>
>> -Edward
>>
>> On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <amarrk@yahoo-inc.com> wrote:
>>
>>>
>>> Edward J. Yoon wrote:
>>>
>>>>
>>>> Hi communities,
>>>>
>>>> Do you have any idea how to get the pairs of all row key combinations
>>>> w/o repetition on Map/Reduce as described below?
>>>>
>>>> Input : (MapFile or Hbase Table)
>>>>
>>>> <Key1, Value or RowResult>
>>>> <Key2, Value or RowResult>
>>>> <Key3, Value or RowResult>
>>>> <Key4, Value or RowResult>
>>>>
>>>> Output :
>>>>
>>>> <Key1, Key2>
>>>> <Key1, Key3>
>>>> <Key1, Key4>
>>>> <Key2, Key3>
>>>> <Key2, Key4>
>>>> <Key3, Key4>
>>>>
>>>>
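
As a reference point for the output listed in the question, the following is a small single-machine Java sketch that enumerates every pair of keys once, without repetition. It is not a Map/Reduce job; it only pins down what the expected result looks like for a key list that fits in memory. The class name AllPairs and the method pairs are hypothetical, and the keys are assumed to be plain strings already in the desired order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AllPairs {

    /** Returns every pair (keys[i], keys[j]) with i < j, in input order. */
    public static List<String[]> pairs(List<String> keys) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            // Start at i + 1 so a key is never paired with itself
            // and each unordered pair is emitted exactly once.
            for (int j = i + 1; j < keys.size(); j++) {
                out.add(new String[] { keys.get(i), keys.get(j) });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("Key1", "Key2", "Key3", "Key4");
        for (String[] p : pairs(keys)) {
            System.out.println("<" + p[0] + ", " + p[1] + ">");
        }
    }
}
```

For m keys this produces m(m-1)/2 pairs, which is the mC2 count that any distributed version has to spread across reducers.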
>>>
>>> One way to do it would be as follows
>>> For every key with index i,
>>> for (k=0; k < i; k++) {
>>> emit(i,key_i)
>>> }
>>> So the above input becomes
>>> 1,key1
>>> 1,key1
>>> 1,key1
>>>
>>>>
>>>> It would be nice if someone can review my pseudo code of traditional
>>>> CF using cosine similarity.
>>>> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>>>>
>>>> Thanks.
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>



--
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org



