mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: R: Using recommenders with String identifiers
Date Fri, 09 Mar 2012 02:10:21 GMT
Are these identifiers used as mapper keys anywhere?
If so, the sort phase of MapReduce will be much faster with long keys,
for two reasons: comparing two longs takes less time than comparing two
Strings (there are fewer bytes to compare), and more records fit in
memory during the sort (each key is smaller).
I once processed 1 billion records, and just changing the keys from
String to Long improved performance by 20%.

Ignore if this is not the case.
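
To make the size argument concrete, here is a minimal Java sketch that
measures the raw payload of a UUID string encoded as UTF-16 against the
width of a long. The class name KeySizeSketch and the helper utf16Bytes
are made up for this illustration; the figures ignore JVM per-object
overhead, so treat them as a lower bound on the real difference.

```java
import java.nio.charset.StandardCharsets;

public class KeySizeSketch {

    // Raw UTF-16 payload of a string, without a byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String uuid = "936DA01F-9ABD-4D9D-80C7-02AF85C822A8";

        int stringBytes = utf16Bytes(uuid); // 36 characters * 2 bytes each
        int longBytes = Long.BYTES;         // width of a Java long in bytes

        System.out.println("UUID as UTF-16: " + stringBytes + " bytes");
        System.out.println("long:           " + longBytes + " bytes");
        System.out.println("ratio:          " + (stringBytes / longBytes) + "x");
    }
}
```

A sort comparator has to walk up to that many bytes per String key,
while a long key is compared in one step.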

On 08-03-2012 19:23, Manuel Blechschmidt wrote:
> Hello Claudia,
> the reason longs are used is pure efficiency. When you have a lot of items and a
> lot of users and you use Strings as identifiers, you need a lot of memory just to
> store them. Operations such as equals or hashCode also take longer.
>
> A long takes 8 bytes (64 bits). A UUID string (e.g. 936DA01F-9ABD-4D9D-80C7-02AF85C822A8)
> encoded as UTF-16 takes 72 bytes (36 characters at 2 bytes each), so a UUID would consume
> 9x the memory that a long takes.
>
> /Manuel
>
>
> On 08.03.2012, at 14:27, Claudia Grieco wrote:
>
>> Do you think it's worth the work to change the internal code of Mahout in
>> order to use string identifiers?
>> Thanks
>> Claudia
>>
>> -----Original Message-----
>> From: Manuel Blechschmidt [mailto:Manuel.Blechschmidt@gmx.de]
>> Sent: Monday, 5 March 2012 11:28
>> To: user@mahout.apache.org
>> Subject: Re: Using recommenders with String identifiers
>>
>> Hi Claudia,
>> you have to use an IDMigrator.
>>
>> The following project shows an example:
>> https://github.com/ManuelB/facebook-recommender-demo
>>
>> https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java
>>
>> Good luck
>>     Manuel
>>
>> On 05.03.2012, at 09:53, Claudia Grieco wrote:
>>
>>> Hi guys,
>>>
>>> I'd like to use mahout to implement a recommender but I'm encountering a
>>> problem:
>>>
>>> Ids of items and users are represented in Mahout as long integers, while my
>>> data comes from an external database that uses strings to identify items and
>>> users.
>>>
>>> Any suggestion as to how I can fix this problem?
>>>
>>> Thanks a lot
>>>
>>> Claudia
>>>
>> -- 
>> Manuel Blechschmidt
>> Dortustr. 57
>> 14467 Potsdam
>> Mobil: 0173/6322621
>> Twitter: http://twitter.com/Manuel_B
>>
>>
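
For readers who want to see what the IDMigrator suggestion quoted above
amounts to, here is a minimal, self-contained Java sketch of the idea:
hash each String identifier to a long and remember the reverse mapping.
It does not use any Mahout classes. The class name StringIdMapperSketch
is hypothetical, the method names merely mirror the toLongID and
toStringID methods of Mahout's IDMigrator interface, and the choice of
MD5 with the first 8 digest bytes is an assumption made for this
illustration. Distinct strings can in principle collide on the same
long.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class StringIdMapperSketch {

    // Reverse lookup table: hashed long ID -> original String ID.
    private final Map<Long, String> longToString = new HashMap<>();

    // Maps a String ID to a long and records the mapping for later lookup.
    public long toLongID(String stringID) {
        long id = hash(stringID);
        longToString.put(id, stringID);
        return id;
    }

    // Returns the original String ID, or null if the long was never produced here.
    public String toStringID(long longID) {
        return longToString.get(longID);
    }

    // Assumption for this sketch: pack the first 8 bytes of an MD5 digest into a long.
    static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long result = 0L;
            for (int i = 0; i < 8; i++) {
                result = (result << 8) | (digest[i] & 0xFFL);
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        StringIdMapperSketch mapper = new StringIdMapperSketch();
        long userId = mapper.toLongID("936DA01F-9ABD-4D9D-80C7-02AF85C822A8");
        System.out.println(userId + " -> " + mapper.toStringID(userId));
    }
}
```

The long IDs go into the recommender, and the reverse map translates the
recommended IDs back to the database's string keys afterwards.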

