mahout-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1055) Change id fields to use LongWritable instead of IntWritable
Date Mon, 13 Aug 2012 14:16:38 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433156#comment-13433156 ]

Ted Dunning commented on MAHOUT-1055:
-------------------------------------

The major problem with longs is that you can't index arrays with a long.

The standard work-around is to use a hash of the long as the integer-sized ID.
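
A minimal sketch of that work-around, for illustration only. The class and method names (LongIdHasher, toIntId, toIndex) are hypothetical and are not taken from Mahout. It assumes the usual high/low XOR fold, which is the same computation java.lang.Long.hashCode performs. Distinct longs can collide on the same int, so this only fits cases where occasional collisions are tolerable.

```java
public final class LongIdHasher {
    private LongIdHasher() {}

    // Fold 64 bits into 32 by XOR-ing the high and low halves.
    // This is the same folding that java.lang.Long.hashCode uses.
    public static int toIntId(long id) {
        return (int) (id ^ (id >>> 32));
    }

    // Map a long ID to a non-negative array index in [0, size).
    // Math.floorMod keeps the result non-negative even when the hash is negative.
    public static int toIndex(long id, int size) {
        return Math.floorMod(toIntId(id), size);
    }

    public static void main(String[] args) {
        double[] weights = new double[1024];   // hypothetical per-ID storage
        long docId = 5_000_000_000L;           // too large to fit in an int
        weights[toIndex(docId, weights.length)] += 1.0;
        System.out.println("index for " + docId + " = " + toIndex(docId, weights.length));
    }
}
```

Any long that already fits in a non-negative int maps to itself under toIntId, so existing small IDs keep their values.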


                
> Change id fields to use LongWritable instead of IntWritable
> -----------------------------------------------------------
>
>                 Key: MAHOUT-1055
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1055
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Markus Paaso
>
> Why is IntWritable used as id field type in Mahout CVB? (org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper)
> Does Long have that significant impact on performance?
> Long is much more usable as id type and int causes compatibility issues like the one below.
> In method org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter() LongWritable is used correctly as id field type.
> I suggest that every IntWritable id should be changed to LongWritable.
> Sequencefile produced by command 'mahout lucene.vector' cannot be handled by command 'mahout cvb' due to this id type incompatibility issue.
> see http://mahout.markmail.org/thread/r3m6ojkpbzlxxizy

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
