Thanks Sean. For now I'm thinking of reading the existing key class out of the input SequenceFile
and just propagating it through to the output. Do you think that's reasonable?
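
To make the idea concrete, here is a rough sketch of the "read the key class" half. In real Hadoop code this is what SequenceFile.Reader.getKeyClass() provides, and the writer would then be created with that same class. The snippet below is only an illustration using plain JDK classes. It assumes a header of version 4 or later (the "SEQ" magic, one version byte, then the key class name as a length-prefixed UTF-8 string) and a class name shorter than 128 bytes, so the length fits in one byte. KeyClassPeek, readKeyClassName and sampleHeader are hypothetical names and are not part of the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class KeyClassPeek {

    // Reads the key class name from the start of a SequenceFile stream.
    // Assumption: header version >= 4 and a class name shorter than 128 bytes,
    // so the length prefix is a single non-negative byte.
    static String readKeyClassName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        if (magic[3] < 4) {
            throw new IOException("header version " + magic[3] + " not handled by this sketch");
        }
        int length = in.readByte();
        if (length < 0) {
            throw new IOException("multi-byte length not handled by this sketch");
        }
        byte[] name = new byte[length];
        in.readFully(name);
        return new String(name, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: builds just enough header bytes to exercise the reader.
    // Only valid for class names shorter than 128 bytes.
    static byte[] sampleHeader(String keyClassName) {
        byte[] name = keyClassName.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6);                 // header version byte
        out.write(name.length);       // single-byte length prefix
        out.write(name, 0, name.length);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] header = sampleHeader("org.apache.hadoop.io.Text");
        System.out.println(readKeyClassName(new ByteArrayInputStream(header)));
    }
}
```

Whatever class name comes back would be handed unchanged to the output writer, so the split files keep the same key type as the input, whether that is IntWritable, VarIntWritable or Text.
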
On Dec 23, 2011, at 4:52 AM, "Sean Owen (Commented) (JIRA)" <jira@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175408#comment-13175408 ]
>
> Sean Owen commented on MAHOUT-904:
> ----------------------------------
>
> (I don't know if this is a relevant comment, but we ought to be using VarIntWritable
> and VarLongWritable, not IntWritable and LongWritable, for better space savings.)
>
>> SplitInput should support randomizing the input
>> -----------------------------------------------
>>
>> Key: MAHOUT-904
>> URL: https://issues.apache.org/jira/browse/MAHOUT-904
>> Project: Mahout
>> Issue Type: Improvement
>> Reporter: Grant Ingersoll
>> Assignee: Raphael Cendrillon
>> Labels: MAHOUT_INTRO_CONTRIBUTE
>> Attachments: MAHOUT-904.patch, MAHOUT-904.patch, MAHOUT-904.patch, MAHOUT-904.patch,
>> MAHOUT-904.patch, MAHOUT-904.patch
>>
>>
>> For some learning tasks, we need the input to be randomized (SGD) instead of blocks
>> of labels all at once. SplitInput is a useful tool for setting up train/test files but it
>> currently doesn't support randomizing the input.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>