mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: term vectors not created in SparseVectorsFromSequenceFiles using tf weighting and maxDFSigma filtering
Date Sun, 22 Jan 2012 23:00:03 GMT
What command and options were you passing in?


On Jan 18, 2012, at 4:26 PM, John Conwell wrote:

> I got the latest from trunk and built it, and when running
> SparseVectorsFromSequenceFiles I noticed what I think is a bug.
> SparseVectorsFromSequenceFiles throws an exception when you ask for term
> frequency vectors as output together with the maxDFSigma filtering option.
> 
> Basically, the if / else if section shown below skips the call to
> DictionaryVectorizer.createTermFrequencyVectors when you have that
> combination.  The condition creates vectors when you want tf vectors
> without maxDFSigma filtering, or tfidf vectors with maxDFSigma filtering,
> but if you want tf vectors with maxDFSigma filtering, it skips the call to
> createTermFrequencyVectors entirely and later throws an exception
> because the vector input path doesn't exist.
> 
> Is this a known issue?  I'm assuming that's not the way it's supposed to
> work, correct?  If so, I think some sort of validation should stop the user
> before any processing starts.
> 
> // at line ~267 in trunk
> 
> if (!processIdf && !shouldPrune) {
>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath, outputDir, tfDirName, conf,
>       minSupport, maxNGramSize, minLLRValue, norm, logNormalize, reduceTasks, chunkSize,
>       sequentialAccessOutput, namedVectors);
> } else if (processIdf) {
>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath, outputDir, tfDirName, conf,
>       minSupport, maxNGramSize, minLLRValue, -1.0f, false, reduceTasks, chunkSize,
>       sequentialAccessOutput, namedVectors);
> }
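
The gap described in the quoted message can be seen by enumerating the two
flags. What follows is a minimal, self-contained Java sketch, not Mahout code:
the class TfBranchCheck and its methods are hypothetical, and it assumes that
shouldPrune is true whenever maxDFSigma filtering is requested, as the message
implies. The validate method is only one possible form of the up-front check
suggested above, not the project's actual fix.

```java
// Hypothetical illustration only; none of these names exist in Mahout.
final class TfBranchCheck {

  // Mirrors the quoted if / else if: returns true when one of the two
  // branches would call createTermFrequencyVectors.
  static boolean createsTfVectors(boolean processIdf, boolean shouldPrune) {
    if (!processIdf && !shouldPrune) {
      return true;   // tf weighting, no pruning
    } else if (processIdf) {
      return true;   // tfidf weighting, with or without pruning
    }
    return false;    // tf weighting with pruning: neither branch runs
  }

  // One possible early check (an assumption): fail before any job starts
  // when the flag combination would leave the tf vectors uncreated.
  static void validate(boolean processIdf, boolean shouldPrune) {
    if (!createsTfVectors(processIdf, shouldPrune)) {
      throw new IllegalArgumentException(
          "tf weighting combined with maxDFSigma pruning is not handled");
    }
  }

  public static void main(String[] args) {
    boolean[] flags = {false, true};
    for (boolean processIdf : flags) {
      for (boolean shouldPrune : flags) {
        System.out.println("processIdf=" + processIdf
            + " shouldPrune=" + shouldPrune
            + " -> tf vectors created: "
            + createsTfVectors(processIdf, shouldPrune));
      }
    }
  }
}
```

Running main prints one line per flag combination; only processIdf=false with
shouldPrune=true reports that no tf vectors are created, which matches the
missing-input-path failure described in the message.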
> 
> -- 
> 
> Thanks,
> John C

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com



