mahout-user mailing list archives

From Walter Chang <weidezhang2...@gmail.com>
Subject Re: bug when generating sparse vector
Date Tue, 06 Sep 2011 04:33:33 GMT
I ended up adding a default (no-argument) SmartChineseAnalyzer constructor to
work around the issue. I have another question. I can now see the following
directories created, but they seem to be encoded in some binary format. Is
there any tool to inspect the generated contents, as well as the calculated
TF-IDF scores?

df-count  dictionary.file-0  frequency.file-0  tfidf-vectors  tf-vectors  tokenized-documents  wordcount

Thanks a lot,

Weide

On Mon, Sep 5, 2011 at 9:03 PM, Jake Mannix <jake.mannix@gmail.com> wrote:

> On Mon, Sep 5, 2011 at 8:36 PM, Lance Norskog <goksron@gmail.com> wrote:
> >
> >
> > A Lucene expert could change SparseVectors to handle this case. (There
> > might be other problems.)
> >
>
> I don't think we need a Lucene expert; we just need to change the logic from
> "instantiate the Analyzer via its no-arg constructor" to "if a no-arg
> constructor exists for the Analyzer, use it; else try the single-arg
> constructor which takes a LuceneUtil.VERSION as the argument".
> And possibly let the client specify the Lucene version on the command line
> (making sure to swap out all the Lucene jars which might be needed for that
> exact version).
>
>  -jake
>
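
For reference, the fallback Jake describes could be sketched roughly as below. This is only an illustration under assumptions, not Mahout's actual code: the class AnalyzerFactorySketch, the Version enum and the two analyzer classes are hypothetical stand-ins, so that the example compiles without the Lucene jars. With a real analyzer you would pass the analyzer class and a Lucene version constant instead.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch: none of these names come from Mahout or Lucene.
final class AnalyzerFactorySketch {

    /** Stand-in for a Lucene version constant (assumption, not the real type). */
    public enum Version { LUCENE_31 }

    /** Stand-in for an analyzer that offers a public no-arg constructor. */
    public static class NoArgAnalyzer {
        public NoArgAnalyzer() { }
    }

    /** Stand-in for an analyzer that can only be built from a Version. */
    public static class VersionOnlyAnalyzer {
        public final Version version;

        public VersionOnlyAnalyzer(Version version) {
            this.version = version;
        }
    }

    /**
     * Use the public no-arg constructor if the class has one; otherwise
     * fall back to the public single-arg constructor taking a Version.
     */
    public static <T> T newInstance(Class<T> clazz, Version version)
            throws ReflectiveOperationException {
        try {
            Constructor<T> noArg = clazz.getConstructor();
            return noArg.newInstance();
        } catch (NoSuchMethodException e) {
            // No public no-arg constructor: try the Version-based one.
            // If that is missing too, NoSuchMethodException propagates.
            Constructor<T> withVersion = clazz.getConstructor(Version.class);
            return withVersion.newInstance(version);
        }
    }

    public static void main(String[] args) throws Exception {
        NoArgAnalyzer a = newInstance(NoArgAnalyzer.class, Version.LUCENE_31);
        VersionOnlyAnalyzer b =
                newInstance(VersionOnlyAnalyzer.class, Version.LUCENE_31);
        System.out.println(a.getClass().getSimpleName());
        System.out.println(b.getClass().getSimpleName() + " " + b.version);
    }
}
```

Trying the no-arg constructor first keeps existing analyzers working unchanged; only classes without one take the version-based path. A class with neither constructor still fails with NoSuchMethodException, which is probably the right signal to surface to the user.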
