mahout-user mailing list archives

From Jonathan Cooper-Ellis <...@ziftr.com>
Subject Re: How to get document count for TFIDF calculate method?
Date Tue, 29 Jul 2014 17:40:01 GMT
Hi Vaibhav,

Thanks for the reply. It doesn't look like the total number of keys in
frequency.file-0 corresponds to the number of documents: I only used a
couple hundred documents to build the model, but there are thousands of
keys in frequency.file-0. Am I misunderstanding something?
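
For reference, here is a minimal sketch of the lookup I'm attempting, rewritten
so that a missing -1 entry is reported instead of throwing an NPE. It assumes
the keys of frequency.file-0 are term ids and the values are per-term document
frequencies, which would explain why there are far more keys than documents.
The class, constant, and method names are hypothetical, and the map is built by
hand rather than read from the sequence file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

final class NumDocsLookup {
    // Assumption: the total document count, if stored at all, sits under key -1.
    static final int TOTAL_COUNT_KEY = -1;

    // Returns the document count if the sentinel key is present, else empty.
    // Map.get returns null for a missing key, so calling intValue() on the
    // result directly is what produces the NullPointerException.
    static OptionalInt numDocs(Map<Integer, Long> documentFrequency) {
        Long total = documentFrequency.get(TOTAL_COUNT_KEY);
        if (total == null) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Math.toIntExact(total));
    }

    // Counts the non-negative keys. Under the assumption above this is the
    // number of distinct terms, not the number of documents.
    static int termCount(Map<Integer, Long> documentFrequency) {
        int count = 0;
        for (int key : documentFrequency.keySet()) {
            if (key >= 0) {
                count++;
            }
        }
        return count;
    }
}
```

With a map that has no -1 entry, numDocs returns an empty OptionalInt, so the
caller can decide where else to get the count before calling tfidf.calculate.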


On Tue, Jul 29, 2014 at 1:15 PM, vaibhav srivastava <vaibhavcse30@gmail.com>
wrote:

> Hi, if I understand correctly, you want to know the number of documents
> by reading frequency.file-0. You can use a SequenceFile.Reader to load
> the frequency file and then count the number of keys, which will give
> you the number of documents.
> Hope this helps,
> Thanks,
> vaibhav
>
>
> On Tue, Jul 29, 2014 at 10:32 PM, Jonathan Cooper-Ellis <jce@ziftr.com>
> wrote:
>
> > Hey guys,
> >
> > I'm trying to make a Bayesian classifier, but I'm having a hard time
> > figuring out how to programmatically determine the value of the numDocs
> > param for the calculate method in TFIDF, using the files generated when
> > building the model on the command line.
> >
> > I saw some code that did it like this:
> >
> > int numDocs = documentFrequency.get(-1).intValue();
> >
> > Where documentFrequency is a HashMap<Integer,Long> read from
> > frequency.file-0, but there's no key -1 in the file, so it's giving me
> > an NPE when I try to pass that to tfidf.calculate.
> >
> > Anyone know what I'm doing wrong?
> >
> >
> > Best,
> >
> > jce
> >
>
>
>
> --
> Thanks and Regards,
> Vaibhav Srivastava
> Email-id: vaibhavcse30@gmail.com
> Mobile no.: 9552543029
>
