lucene-solr-user mailing list archives

From Yewint Ko <yewintko2...@gmail.com>
Subject Re: Using SimpleNaiveBayesClassifier in solr
Date Wed, 14 Oct 2015 08:08:17 GMT
Thanks, Ales and Tommaso, for your replies.

So does the classifier query the whole index and load it into memory
before running the tokenizer against the input document? It sounds like,
if I don't close the classifier and my index is big, I might need a bigger
machine. Is there any way to reverse the order? Do I sound dumb?

On 12 October 2015 at 16:11, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:

> Hi Yewint,
> >
> > The sample test code there makes it look like the classifier reads the
> > whole index to train the model every time an input document is
> > classified. Or am I misunderstanding something here?
>
>
> I would suggest taking a look at a couple of articles I wrote last
> summer about classification in Lucene and Solr:
>
>
> http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html
>
>
> http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html
>
> Basically, the misunderstanding is that this module works like a standard
> classifier, which is not the case here.
> Lucene Classification doesn't train a model over time; the index is your
> model.
> It uses the index data structures to perform the classification
> (k-NN and Simple Naive Bayes are the algorithms I explored at that time).
> Basically, the algorithms access the term frequencies and document
> frequencies stored in the inverted index.
>
> Having a big index will affect performance, since of course we are
> querying the index, but not because we are building a model.
>
> +1 on all Tommaso's observations!
>
> Cheers
>
>
>
> On 10 October 2015 at 20:36, Yewint Ko <yewintko2010@gmail.com> wrote:
>
> > Hi
> >
> > I am trying to use SimpleNaiveBayesClassifier in my Solr project. I am
> > currently looking at its test base, ClassificationTestBase.java.
> >
> > The sample test code there makes it look like the classifier reads the
> > whole index to train the model every time an input document is
> > classified. Or am I misunderstanding something here? If I had a large
> > index, would it impact performance?
> >
> > protected void checkCorrectClassification(Classifier<T> classifier,
> >     String inputDoc, T expectedResult, Analyzer analyzer,
> >     String textFieldName, String classFieldName, Query query)
> >     throws Exception {
> >   AtomicReader atomicReader = null;
> >   try {
> >     populateSampleIndex(analyzer);
> >     atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter.getReader());
> >     classifier.train(atomicReader, textFieldName, classFieldName,
> >         analyzer, query);
> >     ClassificationResult<T> classificationResult =
> >         classifier.assignClass(inputDoc);
> >     assertNotNull(classificationResult.getAssignedClass());
> >     assertEquals("got an assigned class of "
> >         + classificationResult.getAssignedClass(),
> >         expectedResult, classificationResult.getAssignedClass());
> >     assertTrue("got a not positive score "
> >         + classificationResult.getScore(),
> >         classificationResult.getScore() > 0);
> >   } finally {
> >     if (atomicReader != null) {
> >       atomicReader.close();
> >     }
> >   }
> > }
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
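To make the quoted point that "the Index is your model" a bit more concrete, here is a minimal, self-contained sketch of a naive Bayes classifier that has no separate training step and only reads per-class document frequencies at classification time. This is a toy illustration, not Lucene's SimpleNaiveBayesClassifier: the class `IndexAsModelSketch`, its methods, the in-memory maps standing in for the inverted index, and the add-one smoothing are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy in-memory "index"; the counts it stores are the only model (hypothetical class). */
final class IndexAsModelSketch {
    // class label -> number of indexed documents carrying that label
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    // class label -> (term -> number of documents of that class containing the term)
    private final Map<String, Map<String, Integer>> docFreq = new HashMap<>();
    private int totalDocs = 0;

    /** Indexing a labelled document only updates counts; nothing is trained. */
    void addDocument(String label, String text) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> perTerm = docFreq.computeIfAbsent(label, k -> new HashMap<>());
        for (String term : new HashSet<>(tokenize(text))) {
            perTerm.merge(term, 1, Integer::sum);
        }
    }

    /** Classification reads the stored counts for each input term at query time. */
    String assignClass(String text) {
        List<String> terms = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : docsPerClass.entrySet()) {
            String label = entry.getKey();
            int classDocs = entry.getValue();
            // log prior: fraction of indexed documents with this label
            double score = Math.log((double) classDocs / totalDocs);
            Map<String, Integer> perTerm = docFreq.get(label);
            for (String term : terms) {
                int df = perTerm.getOrDefault(term, 0);
                // add-one smoothing (an assumption) so unseen terms do not zero the score
                score += Math.log((df + 1.0) / (classDocs + 2.0));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best; // null when nothing has been indexed yet
    }

    /** Lowercases and splits on non-word characters; stands in for an Analyzer. */
    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        IndexAsModelSketch index = new IndexAsModelSketch();
        index.addDocument("sports", "the team won the football match");
        index.addDocument("sports", "a great goal in the match");
        index.addDocument("tech", "the new phone has a fast processor");
        index.addDocument("tech", "software update for the phone");
        System.out.println(index.assignClass("football match tonight")); // prints "sports"
        System.out.println(index.assignClass("phone software"));         // prints "tech"
    }
}
```

In this sketch the cost of classifying one document is a handful of count lookups per class and per input term, so a bigger index makes those lookups slower but never triggers a rebuild of anything.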
