lucene-solr-user mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: Using SimpleNaiveBayesClassifier in solr
Date Mon, 12 Oct 2015 09:41:21 GMT
Hi Yewint,
>
> The sample test code inside seems to suggest that the classifier reads the whole index
> db to train the model every time a classification happens for an
> inputDocument. Or am I misunderstanding something here?


I would suggest taking a look at a couple of articles I wrote last
summer about classification in Lucene and Solr:

http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html

http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html

Basically, your misunderstanding is that this module works like a standard
classifier, which is not the case here.
Lucene Classification doesn't train a model over time: the index is your
model.
It uses the index data structures to perform classification
(kNN and Simple Naive Bayes are the algorithms I explored at the time).
The algorithms read the term frequencies and document
frequencies stored in the inverted index.
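
To make "the index is your model" concrete, here is a minimal, self-contained Java sketch. It does not use Lucene: the class IndexAsModelNB and its methods are hypothetical, and the in-memory maps only stand in for the document and term frequencies an inverted index would give you. Adding a document just updates counts; there is no separate training step, and classification reads the counts at query time. The scoring is multinomial Naive Bayes with Laplace smoothing, which is an assumption for illustration, not necessarily the exact formula SimpleNaiveBayesClassifier uses.

```java
import java.util.*;

// Hypothetical toy, not Lucene's API: an "index" that only keeps the
// per-class statistics an inverted index would let you look up.
class IndexAsModelNB {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termFreqPerClass = new HashMap<>();
    private final Map<String, Integer> termsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Indexing a document only updates counts; no model is built.
    public void addDocument(String text, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> tf = termFreqPerClass.computeIfAbsent(label, k -> new HashMap<>());
        for (String term : tokenize(text)) {
            tf.merge(term, 1, Integer::sum);
            termsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    // Classification reads the stored counts at query time:
    // log P(c) + sum over terms of log P(term | c), Laplace-smoothed.
    public String assignClass(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            int docs = docsPerClass.get(label);
            double score = Math.log((double) docs / totalDocs);
            Map<String, Integer> tf = termFreqPerClass.get(label);
            int classTerms = termsPerClass.getOrDefault(label, 0);
            for (String term : tokenize(text)) {
                int count = tf.getOrDefault(term, 0);
                score += Math.log((count + 1.0) / (classTerms + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Naive lowercase tokenizer, standing in for a Lucene Analyzer.
    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        IndexAsModelNB index = new IndexAsModelNB();
        index.addDocument("solr lucene search index query", "tech");
        index.addDocument("java code compiler index", "tech");
        index.addDocument("football match goal team", "sport");
        index.addDocument("tennis match player serve", "sport");
        System.out.println(index.assignClass("lucene query index")); // tech
        System.out.println(index.assignClass("team goal match"));    // sport
    }
}
```

Adding more documents immediately changes the result of the next assignClass call, because the "model" is nothing more than the current counts.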

A big index will of course affect performance, because we are querying the
index at classification time, but not because we are building a model.
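
A kNN classifier shows the same trade-off in its simplest form: nothing is trained, but every classification has to look at the indexed documents, so the per-query cost grows with the index. The sketch below is hypothetical and does not use Lucene; ToyKnnClassifier, its term-overlap similarity and the linear scan are assumptions standing in for the similarity query a real implementation would run against the index.

```java
import java.util.*;

// Hypothetical toy kNN classifier: assignClass scans every stored document,
// so its cost grows with the size of the "index", yet nothing is trained.
class ToyKnnClassifier {
    private final List<Set<String>> docs = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();
    private final int k;

    ToyKnnClassifier(int k) {
        this.k = k;
    }

    void addDocument(String text, String label) {
        docs.add(tokenize(text));
        labels.add(label);
    }

    String assignClass(String text) {
        if (docs.isEmpty()) return null;
        Set<String> query = tokenize(text);
        // Rank all documents by term overlap with the input (stable sort keeps insertion order on ties).
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) ids.add(i);
        ids.sort((a, b) -> Integer.compare(overlap(query, docs.get(b)), overlap(query, docs.get(a))));
        // Majority vote among the top k neighbours.
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < Math.min(k, ids.size()); i++) {
            votes.merge(labels.get(ids.get(i)), 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    private static Set<String> tokenize(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    private static int overlap(Set<String> a, Set<String> b) {
        int n = 0;
        for (String t : a) {
            if (b.contains(t)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ToyKnnClassifier knn = new ToyKnnClassifier(3);
        knn.addDocument("solr lucene search index query", "tech");
        knn.addDocument("java code compiler index", "tech");
        knn.addDocument("football match goal team", "sport");
        knn.addDocument("tennis match player serve", "sport");
        System.out.println(knn.assignClass("lucene query index")); // tech
        System.out.println(knn.assignClass("team goal match"));    // sport
    }
}
```

An inverted index avoids the full scan by only visiting documents that share terms with the input, but the work still happens per classification rather than once up front.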

+1 to all of Tommaso's observations!

Cheers



On 10 October 2015 at 20:36, Yewint Ko <yewintko2010@gmail.com> wrote:

> Hi
>
> I am trying to use SimpleNaiveBayesClassifier in my Solr project. I am currently
> looking at its test base, ClassificationTestBase.java.
>
> The sample test code inside seems to suggest that the classifier reads the whole index
> db to train the model every time a classification happens for an
> inputDocument. Or am I misunderstanding something here? If I had a large
> index db, would it impact performance?
>
>   protected void checkCorrectClassification(Classifier<T> classifier, String inputDoc,
>       T expectedResult, Analyzer analyzer, String textFieldName, String classFieldName,
>       Query query) throws Exception {
>     AtomicReader atomicReader = null;
>     try {
>       populateSampleIndex(analyzer);
>       atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter.getReader());
>       classifier.train(atomicReader, textFieldName, classFieldName, analyzer, query);
>       ClassificationResult<T> classificationResult = classifier.assignClass(inputDoc);
>       assertNotNull(classificationResult.getAssignedClass());
>       assertEquals("got an assigned class of " + classificationResult.getAssignedClass(),
>           expectedResult, classificationResult.getAssignedClass());
>       assertTrue("got a not positive score " + classificationResult.getScore(),
>           classificationResult.getScore() > 0);
>     } finally {
>       if (atomicReader != null) {
>         atomicReader.close();
>       }
>     }
>   }
>



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
