lucene-solr-user mailing list archives

From Alessandro Benedetti <abenede...@apache.org>
Subject Re: A working example to play with Naive Bayes classifier
Date Fri, 15 Jul 2016 16:44:45 GMT
But how big is your index? Are you expecting Solr to automatically
classify your documents without any knowledge base to learn from?
Please attach an example of your schema.
I asked for it for a reason :)
It seems related to the fact that we get no tokens from the text analysis.
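To illustrate the two points above (a Naive Bayes classifier needs documents that already carry a class, and a document whose input fields produce no tokens gives it no evidence), here is a minimal textbook multinomial Naive Bayes sketch. It is an assumption-laden illustration, not Lucene's SimpleNaiveBayesDocumentClassifier, which works against the index rather than in-memory maps; the class NaiveBayesSketch, its methods tokenize/train/classify, the regex tokenizer and the Laplace smoothing are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical textbook multinomial Naive Bayes; NOT Lucene's implementation. */
final class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> totalTermsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Hypothetical tokenizer: lowercase, split on non-letters/digits; null values yield no tokens. */
    static List<String> tokenize(String... fieldValues) {
        List<String> tokens = new ArrayList<>();
        for (String value : fieldValues) {
            if (value == null) continue; // skip missing field values instead of failing
            for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds one training document whose class field already has a value. */
    void train(String label, List<String> tokens) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            totalTermsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    /** Returns the most probable class, or null when there are no labeled documents. */
    String classify(List<String> tokens) {
        if (totalDocs == 0) return null; // nothing to learn from
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : docsPerClass.entrySet()) {
            String label = e.getKey();
            // log prior: log P(class)
            double score = Math.log((double) e.getValue() / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int total = totalTermsPerClass.getOrDefault(label, 0);
            // guard against an empty vocabulary (training docs without tokens)
            int denominator = total + Math.max(1, vocabulary.size());
            for (String t : tokens) {
                // Laplace-smoothed log likelihood: log P(term | class)
                int c = counts.getOrDefault(t, 0);
                score += Math.log((c + 1.0) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

With no labeled documents, classify returns null; with an empty token list, every class is scored by its prior alone, so the most frequent class wins regardless of the document's content.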

Cheers

On Fri, Jul 15, 2016 at 12:11 PM, Tomas Ramanauskas <
Tomas.Ramanauskas@springer.com> wrote:

> Hi Alessandro,
>
> Sorry for the delay. What do you mean?
>
>
> As I mentioned earlier, I followed a super simple set of steps.
>
> 1. Download Solr
> 2. Configure classification
> 3. Create some documents using curl over HTTP.
>
>
> Is it difficult to reproduce the steps / problem?
>
>
> Tomas
>
>
>
> > On 23 Jun 2016, at 16:42, Alessandro Benedetti <
> benedetti.alex85@gmail.com> wrote:
> >
> > Can you give an example of your schema? And can you run a simple query
> > against your index? I'm curious to see how the input fields are analyzed.
> >
> > Cheers
> >
> > On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <
> > benedetti.alex85@gmail.com> wrote:
> >
> >> This is better! At least the classifier is invoked!
> >> How many docs in the index have the class assigned?
> >> Take a look at the stack trace and you should find the cause!
> >> I am on mobile now; I will check the code tomorrow!
> >> Cheers
> >> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <
> >> Tomas.Ramanauskas@springer.com> wrote:
> >>
> >>>
> >>> I also tried with this config (adding **):
> >>>
> >>>
> >>>  <initParams path="/update/**">
> >>>    <lst name="defaults">
> >>>      <str name="update.chain">classification</str>
> >>>    </lst>
> >>>  </initParams>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> And I get the error:
> >>>
> >>>
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book15",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s": null,
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>>
> >>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
> >>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
> >>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
> >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
> >>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
> >>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
> >>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> >>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
> >>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
> >>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
> >>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> >>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
> >>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
> >>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
> >>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
> >>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
> >>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
> >>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
> >>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
> >>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
> >>>
> >>>
> >>> Tomas
> >>>
> >>>
> >>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <Tomas.Ramanauskas@springer.com>
> >>> wrote:
> >>>
> >>> Thanks for the response, Alessandro.
> >>>
> >>> I tried this and it didn’t work either:
> >>>
> >>>
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book14",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s": null,
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>>
> >>> {"responseHeader":{"status":0,"QTime":2}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book14
> >>> {
> >>>  "doc":
> >>>  {
> >>>    "id":"book14",
> >>>    "title_t":["The Way of Kings"],
> >>>    "author_s":"Brandon Sanderson",
> >>>    "pubyear_i":2010,
> >>>    "ISBN_s":"978-0-7653-2635-5",
> >>>    "_version_":1537854598189940736}}
> >>>
> >>>
> >>> I don’t see “cat_s” field in the results at all.
> >>>
> >>>
> >>> Tomas
> >>>
> >>>
> >>> On 22 Jun 2016, at 16:39, Alessandro Benedetti <abenedetti@apache.org>
> >>> wrote:
> >>>
> >>> Hi Tomas,
> >>> first consideration:
> >>> an empty string is different from a null value.
> >>> This is a controversial point, but I would suggest never using the empty
> >>> string, as it can cause other side effects.
> >>> Apart from that, the plugin will add the class only if the class field
> >>> has no value at all:
> >>>
> >>> Object documentClass = doc.getFieldValue(classFieldName);
> >>> if (documentClass == null) {
> >>>   // ... classify the document and set the class field ...
> >>> }
> >>>
> >>> That said, I would suggest building a sample index with some
> >>> documents and then trying to classify.
> >>> If this doesn't solve your issue, I can help you further.
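The check quoted above distinguishes "no value" from "empty string". The sketch below models it with a plain java.util.Map standing in for Solr's input document; ClassFieldCheck and willBeClassified are hypothetical names, not Solr API, and the sketch assumes an empty string reaches the processor unchanged.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the class-field check quoted above. A plain Map stands in
 * for Solr's input document; this is not Solr's ClassificationUpdateProcessor.
 */
final class ClassFieldCheck {

    /** True only when the class field has no value at all (absent or null). */
    static boolean willBeClassified(Map<String, Object> doc, String classFieldName) {
        Object documentClass = doc.get(classFieldName); // Map.get returns null for absent keys too
        return documentClass == null;
    }

    public static void main(String[] args) {
        Map<String, Object> emptyClass = new HashMap<>();
        emptyClass.put("cat_s", ""); // "" is a value, so the check skips classification
        Map<String, Object> noClass = new HashMap<>(); // cat_s omitted entirely

        System.out.println(willBeClassified(emptyClass, "cat_s")); // prints false
        System.out.println(willBeClassified(noClass, "cat_s"));    // prints true
    }
}
```

Under this model, a document sent with "cat_s":"" or "cat_s":"aaa" keeps its value and is never classified; only a document without a class value is. How Solr's JSON loader treats an explicit null is not modelled here.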
> >>>
> >>> Cheers
> >>>
> >>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <Tomas.Ramanauskas@springer.com>
> >>> wrote:
> >>>
> >>> I also tried this configuration, but couldn't get the feature to work:
> >>>
> >>>
> >>>
> >>> <initParams path="/update/">
> >>>   <lst name="defaults">
> >>>     <str name="update.chain">classification</str>
> >>>   </lst>
> >>> </initParams>
> >>>
> >>>
> >>> <updateRequestProcessorChain name="classification">
> >>>   <processor class="solr.ClassificationUpdateProcessorFactory">
> >>>     <str name="inputFields">title_t,author_s</str>
> >>>     <str name="classField">cat_s</str>
> >>>     <str name="algorithm">bayes</str>
> >>>   </processor>
> >>> </updateRequestProcessorChain>
> >>>
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <Tomas.Ramanauskas@springer.com>
> >>> wrote:
> >>>
> >>> P.S. The version I use:
> >>>
> >>> 6.1.0-68
> >>>
> >>> Also, earlier I said “If I modify an existing record, I think the
> >>> functionality works:”, but I think it doesn’t work for me at all.
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>>   "id":"book1",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"fantasy",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5",
> >>>   "_version_":1535488016326328320}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"aaa",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":0}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>>   "id":"book1",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"fantasy",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5",
> >>>   "_version_":1535488016326328320}}
> >>>
> >>>
> >>> Tomas
> >>>
> >>>
> >>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas <Tomas.Ramanauskas@springer.com>
> >>> wrote:
> >>>
> >>> Hi, everyone,
> >>>
> >>>
> >>> would someone be able to share a working example (step by step) that
> >>> demonstrates the use of Naive Bayes classifier in Solr?
> >>>
> >>>
> >>> I followed this blog post:
> >>>
> >>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
> >>>
> >>> And this tutorial:
> >>> http://yonik.com/solr-tutorial/
> >>>
> >>> And this JIRA ticket:
> >>> https://issues.apache.org/jira/browse/SOLR-7739
> >>>
> >>>
> >>>
> >>> So this is my configuration file (only what I added or modified):
> >>>
> >>> <initParams path="/update/**">
> >>>   <lst name="defaults">
> >>>     <str name="update.chain">classification</str>
> >>>   </lst>
> >>> </initParams>
> >>>
> >>>
> >>> <updateRequestProcessorChain name="classification">
> >>>   <processor class="solr.ClassificationUpdateProcessorFactory">
> >>>     <str name="inputFields">title_t,author_s</str>
> >>>     <str name="classField">cat_s</str>
> >>>     <str name="algorithm">bayes</str>
> >>>   </processor>
> >>> </updateRequestProcessorChain>
> >>>
> >>>
> >>>
> >>> If I modify an existing record, I think the functionality works:
> >>>
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":8}}
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>>   "id":"book1",
> >>>   "title_t":["The Way of Kings"],
> >>>   "author_s":"Brandon Sanderson",
> >>>   "cat_s":"fantasy",
> >>>   "pubyear_i":2010,
> >>>   "ISBN_s":"978-0-7653-2635-5",
> >>>   "_version_":1535488016326328320}}
> >>>
> >>>
> >>>
> >>>
> >>> If I add a new document, something isn’t quite working:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book7",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":0}}
> >>> $ curl http://localhost:8983/solr/demo/get?id=book7
> >>> {
> >>> "doc":null}
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> --------------------------
> >>>
> >>> Benedetti Alessandro
> >>> Visiting card : http://about.me/alessandro_benedetti
> >>>
> >>> "Tyger, tyger burning bright
> >>> In the forests of the night,
> >>> What immortal hand or eye
> >>> Could frame thy fearful symmetry?"
> >>>
> >>> William Blake - Songs of Experience -1794 England
> >>>
> >>>
> >>>
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
>


