lucene-solr-user mailing list archives

From Tomas Ramanauskas <Tomas.Ramanaus...@springer.com>
Subject Re: A working example to play with Naive Bayes classifier
Date Fri, 15 Jul 2016 11:11:59 GMT
Hi, Alessandro,

sorry for the delay. What do you mean?


As I mentioned earlier, I followed a very simple set of steps.

1. Download Solr
2. Configure classification 
3. Create some documents using curl over HTTP.
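
For anyone who wants to reproduce step 3 without curl, here is a rough Python sketch of the same kind of request. It assumes a local core named "demo" on port 8983, as in the curl commands quoted further down. The helper name `build_update_request`, the explicit JSON content type and the `commit=true` parameter are choices made for this sketch, not something taken from the thread. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a local Solr core named "demo", as in the curl commands in this thread.
SOLR_UPDATE_URL = "http://localhost:8983/solr/demo/update"


def build_update_request(docs, url=SOLR_UPDATE_URL, commit=True):
    """Build (but do not send) a JSON update request for a list of documents."""
    # commit=true asks Solr to commit right after the update (a choice of this sketch).
    target = (url + "?commit=true") if commit else url
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        target,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A labelled document: the class field (cat_s) already has a value.
labelled = {
    "id": "book1",
    "title_t": ["The Way of Kings"],
    "author_s": "Brandon Sanderson",
    "cat_s": "fantasy",
    "pubyear_i": 2010,
    "ISBN_s": "978-0-7653-2635-5",
}

request = build_update_request([labelled])
```

Passing `request` to `urllib.request.urlopen` would send it to a running Solr instance.
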


Is it difficult to reproduce the steps / problem?


Tomas



> On 23 Jun 2016, at 16:42, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> 
> Can you give an example of your schema, and can you run a simple query against
> your index? I am curious to see how the input fields are analyzed.
> 
> Cheers
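
A rough sketch of how the schema and the analysis of an input field could be inspected over HTTP. It assumes the core is called "demo" and that the Schema API (`/schema`) and the field analysis handler (`/analysis/field`) are available on that core. The function names are made up for this sketch, which only builds the URLs and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumption: a local core named "demo", as in the curl commands quoted in this thread.
BASE = "http://localhost:8983/solr/demo"


def schema_url(base=BASE):
    """URL of the Schema API, which returns the field and field-type definitions."""
    return base + "/schema?wt=json"


def analysis_url(field, text, base=BASE):
    """URL for the field analysis handler, showing the tokens produced for `text`."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return base + "/analysis/field?" + urlencode(params)


print(schema_url())
print(analysis_url("title_t", "The Way of Kings"))
```

Either URL can be fetched with curl or a browser. The second one should show how the `title_t` value is tokenized, provided the handler is enabled.
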
> 
> On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <
> benedetti.alex85@gmail.com> wrote:
> 
>> This is better! At least the classifier is invoked!
>> How many docs in the index have the class assigned?
>> Take a look at the stack trace and you should find the cause!
>> I am now on mobile, I will check the code tomorrow!
>> Cheers
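
One way to answer the "how many docs have the class assigned" question is a range query on the class field with `rows=0`, reading `numFound` from the response. Below is a small Python sketch of that idea. The core name "demo" and the field `cat_s` come from the configuration in this thread. The function names are made up, and the sample response body is purely illustrative (the count in it is not from a real index).

```python
import json
from urllib.parse import urlencode

# Assumption: a local core named "demo", as in the curl commands quoted in this thread.
BASE = "http://localhost:8983/solr/demo"


def labelled_count_url(class_field="cat_s", base=BASE):
    """Select URL matching every document that has any value in the class field."""
    params = {"q": class_field + ":[* TO *]", "rows": 0, "wt": "json"}
    return base + "/select?" + urlencode(params)


def num_found(response_body):
    """Extract numFound from the JSON body of a select response."""
    return json.loads(response_body)["response"]["numFound"]


# Illustrative response body only; the count is invented, not taken from a real index.
sample = '{"responseHeader":{"status":0},"response":{"numFound":3,"start":0,"docs":[]}}'

print(labelled_count_url())
print(num_found(sample))
```

If the count comes back as zero, the classifier has no labelled documents to learn from.
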
>> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <
>> Tomas.Ramanauskas@springer.com> wrote:
>> 
>>> 
>>> I also tried with this config (adding **):
>>> 
>>> 
>>>  <initParams path="/update/**">
>>>    <lst name="defaults">
>>>      <str name="update.chain">classification</str>
>>>    </lst>
>>>  </initParams>
>>> 
>>> 
>>> 
>>> 
>>> 
>>> And I get the error:
>>> 
>>> 
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book15",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s": null,
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
>>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
>>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
>>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
>>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
>>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
>>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
>>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
>>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
>>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
>>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
>>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
>>> 
>>> 
>>> Tomas
>>> 
>>> 
>>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <
>>> Tomas.Ramanauskas@springer.com> wrote:
>>> 
>>> Thanks for the response, Alessandro.
>>> 
>>> I tried this and it didn’t work either:
>>> 
>>> 
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book14",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s": null,
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> 
>>> {"responseHeader":{"status":0,"QTime":2}}
>>> 
>>> $ curl http://localhost:8983/solr/demo/get?id=book14
>>> {
>>>  "doc":
>>>  {
>>>    "id":"book14",
>>>    "title_t":["The Way of Kings"],
>>>    "author_s":"Brandon Sanderson",
>>>    "pubyear_i":2010,
>>>    "ISBN_s":"978-0-7653-2635-5",
>>>    "_version_":1537854598189940736}}
>>> 
>>> 
>>> I don’t see “cat_s” field in the results at all.
>>> 
>>> 
>>> Tomas
>>> 
>>> 
>>> On 22 Jun 2016, at 16:39, Alessandro Benedetti <abenedetti@apache.org>
>>> wrote:
>>> 
>>> Hi Tomas,
>>> First consideration:
>>> an empty string is different from a null value.
>>> This is controversial, but I would suggest you never use the empty string,
>>> as it can cause other side effects.
>>> Apart from that, the plugin will add the class only if the class field
>>> has no value:
>>> 
>>> Object documentClass = doc.getFieldValue(classFieldName);
>>> if (documentClass == null) {
>>>   // the class is predicted and assigned only in this branch
>>> }
>>> 
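
To make the null versus empty string point concrete, here is a tiny Python illustration of the check quoted above. A plain dict stands in for the Solr input document, and `needs_classification` is a hypothetical name. Only the "is the value null" test mirrors the quoted Java. It says nothing about how Solr parses the incoming JSON.

```python
def needs_classification(doc, class_field="cat_s"):
    """Mirror of the quoted check: classify only when the class field has no value."""
    return doc.get(class_field) is None


print(needs_classification({"id": "book14", "cat_s": None}))      # True
print(needs_classification({"id": "book14"}))                     # True
print(needs_classification({"id": "book7", "cat_s": ""}))         # False
print(needs_classification({"id": "book1", "cat_s": "fantasy"}))  # False
```

Under this check, a document sent with `"cat_s":""` would never reach the classifier, because the empty string counts as a value.
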
>>> That said, I would suggest you build a sample index with some
>>> documents and then try to classify.
>>> If this doesn't solve your issue, I can help you further.
>>> 
>>> Cheers
>>> 
>>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <
>>> Tomas.Ramanauskas@springer.com> wrote:
>>> 
>>> I also tried this configuration, but couldn't get the feature to work:
>>> 
>>> 
>>> 
>>> <initParams path="/update/">
>>>   <lst name="defaults">
>>>     <str name="update.chain">classification</str>
>>>   </lst>
>>> </initParams>
>>> 
>>> 
>>> <updateRequestProcessorChain name="classification">
>>>   <processor class="solr.ClassificationUpdateProcessorFactory">
>>>     <str name="inputFields">title_t,author_s</str>
>>>     <str name="classField">cat_s</str>
>>>     <str name="algorithm">bayes</str>
>>>   </processor>
>>> </updateRequestProcessorChain>
>>> 
>>> 
>>> Tomas
>>> 
>>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <
>>> Tomas.Ramanauskas@springer.com> wrote:
>>> 
>>> P.S. The version I use:
>>> 
>>> 6.1.0-68
>>> 
>>> Also, earlier I said “If I modify an existing record, I think the
>>> functionality works:”, but I think it doesn’t work for me at all.
>>> 
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>> "doc":
>>> {
>>>   "id":"book1",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"fantasy",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5",
>>>   "_version_":1535488016326328320}}
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book1",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s":"aaa",
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":0}}
>>> 
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>> "doc":
>>> {
>>>   "id":"book1",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"fantasy",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5",
>>>   "_version_":1535488016326328320}}
>>> 
>>> 
>>> Tomas
>>> 
>>> 
>>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas <
>>> Tomas.Ramanauskas@springer.com> wrote:
>>> 
>>> Hi, everyone,
>>> 
>>> 
>>> would someone be able to share a working example (step by step) that
>>> demonstrates the use of Naive Bayes classifier in Solr?
>>> 
>>> 
>>> I followed this Blog post:
>>> 
>>> 
>>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
>>> 
>>> And this tutorial:
>>> http://yonik.com/solr-tutorial/
>>> 
>>> And this JIRA ticket:
>>> https://issues.apache.org/jira/browse/SOLR-7739
>>> 
>>> 
>>> 
>>> So this is my configuration file (only what I added or modified):
>>> 
>>> <initParams path="/update/**">
>>>   <lst name="defaults">
>>>     <str name="update.chain">classification</str>
>>>   </lst>
>>> </initParams>
>>> 
>>> 
>>> <updateRequestProcessorChain name="classification">
>>>   <processor class="solr.ClassificationUpdateProcessorFactory">
>>>     <str name="inputFields">title_t,author_s</str>
>>>     <str name="classField">cat_s</str>
>>>     <str name="algorithm">bayes</str>
>>>   </processor>
>>> </updateRequestProcessorChain>
>>> 
>>> 
>>> 
>>> If I modify an existing record, I think the functionality works:
>>> 
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book1",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s":"",
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":8}}
>>> $ curl http://localhost:8983/solr/demo/get?id=book1
>>> {
>>> "doc":
>>> {
>>>   "id":"book1",
>>>   "title_t":["The Way of Kings"],
>>>   "author_s":"Brandon Sanderson",
>>>   "cat_s":"fantasy",
>>>   "pubyear_i":2010,
>>>   "ISBN_s":"978-0-7653-2635-5",
>>>   "_version_":1535488016326328320}}
>>> 
>>> 
>>> 
>>> 
>>> If I add a new document, something isn’t quite working:
>>> 
>>> $ curl http://localhost:8983/solr/demo/update -d '
>>> [
>>> {"id" : "book7",
>>> "title_t":["The Way of Kings"],
>>> "author_s":"Brandon Sanderson",
>>> "cat_s":"",
>>> "pubyear_i":2010,
>>> "ISBN_s":"978-0-7653-2635-5"
>>> }
>>> ]'
>>> {"responseHeader":{"status":0,"QTime":0}}
>>> $ curl http://localhost:8983/solr/demo/get?id=book7
>>> {
>>> "doc":null}
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> --------------------------
>>> 
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>> 
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>> 
>>> William Blake - Songs of Experience -1794 England
>>> 
>>> 
>>> 
> 
> 
