From: Alessandro Benedetti
Date: Fri, 15 Jul 2016 17:44:45 +0100
Subject: Re: A working example to play with Naive Bayes classifier
To: "solr-user@lucene.apache.org"

But how big is your index?
Are you expecting Solr to classify your documents automatically without any
knowledge base (ground truth) to learn from?
Please attach an example of your schema. There was a reason I asked :)
It seems related to the fact that we get no tokens from the text analysis.

Cheers

On Fri, Jul 15, 2016 at 12:11 PM, Tomas Ramanauskas <
Tomas.Ramanauskas@springer.com> wrote:

> Hi, Alessandro,
>
> sorry for the delay. What do you mean?
>
> As I mentioned earlier, I followed a super simple set of steps.
>
> 1. Download Solr
> 2. Configure classification
> 3. Create some documents using curl over HTTP.
>
> Is it difficult to reproduce the steps / problem?
>
> Tomas
>
> On 23 Jun 2016, at 16:42, Alessandro Benedetti <
> benedetti.alex85@gmail.com> wrote:
> >
> > Can you give an example of your schema, and can you run a simple query
> > on your index? I am curious to see how the input fields are analyzed.
> >
> > Cheers
> >
> > On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <
> > benedetti.alex85@gmail.com> wrote:
> >
> >> This is better! At least the classifier is invoked!
> >> How many docs in the index have the class assigned?
> >> Take a look at the stack trace and you should find the cause!
> >> I am now on mobile, I will check the code tomorrow!
> >> Cheers
> >> On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <
> >> Tomas.Ramanauskas@springer.com> wrote:
> >>
> >>>
> >>> I also tried with this config (adding **):
> >>>
> >>> classification
> >>>
> >>> And I get the error:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book15",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s": null,
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>>
> >>> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
> >>> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
> >>> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
> >>> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
> >>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
> >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
> >>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
> >>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
> >>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> >>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
> >>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
> >>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
> >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
> >>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
> >>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> >>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
> >>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
> >>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
> >>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
> >>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
> >>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
> >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
> >>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
> >>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
> >>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <
> >>> Tomas.Ramanauskas@springer.com>
> >>> wrote:
> >>>
> >>> Thanks for the response, Alessandro.
> >>>
> >>> I tried this and it didn’t work either:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book14",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s": null,
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>>
> >>> {"responseHeader":{"status":0,"QTime":2}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book14
> >>> {
> >>> "doc":
> >>> {
> >>> "id":"book14",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5",
> >>> "_version_":1537854598189940736}}
> >>>
> >>> I don’t see “cat_s” field in the results at all.
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 16:39, Alessandro Benedetti wrote:
> >>>
> >>> Hi Tomas,
> >>> first consideration:
> >>> an empty string is different from a NULL string.
> >>> This is controversial, but I would suggest you never use the empty
> >>> string, as it can cause other side effects.
> >>> Apart from that, the plugin will add the class only if the class field
> >>> has no value:
> >>>
> >>> Object documentClass = doc.getFieldValue(classFieldName);
> >>> if (documentClass == null) {
> >>>
> >>> That said, I would suggest you build a sample index with some
> >>> documents and then try to classify.
> >>> If this doesn't solve your issue, I can help you further.
> >>>
> >>> Cheers
> >>>
> >>> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <
> >>> Tomas.Ramanauskas@springer.com>
> >>> wrote:
> >>>
> >>> I also tried this configuration, but couldn't get the feature to work:
> >>>
> >>> classification
> >>>
> >>> title_t,author_s
> >>> cat_s
> >>> bayes
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <
> >>> Tomas.Ramanauskas@springer.com>
> >>> wrote:
> >>>
> >>> P.S. The version I use:
> >>>
> >>> 6.1.0-68
> >>>
> >>> Also, earlier I said “If I modify an existing record, I think the
> >>> functionality works:”, but I think it doesn’t work for me at all.
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>> "id":"book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"fantasy",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5",
> >>> "_version_":1535488016326328320}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"aaa",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":0}}
> >>>
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>> "id":"book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"fantasy",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5",
> >>> "_version_":1535488016326328320}}
> >>>
> >>> Tomas
> >>>
> >>> On 22 Jun 2016, at 12:47, Tomas Ramanauskas <
> >>> Tomas.Ramanauskas@springer.com>
> >>> wrote:
> >>>
> >>> Hi, everyone,
> >>>
> >>> would someone be able to share a working example (step by step) that
> >>> demonstrates the use of Naive Bayes classifier in Solr?
> >>>
> >>> I followed this Blog post:
> >>>
> >>> https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947
> >>>
> >>> And this tutorial:
> >>> http://yonik.com/solr-tutorial/
> >>>
> >>> And this JIRA ticket:
> >>> https://issues.apache.org/jira/browse/SOLR-7739
> >>>
> >>> So this is my configuration file (only what I added or modified):
> >>>
> >>> classification
> >>>
> >>> title_t,author_s
> >>> cat_s
> >>> bayes
> >>>
> >>> If I modify an existing record, I think the functionality works:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":8}}
> >>> $ curl http://localhost:8983/solr/demo/get?id=book1
> >>> {
> >>> "doc":
> >>> {
> >>> "id":"book1",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"fantasy",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5",
> >>> "_version_":1535488016326328320}}
> >>>
> >>> If I add a new document, something isn’t quite working:
> >>>
> >>> $ curl http://localhost:8983/solr/demo/update -d '
> >>> [
> >>> {"id" : "book7",
> >>> "title_t":["The Way of Kings"],
> >>> "author_s":"Brandon Sanderson",
> >>> "cat_s":"",
> >>> "pubyear_i":2010,
> >>> "ISBN_s":"978-0-7653-2635-5"
> >>> }
> >>> ]'
> >>> {"responseHeader":{"status":0,"QTime":0}}
> >>> $ curl http://localhost:8983/solr/demo/get?id=book7
> >>> {
> >>> "doc":null}
> >>>
> >>> --
> >>> --------------------------
> >>>
> >>> Benedetti Alessandro
> >>> Visiting card : http://about.me/alessandro_benedetti
> >>>
> >>> "Tyger, tyger burning bright
> >>> In the forests of the night,
> >>> What immortal hand or eye
> >>> Could frame thy fearful symmetry?"
> >>>
> >>> William Blake - Songs of Experience -1794 England
> >>>
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
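
Two points raised in this thread can be illustrated with a small sketch: a class is assigned only when the class field has no value at all (null, not an empty string), and a classifier with no labelled documents in the index has nothing to assign. The code below is plain Python and is not Solr's or Lucene's implementation; `TinyNaiveBayes`, `process_add`, and `field_text` are hypothetical names introduced only for illustration, and the smoothing scheme is an assumption.

```python
import math
from collections import Counter, defaultdict


def field_text(value):
    """Flatten a field value (None, a string, or a list of strings) to text."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return " ".join(str(v) for v in value)
    return str(value)


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes with add-one smoothing (illustration only)."""

    def __init__(self):
        self.class_docs = Counter()              # class -> number of labelled docs
        self.term_counts = defaultdict(Counter)  # class -> term -> count
        self.vocab = set()

    def train(self, text, label):
        self.class_docs[label] += 1
        for tok in text.lower().split():
            self.term_counts[label][tok] += 1
            self.vocab.add(tok)

    def classify(self, text):
        if not self.class_docs:
            return None  # no labelled documents: nothing to learn from
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)  # log prior
            # Add-one smoothing; the extra +1 reserves a slot for unseen terms.
            denom = sum(self.term_counts[label].values()) + len(self.vocab) + 1
            for tok in text.lower().split():
                score += math.log((self.term_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def process_add(doc, classifier, input_fields, class_field):
    """Assign a class only when the class field is None (an empty string counts as a value)."""
    if doc.get(class_field) is None:
        text = " ".join(field_text(doc.get(f)) for f in input_fields)
        predicted = classifier.classify(text)
        if predicted is not None:
            doc[class_field] = predicted
    return doc


# Usage: with an empty "index" the class field stays unset.
clf = TinyNaiveBayes()
new_doc = {"id": "book7", "title_t": ["The Way of Kings"], "cat_s": None}
process_add(new_doc, clf, ["title_t"], "cat_s")
```

Once a few labelled documents have been passed to `train`, the same `process_add` call fills in the class field for documents whose class is null, while a document sent with `"cat_s": ""` is left untouched.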