lucene-solr-user mailing list archives

From Jerome Yang <jey...@pivotal.io>
Subject Re: Solr OpenNLP named entity extraction
Date Mon, 09 Jul 2018 04:50:51 GMT
Hi guys,

In SolrCloud mode, where should the OpenNLP models be put?
Should they be uploaded to ZooKeeper?
When I test on Solr 7.3.1, an absolute path on the local host does not seem to work.
And I cannot upload a model into ZooKeeper if its size exceeds 1 MB.

Regards,
Jerome

On Wed, Apr 18, 2018 at 9:54 AM Steve Rowe <sarowe@gmail.com> wrote:

> Hi Alexey,
>
> First, thanks for moving the conversation to the mailing list.  Discussion
> of usage problems should take place here rather than in JIRA.
>
> I locally set up Solr 7.3 similarly to you and was able to get things to
> work.
>
> Problems with your setup:
>
> 1. Your update chain is missing the Log and Run update processors at the
> end (I see these are missing from the example in the javadocs for the
> OpenNLP NER update processor; I’ll fix that):
>
>      <processor class="solr.LogUpdateProcessorFactory" />
>      <processor class="solr.RunUpdateProcessorFactory" />
>
>    The Log update processor isn’t strictly necessary, but, from
>    <https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain>:
>
>        Do not forget to add RunUpdateProcessorFactory at the end of any
>        chains you define in solrconfig.xml. Otherwise update requests
>        processed by that chain will not actually affect the indexed data.
>
> 2. Your example document is missing an “id” field.
>
> 3. For whatever reason, the pre-trained model "en-ner-person.bin" doesn’t
>    extract anything from the text “This is Steve Jobs 2”.  It does, however,
>    extract “Steve Jobs” from, e.g., the text “This is Steve Jobs in white”.
>
> 4. (Not necessarily a problem) You may want to use a multi-valued “string”
>    field for the “dest” field in your update chain, e.g. “people_str” (“*_str”
>    fields in the default configset are configured this way).
>
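Putting fixes 1, 2 and 4 together, the setup might look roughly like the sketch below. It is not Steve's exact configuration: the dest field "people_str" is the example name from point 4 and assumes the default configset's "*_str" dynamic field, the id value "doc-2" is a hypothetical placeholder, and the remaining parameters are copied unchanged from Alexey's chain (so "text_opennlp" must already exist as a field type in his schema). The sample text is the one from point 3, since the model extracts nothing from “This is Steve Jobs 2”.

```xml
<!-- solrconfig.xml: same chain as before, now ending with Log and Run -->
<updateRequestProcessorChain name="multiple-extract">
  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
    <str name="modelFile">opennlp/en-ner-person.bin</str>
    <str name="analyzerFieldType">text_opennlp</str>
    <str name="source">description_en</str>
    <!-- assumed multi-valued string field, per point 4 -->
    <str name="dest">people_str</str>
  </processor>
  <!-- optional: logs each update request -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- required: actually writes the document to the index -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- POST body for /solr/numberplate/update?update.chain=multiple-extract;
     "doc-2" is a hypothetical placeholder id (point 2) -->
<add>
  <doc>
    <field name="id">doc-2</field>
    <field name="description_en">This is Steve Jobs in white</field>
  </doc>
</add>
```

The Log processor only records the update; RunUpdateProcessorFactory is the step that actually changes the indexed data, which is why it belongs at the end of the chain.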
> --
> Steve
> www.lucidworks.com
>
> > On Apr 17, 2018, at 8:23 AM, Alexey Ponomarenko <alex1989ster@gmail.com> wrote:
> >
> > Hi, once more: I am trying to implement named entity extraction using
> > this manual:
> >
> > https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html
> >
> > I modified solrconfig.xml like this:
> >
> > <updateRequestProcessorChain name="multiple-extract">
> >   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
> >     <str name="modelFile">opennlp/en-ner-person.bin</str>
> >     <str name="analyzerFieldType">text_opennlp</str>
> >     <str name="source">description_en</str>
> >     <str name="dest">content</str>
> >   </processor>
> > </updateRequestProcessorChain>
> >
> > But when I tried to add data using:
> >
> > *request:*
> >
> > POST http://localhost:8983/solr/numberplate/update?version=2.2&wt=xml&update.chain=multiple-extract
> >
> > <add><doc>
> >   <field name="description_en">This is Steve Jobs 2</field>
> >   <field name="content_pos">This is text 2</field>
> >   <field name="content">This is text for content 2</field>
> > </doc></add>
> >
> > *response*
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <response>
> >    <lst name="responseHeader">
> >        <int name="status">0</int>
> >        <int name="QTime">3</int>
> >    </lst>
> > </response>
> >
> > But I don't see any data inserted into the *content* field or any other
> > field.
> >
> > *If you need any additional data, I can provide it.*
> >
> > Can you help me? What have I done wrong?
>
>

-- 
 Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>
