lucene-solr-user mailing list archives

From Vivekanand Ittigi <vi...@biginfolabs.com>
Subject Re: Integrate solr with openNLP
Date Wed, 04 Jun 2014 11:08:11 GMT
Hi Tommaso,

Yes, you are right: version 4.4 works, and I'm able to compile now. I'm
trying to apply named entity recognition (person names) to the tokens, but
I'm not seeing any change. My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true"
multiValued="true"/>

<fieldType name="text_opennlp_pos_ner" class="solr.TextField"
positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory"
          tokenizerModel="opennlp/en-token.bin"
        />
        <filter class="solr.OpenNLPFilterFactory"
          nerTaggerModels="opennlp/en-ner-person.bin"
        />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>

    </fieldType>

Could you please guide me?

Thanks,
Vivek


On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili <tommaso.teofili@gmail.com>
wrote:

> Hi all,
>
> Ahmet was suggesting using the UIMA integration because OpenNLP already
> has an integration with Apache UIMA, so you would just have to use
> that [1].
> And that's one of the main reasons the UIMA integration was done: it's a
> framework that you can easily hook into in order to plug in your NLP
> algorithm.
>
> If you want to just use OpenNLP, then it's up to you whether to write your
> own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP
> to your documents, or to write a dedicated analyzer / tokenizer / token
> filter.
>
> For the OpenNLP integration (LUCENE-2899), the patch is not up to date
> with the latest APIs in trunk; however, you should be able to apply it
> (if I recall correctly) to version 4.4 or so, and adapting it to the
> latest API shouldn't be too hard.
>
> Regards,
> Tommaso
>
> [1] :
> http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
> [2] : http://wiki.apache.org/solr/UpdateRequestProcessor
>
>
>
> 2014-06-03 15:34 GMT+02:00 Ahmet Arslan <iorixxx@yahoo.com.invalid>:
>
>> Can you extract names, locations, etc. using OpenNLP in a plain Java
>> program?
>>
>> If yes, here are two separate options:
>>
>> 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
>> example to integrate your NER code into it and write your own indexing
>> code. You have the full power here. No solr-plugins are involved.
>>
>> 2) Use 'Implementing a conditional copyField' given here :
>> http://wiki.apache.org/solr/UpdateRequestProcessor
>> as an example and integrate your NER code into it.
>>
>>
>> Please note that these are separate ways to enrich your incoming
>> documents, choose either (1) or (2).
>>
>>
>>
>> On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi <
>> vivek@biginfolabs.com> wrote:
>> Okay, but I didn't understand what you said. Could you please elaborate?
>>
>> Thanks,
>> Vivek
>>
>>
>>
>>
>>
>> On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>>
>> > Hi Vivekanand,
>> >
>> > I have never used UIMA+Solr before.
>> >
>> > Personally, I think it takes more time to learn how to configure and use
>> > the UIMA components.
>> >
>> >
>> > If you are familiar with Java, write a class that extends
>> > UpdateRequestProcessor(Factory). Use OpenNLP for NER and add the new
>> > fields (organisation, city, person name, etc.) to your document. This
>> > phase is usually called 'enrichment'.
>> >
>> > Does that make sense?
>> >
>> >
>> >
>> > On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi <
>> vivek@biginfolabs.com>
>> > wrote:
>> > Hi Ahmet,
>> >
>> > I followed what you suggested:
>> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But
>> > how can I achieve my goal, i.e. extracting only the names of
>> > organizations or persons from the content field?
>> >
>> > I guess I'm almost there, but something is missing. Please guide me.
>> >
>> > Thanks,
>> > Vivek
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi <
>> vivek@biginfolabs.com>
>> > wrote:
>> >
>> > > I can't describe the entire goal, but one of the tasks is this: we
>> > > have a big document (it can be a website, a PDF, etc.) indexed in Solr.
>> > > Let's say <field name=content> will store the contents of the document.
>> > > All I want to do is pick the names of persons and places from it using
>> > > OpenNLP or some other means.
>> > >
>> > > Those names should be reflected in Solr itself.
>> > >
>> > > Thanks,
>> > > Vivek
>> > >
>> > >
>> > > On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan <iorixxx@yahoo.com>
>> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> Please tell us what you are trying to do, i.e. your high-level goal,
>> > >> in a new thread. There may be other tools besides OpenNLP, such as
>> > >> ( https://stanbol.apache.org ).
>> > >>
>> > >>
>> > >>
>> > >> On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi <
>> > >> vivek@biginfolabs.com> wrote:
>> > >>
>> > >>
>> > >>
>> > >> We'll surely look into UIMA integration.
>> > >>
>> > >> But before moving on: is this ( https://wiki.apache.org/solr/OpenNLP )
>> > >> the only link we have for the integration? Isn't there any other
>> > >> article or link that may help us fix this problem?
>> > >>
>> > >> Thanks,
>> > >> Vivek
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan <iorixxx@yahoo.com>
>> wrote:
>> > >>
>> > >> >Hi,
>> > >> >
>> > >> >I believe I answered it. Let me retry.
>> > >> >
>> > >> >There is no committed code for OpenNLP. There is an open ticket with
>> > >> >patches. They may not work with the current trunk.
>> > >> >
>> > >> >Confluence is the official documentation. The wiki is maintained by the
>> > >> >community, meaning the wiki can describe uncommitted features, like
>> > >> >this one: https://wiki.apache.org/solr/OpenNLP
>> > >> >
>> > >> >What I am suggesting is, have a look at
>> > >> >https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
>> > >> >
>> > >> >
>> > >> >And search for how to use OpenNLP inside UIMA. Maybe what LUCENE-2899
>> > >> >offers is already doable with solr-uima. I am adding Tommaso (sorry for
>> > >> >this, but we need an authoritative answer here) to clarify this.
>> > >> >
>> > >> >
>> > >> >Also consider indexing with SolrJ and doing the OpenNLP enrichment
>> > >> >outside Solr: use OpenNLP with plain Java, enrich your documents, and
>> > >> >index them with SolrJ. You don't have to do everything inside Solr as
>> > >> >Solr plugins.
>> > >> >
>> > >> >Hope this helps,
>> > >> >
>> > >> >Ahmet
>> > >> >
>> > >> >
>> > >> >
>> > >> >On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi <
>> > >> vivek@biginfolabs.com> wrote:
>> > >> >Thanks, I will check the JIRA issue, but you didn't answer my first
>> > >> >question: is there no way to integrate Solr with OpenNLP? Or is there
>> > >> >any committed code that I can go ahead with?
>> > >> >
>> > >> >Thanks,
>> > >> >Vivek
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> >On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan <iorixxx@yahoo.com>
>> > wrote:
>> > >> >
>> > >> >> Hi,
>> > >> >>
>> > >> >> Here is the jira issue :
>> > >> https://issues.apache.org/jira/browse/LUCENE-2899
>> > >> >>
>> > >> >>
>> > >> >> Anyone can create an account.
>> > >> >>
>> > >> >> I haven't used UIMA myself and I have little knowledge about it, but
>> > >> >> I believe it is possible to use OpenNLP inside UIMA.
>> > >> >> You need to dig into the UIMA documentation.
>> > >> >>
>> > >> >> Solr UIMA integration already exists; that's why I asked whether
>> > >> >> your requirement is possible with UIMA or not. I don't know the
>> > >> >> answer myself.
>> > >> >>
>> > >> >> Ahmet
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi <
>> > >> vivek@biginfolabs.com>
>> > >> >> wrote:
>> > >> >> Hi Arslan,
>> > >> >>
>> > >> >> If not the uncommitted code, then which code should be used to
>> > >> >> integrate?
>> > >> >>
>> > >> >> If I have to comment on my problems, on which JIRA issue, and how do
>> > >> >> I post it?
>> > >> >>
>> > >> >> And why are you suggesting the UIMA integration? My requirement is
>> > >> >> integrating with OpenNLP. Do you mean we can do through UIMA all the
>> > >> >> activities we do using OpenNLP, like name finding, location finding,
>> > >> >> etc.?
>> > >> >>
>> > >> >> Thanks,
>> > >> >> Vivek
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan
>> > <iorixxx@yahoo.com.invalid
>> > >> >
>> > >> >> wrote:
>> > >> >>
>> > >> >> > Hi,
>> > >> >> >
>> > >> >> > Uncommitted code can have these kinds of problems. It is not
>> > >> >> > guaranteed to work with the latest trunk.
>> > >> >> >
>> > >> >> > You could comment about the problem you face on the JIRA ticket.
>> > >> >> >
>> > >> >> > By the way, maybe you are after something doable with the already
>> > >> >> > committed UIMA stuff?
>> > >> >> >
>> > >> >> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
>> > >> >> >
>> > >> >> > Ahmet
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> > On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi <
>> > >> >> vivek@biginfolabs.com>
>> > >> >> > wrote:
>> > >> >> > I followed this link to integrate:
>> > >> >> > https://wiki.apache.org/solr/OpenNLP
>> > >> >> >
>> > >> >> > Installation
>> > >> >> >
>> > >> >> > For English language testing: Until LUCENE-2899 is committed:
>> > >> >> >
>> > >> >> >     1. pull the latest trunk or 4.0 branch
>> > >> >> >     2. apply the latest LUCENE-2899 patch
>> > >> >> >     3. do 'ant compile'
>> > >> >> >     cd solr/contrib/opennlp/src/test-files/training
>> > >> >> >     .
>> > >> >> >     .
>> > >> >> >     .
>> > >> >> > I followed the first two steps but got the following error while
>> > >> >> > executing the third step:
>> > >> >> >
>> > >> >> > common.compile-core:
>> > >> >> >     [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
>> > >> >> >     [javac] warning: [path] bad path element "/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar": no such file or directory
>> > >> >> >     [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
>> > >> >> >     [javac]     super(Version.LUCENE_44, input);
>> > >> >> >     [javac]                  ^
>> > >> >> >     [javac]   symbol:   variable LUCENE_44
>> > >> >> >     [javac]   location: class Version
>> > >> >> >     [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
>> > >> >> >     [javac]     super(input);
>> > >> >> >     [javac]     ^
>> > >> >> >     [javac]     constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
>> > >> >> >     [javac]       (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
>> > >> >> >     [javac]     constructor Tokenizer.Tokenizer() is not applicable
>> > >> >> >     [javac]       (actual and formal argument lists differ in length)
>> > >> >> >     [javac] 2 errors
>> > >> >> >     [javac] 1 warning
>> > >> >> >
>> > >> >> > I'm really stuck on how to get past this step. I spent my entire day
>> > >> >> > trying to fix this but couldn't make any progress. Could someone
>> > >> >> > please help me?
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> > Vivek
>> > >> >> >
>> > >> >> >
>> > >> >>
>> > >> >>
>> > >> >
>> > >>
>> > >
>> > >
>> >
>> >
>>
>>
>
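
The "enrichment" approach suggested in the thread above (run NER in plain Java, add the extracted names to the document as extra fields, then index it) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the LUCENE-2899 patch and not Solr or OpenNLP API: the NameFinder interface and the dictionary-based finder are hypothetical stand-ins for a real OpenNLP name finder, the Map stands in for a Solr input document, and the "person" field name is made up for the example ("content" is the field mentioned in the thread).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of an enrichment step: find names in a text field and store them in a new field. */
class NerEnrichmentSketch {

    /** Hypothetical stand-in for a real NER component such as an OpenNLP name finder. */
    interface NameFinder {
        List<String> findNames(String text);
    }

    /**
     * Toy finder for demonstration only: returns each known name that appears
     * in the text. A real implementation would call a trained model instead.
     */
    static NameFinder dictionaryFinder(List<String> knownNames) {
        return text -> {
            List<String> found = new ArrayList<>();
            for (String name : knownNames) {
                if (text.contains(name)) {
                    found.add(name);
                }
            }
            return found;
        };
    }

    /**
     * Returns a copy of the document with the names found in sourceField added
     * as a multi-valued targetField. The input document is left unchanged.
     */
    static Map<String, Object> enrich(Map<String, Object> doc, String sourceField,
                                      String targetField, NameFinder finder) {
        Map<String, Object> enriched = new LinkedHashMap<>(doc);
        Object value = doc.get(sourceField);
        if (value instanceof String) {
            List<String> names = finder.findNames((String) value);
            if (!names.isEmpty()) {
                enriched.put(targetField, names);
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("content", "Vivek asked Tommaso about OpenNLP.");

        NameFinder finder = dictionaryFinder(List.of("Tommaso", "Vivek", "Ahmet"));
        Map<String, Object> enriched = enrich(doc, "content", "person", finder);

        // Print the names that were added to the hypothetical "person" field.
        System.out.println(enriched.get("person"));
    }
}
```

The same enrich step could sit either in a standalone indexing program (option 1 in the thread) or inside a custom update processor (option 2); only the code around it changes, which is why the name finder is kept behind a small interface here.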
