lucene-java-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: POS tagging in Lucene
Date Tue, 18 Oct 2016 18:59:13 GMT
Hi Niki,

> On Oct 18, 2016, at 7:27 AM, Niki Pavlopoulou <niki@exonar.com> wrote:
> 
> Hi all,
> 
> I am using Lucene and OpenNLP for POS tagging. I would like to support
> bigrams with POS tags as well. For example, I would like something like
> this:
> 
> Input: (I[PRP], am[VBP], using[VBG], Lucene[NNP])
> Output: (I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP])
> 
> The problem above is that I do not have "pure" tokens, like "I", "am" etc.,
> so the analysis could be wrong if I add the POS tags as an input in Lucene.
> Is there a way to solve this, apart from creating my own custom Lucene
> analyser?

To create your bigrams, check out ShingleFilter: <http://lucene.apache.org/core/6_2_1/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html>
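
As a rough illustration of the output you describe, here is a plain-Java sketch that pairs adjacent tokens while leaving the POS tags attached. It does not call Lucene; the class name PosBigrams and the method bigrams are made up for this example. In Lucene itself, I would expect a ShingleFilter built with a minimum and maximum shingle size of 2, with setOutputUnigrams(false), over a tokenizer that keeps "I[PRP]" as a single token (a whitespace tokenizer, for instance) to behave similarly, but treat that as an assumption to verify against your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: mimics two-token shingles.
class PosBigrams {

    // Joins each pair of adjacent tokens with a single space,
    // the same separator ShingleFilter uses by default.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens already carry their POS tag, as in the example from the question.
        List<String> tagged = Arrays.asList("I[PRP]", "am[VBP]", "using[VBG]", "Lucene[NNP]");
        for (String bigram : bigrams(tagged)) {
            System.out.println(bigram);
        }
    }
}
```

Fewer than two input tokens yield no bigrams at all, which matches what disabling unigram output is meant to achieve.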

I’m not sure what you mean by “the analysis could be wrong if I add the POS tags as an
input in Lucene” - can you give an example?

You may be interested in the work-in-progress addition of OpenNLP integration with Lucene
here: <https://issues.apache.org/jira/browse/LUCENE-2899>

--
Steve
www.lucidworks.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

