lucene-java-user mailing list archives

From Niki Pavlopoulou <n...@exonar.com>
Subject Re: POS tagging in Lucene
Date Wed, 19 Oct 2016 09:55:50 GMT
Hi Steve,

Thank you for your answer. In the end I created a custom Lucene analyser.
To clarify what I meant: Lucene works perfectly on plain words, but since
it does not support POS tagging, a workaround is needed to analyse tokens
that carry POS tags. For example:

Input without POS tags: "I love Lucene's library. It is perfect."
Output: List(love, lucene, library, perfect)

Input with POS tags: "I[PRP] love[VBP] Lucene's[NNP] library[NN] It[PRP]
is[VBZ] perfect[JJ]"
Output: List(i[prp], love[vbp], lucene's[nnp], library[nn], it[prp],
is[vbz], perfect[jj])
*Desired output*: List(love[vbp], lucene[nnp], library[nn], perfect[jj])
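
A minimal sketch of how the desired output could be produced, in plain Java
with no Lucene dependency. It is not the custom analyser mentioned above; the
class name, the method name and the three-word stop list are hypothetical and
chosen only so that the example input gives the desired output. The idea is to
split each token into word and tag, normalise only the word, and re-attach the
tag afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PosTagAwareAnalysis {
    // Hypothetical stop list, just large enough for the example sentence.
    private static final Set<String> STOP_WORDS = Set.of("i", "it", "is");

    // Matches "word[TAG]": group 1 is the word, group 2 is the tag.
    private static final Pattern TAGGED =
            Pattern.compile("^(.+)\\[([^\\[\\]]+)\\]$");

    static List<String> analyze(String taggedText) {
        List<String> out = new ArrayList<>();
        for (String raw : taggedText.trim().split("\\s+")) {
            Matcher m = TAGGED.matcher(raw);
            if (!m.matches()) {
                continue; // assumption: tokens without a tag are dropped
            }
            String word = m.group(1).toLowerCase(Locale.ROOT);
            String tag = m.group(2).toLowerCase(Locale.ROOT);
            // Strip a trailing English possessive from the word part only.
            if (word.endsWith("'s")) {
                word = word.substring(0, word.length() - 2);
            }
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue; // stop words are removed together with their tag
            }
            out.add(word + "[" + tag + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "I[PRP] love[VBP] Lucene's[NNP] library[NN] "
                + "It[PRP] is[VBZ] perfect[JJ]";
        // prints [love[vbp], lucene[nnp], library[nn], perfect[jj]]
        System.out.println(analyze(input));
    }
}
```

Because the tag never passes through the word normalisation, the stop word
and possessive handling only ever see "pure" words.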

If the POS tagging is done after the analysis, the tags might be wrong,
because the analysis (lowercasing, stop word removal and so on) has already
destroyed the sentence structure the tagger relies on. This is why the POS
tagging needs to happen first and the analysis afterwards.
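
For the bigrams from my original question, one option is to pair up the
already tagged tokens directly. The following is a minimal sketch in plain
Java; it does not use Lucene's ShingleFilter, and the class and method names
are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TaggedBigrams {
    // Joins each pair of neighbouring tagged tokens with a single space.
    static List<String> bigrams(List<String> taggedTokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < taggedTokens.size(); i++) {
            out.add(taggedTokens.get(i) + " " + taggedTokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens =
                List.of("I[PRP]", "am[VBP]", "using[VBG]", "Lucene[NNP]");
        // prints [I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP]]
        System.out.println(bigrams(tokens));
    }
}
```

A list with fewer than two tokens gives an empty result, since there is no
neighbouring pair to join.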

Regards,
Niki.

On 18 October 2016 at 19:59, Steve Rowe <sarowe@gmail.com> wrote:

> Hi Niki,
>
> > On Oct 18, 2016, at 7:27 AM, Niki Pavlopoulou <niki@exonar.com> wrote:
> >
> > Hi all,
> >
> > I am using Lucene and OpenNLP for POS tagging. I would like to support
> > biGrams with POS tags as well. For example, I would like something like
> > that:
> >
> > Input: (I[PRP], am[VBP], using[VBG], Lucene[NNP])
> > Output: (I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP])
> >
> > The problem above is that I do not have "pure" tokens, like "I", "am"
> > etc., so the analysis could be wrong if I add the POS tags as an input
> > in Lucene. Is there a way to solve this, apart from creating my custom
> > Lucene analyser?
>
> To create your bigrams, check out ShingleFilter:
> <http://lucene.apache.org/core/6_2_1/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html>
>
> I’m not sure what you mean by “the analysis could be wrong if I add the
> POS tags as an input in Lucene” - can you give an example?
>
> You may be interested in the work-in-progress addition of OpenNLP
> integration with Lucene here:
> <https://issues.apache.org/jira/browse/LUCENE-2899>
>
> --
> Steve
> www.lucidworks.com
