lucene-java-user mailing list archives

From Niki Pavlopoulou <>
Subject POS tagging in Lucene
Date Tue, 18 Oct 2016 11:27:19 GMT
Hi all,

I am using Lucene and OpenNLP for POS tagging. I would also like to support
bigrams that carry POS tags. For example, I would like something like this:

Input: (I[PRP], am[VBP], using[VBG], Lucene[NNP])
Output: (I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP])
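
To make the desired mapping concrete, here is a minimal sketch in plain Java.
It makes no Lucene or OpenNLP calls; the class PosBigrams and its methods are
hypothetical names used only for illustration, and it assumes the tags have
already been produced by the tagger. Each token keeps its term and its tag in
separate fields, and the two are joined only when a bigram is rendered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene or OpenNLP.
final class PosBigrams {

    // A token that stores the plain term and its POS tag separately.
    static final class TaggedToken {
        final String term;
        final String tag;

        TaggedToken(String term, String tag) {
            this.term = term;
            this.tag = tag;
        }

        // Renders the token in the term[TAG] notation used in the example.
        String render() {
            return term + "[" + tag + "]";
        }
    }

    // Splits a string such as "Lucene[NNP]" into term "Lucene" and tag "NNP".
    static TaggedToken parse(String s) {
        int open = s.lastIndexOf('[');
        if (open <= 0 || !s.endsWith("]")) {
            throw new IllegalArgumentException("expected term[TAG], got: " + s);
        }
        return new TaggedToken(s.substring(0, open),
                               s.substring(open + 1, s.length() - 1));
    }

    // Joins each pair of adjacent tokens into one bigram string.
    static List<String> bigrams(List<TaggedToken> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i).render() + " " + tokens.get(i + 1).render());
        }
        return out;
    }

    public static void main(String[] args) {
        List<TaggedToken> tokens = new ArrayList<>();
        for (String s : List.of("I[PRP]", "am[VBP]", "using[VBG]", "Lucene[NNP]")) {
            tokens.add(parse(s));
        }
        // Prints one bigram per line.
        bigrams(tokens).forEach(System.out::println);
    }
}
```

Because the term stays available on its own, any step that needs the plain
word can read it without the tag attached.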

The problem with this input is that it no longer consists of "pure" tokens
such as "I" and "am", so the analysis could go wrong if I pass the POS-tagged
tokens to Lucene as input. Is there a way to solve this, apart from creating
my own custom Lucene

Thank you in advance.

