lucene-java-user mailing list archives

From David Villarejo
Subject Part of speech search with lucene
Date Tue, 03 Mar 2015 18:21:40 GMT
After many Google searches, I decided to post my problem here in the hope
that someone can help me. What I want to achieve is to run queries like the
following (don't worry about the query format):

q1: (adjective) "jumps" (preposition)
    // any adjective, followed by "jumps", followed by any preposition
q2: (adjective:brown) "jumps" (preposition)
    // "brown" as an adjective, followed by "jumps", followed by any preposition
q3: (adjective:brown) (verb:jumps) (preposition)
    // "brown" as an adjective, followed by "jumps" as a verb, followed by any preposition

In a more general form, what I want is
(POS[:specific_word]) (POS[:specific_word]) (POS[:specific_word])
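
To make the intended semantics of this general form concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It only shows what a match means: each slot constrains the part-of-speech tag, the surface word, or both, and the slots must match consecutive tokens. The names PosPatternMatcher, Token, Slot and findMatches are hypothetical, and the Penn Treebank-style tags (JJ, VBZ, IN) in the example are an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a linear scan over tagged tokens, not an index-backed query.
final class PosPatternMatcher {

    // One token of the tagged text: surface word plus POS tag.
    static final class Token {
        final String word;
        final String pos;
        Token(String word, String pos) { this.word = word; this.pos = pos; }
    }

    // One slot of the pattern. A null field means "no constraint".
    static final class Slot {
        final String pos;   // e.g. "JJ", or null for any POS
        final String word;  // e.g. "brown", or null for any word
        Slot(String pos, String word) { this.pos = pos; this.word = word; }

        boolean matches(Token t) {
            return (pos == null || pos.equals(t.pos))
                && (word == null || word.equals(t.word));
        }
    }

    // Returns the start index of every run of consecutive tokens that satisfies the pattern.
    static List<Integer> findMatches(List<Token> tokens, List<Slot> pattern) {
        List<Integer> starts = new ArrayList<>();
        if (pattern.isEmpty()) {
            return starts;
        }
        for (int i = 0; i + pattern.size() <= tokens.size(); i++) {
            boolean ok = true;
            for (int j = 0; j < pattern.size() && ok; j++) {
                ok = pattern.get(j).matches(tokens.get(i + j));
            }
            if (ok) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical tagged input, chosen so that all three example queries can match.
        List<Token> tokens = Arrays.asList(
            new Token("brown", "JJ"), new Token("jumps", "VBZ"), new Token("over", "IN"));
        // q2: (adjective:brown) "jumps" (preposition)
        List<Slot> q2 = Arrays.asList(
            new Slot("JJ", "brown"), new Slot(null, "jumps"), new Slot("IN", null));
        System.out.println(findMatches(tokens, q2));
    }
}
```

A slot with both fields set corresponds to (POS:specific_word), a slot with only the tag set to (POS), and a slot with only the word set to a quoted literal such as "jumps".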

For that, I have the text tagged as follows:

the|[pos:DT][lemma:the] quick|[pos:JJ][lemma:quick]
brown|[pos:JJ][lemma:brown] fox|[pos:NN][lemma:fox]
jumps|[pos:NNS][lemma:jump] over|[pos:IN][lemma:over]
the|[pos:DT][lemma:the] lazy|[pos:JJ][lemma:lazy] dog|[pos:NN][lemma:dog]
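
For reference, the tagged format above could be read into (word, pos, lemma) triples with a small plain-Java parser along these lines. This is a sketch under assumptions: the class name TaggedTextParser and its Token type are hypothetical, and the regular expression assumes every token has exactly the shape word|[pos:X][lemma:Y] with whitespace between tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for text tagged as word|[pos:X][lemma:Y].
final class TaggedTextParser {

    static final class Token {
        final String word;
        final String pos;
        final String lemma;
        Token(String word, String pos, String lemma) {
            this.word = word;
            this.pos = pos;
            this.lemma = lemma;
        }
    }

    // group 1 = surface word, group 2 = POS tag, group 3 = lemma
    private static final Pattern TOKEN =
        Pattern.compile("(\\S+?)\\|\\[pos:([^\\]]+)\\]\\[lemma:([^\\]]+)\\]");

    // Extracts the tokens in order of appearance; text that does not fit the shape is skipped.
    static List<Token> parse(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.group(1), m.group(2), m.group(3)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "the|[pos:DT][lemma:the] quick|[pos:JJ][lemma:quick]\n"
                    + "brown|[pos:JJ][lemma:brown] fox|[pos:NN][lemma:fox]";
        for (Token t : parse(text)) {
            System.out.println(t.word + " " + t.pos + " " + t.lemma);
        }
    }
}
```

The list index of each parsed token can serve as its position in the sentence, which is the information a positional query needs.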

My first idea was to index the extra information for each term as a payload
and then use PayloadNearQuery to access the payload of each span. The problem
is that PayloadNearQuery matches the terms first and only then reads their
payloads, so none of the three queries above would work (correct me if I'm
wrong).

My second idea was to index the extra information as synonyms of the term.
With that approach, however, the second query won't work, because I can't
require that the first term is both an adjective and the specific word
"brown" at the same time.

Any suggestions on how to address this problem would be appreciated.

