lucene-general mailing list archives

From Jeremy Long <jeremy.l...@gmail.com>
Subject TokenFilter Question
Date Sat, 24 Feb 2018 01:12:22 GMT
I am hoping someone can point me in the right direction. While I have a
working solution, I do not feel it is the best or most correct solution to
the problem I am trying to solve.

My project uses Lucene to perform matching between two data sets, where
one may have the text "Red Green" and the other "redgreen". To handle this,
I have created a Token Pair Concatenating Filter:
https://github.com/jeremylong/DependencyCheck/blob/master/core/src/main/java/org/owasp/dependencycheck/data/lucene/TokenPairConcatenatingFilter.java.
With this filter, the query "field:(red blue green)" ends up being parsed to
"+field:red +field:redblue +field:blue +field:bluegreen +field:green".
However, my implementation ends up adding superfluous parentheses to the
parsed query, and I'm fairly certain I've missed a few key points about how
to implement a token filter that injects additional tokens into the stream.
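
To make the intended token order concrete, here is a minimal sketch in plain
Java that is independent of the Lucene API. The class and method names
(TokenPairSketch, expand) are hypothetical and used only for illustration;
this is not the code of the linked filter, just the pairing rule described
above applied to a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: no Lucene classes are used here.
class TokenPairSketch {

    // Emits each token, followed by its concatenation with the next token
    // whenever a next token exists.
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [red, redblue, blue, bluegreen, green]
        System.out.println(expand(List.of("red", "blue", "green")));
    }
}
```

A real TokenFilter would have to produce the same sequence one token at a
time from incrementToken(), which is the part I am unsure about.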

I would be most appreciative if someone could take a look at the
implementation and suggest any improvements or point me to any
documentation that could help me better understand how a TokenFilter can
inject additional tokens into the stream.

Thanks in advance,

Jeremy
