lucene-java-user mailing list archives

From Xi Shen <davidshe...@gmail.com>
Subject CharTermAttribute/PositionIncrementAttribute which one can help combine two terms into one?
Date Tue, 25 Dec 2012 07:29:21 GMT
Hi,

I am new to Lucene, and I need to implement a TokenFilter that combines two
terms into one, based on some domain-specific rules.

I have little control over the input data, and as far as I can see, I need
to use the whitespace tokenizer, the WordDelimiter filter and the stop-word
filter to get a useful TokenStream instead of a list of garbage.

After the input is processed by this tokenizer and these filters, I found
that some of the keywords are split into two terms. E.g.

k1a k1b

It should be a single term: k1ak1b

I am not sure which attribute can help me. I guess it should be
CharTermAttribute or PositionIncrementAttribute. Based on the
documents/blogs I have read so far, attributes are stateless. How can I
make an attribute return a term that has already been processed?
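
To make the question concrete, here is a minimal sketch of the behaviour being asked for, written in plain Java rather than against the Lucene API. Everything in it is an assumption for illustration: the class CombiningFilterSketch, the Token stand-in for the term text and position increment, the rules map, and the next() method, which only plays the role that incrementToken() would play in a real TokenFilter. The point it tries to show is that the memory of the previous term can live in the filter itself (a one-token look-ahead buffer), not in the attributes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java model of a term-combining filter. This is NOT Lucene code;
// all names here are hypothetical and exist only for illustration.
final class CombiningFilterSketch {

    // Stand-in for what CharTermAttribute (term text) and
    // PositionIncrementAttribute (position increment) expose per token.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    private final Iterator<Token> input;
    // Assumed rule table: first term -> the second term it may merge with.
    private final Map<String, String> rules;
    // The filter's own state: a token read ahead but not yet emitted.
    private Token pending;

    CombiningFilterSketch(Iterator<Token> input, Map<String, String> rules) {
        this.input = input;
        this.rules = rules;
    }

    // Plays the role of incrementToken(): returns the next output token,
    // or null once the input is exhausted.
    Token next() {
        Token current;
        if (pending != null) {
            current = pending;
            pending = null;
        } else if (input.hasNext()) {
            current = input.next();
        } else {
            return null;
        }

        String expected = rules.get(current.term);
        if (expected != null && input.hasNext()) {
            Token lookahead = input.next();
            // Merge only when the follower matches the rule and is directly
            // adjacent (no stop word was removed between the two terms).
            if (expected.equals(lookahead.term) && lookahead.positionIncrement == 1) {
                return new Token(current.term + lookahead.term, current.positionIncrement);
            }
            pending = lookahead; // no match: emit it on the next call
        }
        return current;
    }

    // Convenience helper: run the filter over plain strings.
    static List<String> combine(List<String> terms, Map<String, String> rules) {
        List<Token> tokens = new ArrayList<>();
        for (String t : terms) {
            tokens.add(new Token(t, 1));
        }
        CombiningFilterSketch filter = new CombiningFilterSketch(tokens.iterator(), rules);
        List<String> out = new ArrayList<>();
        for (Token t = filter.next(); t != null; t = filter.next()) {
            out.add(t.term);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new HashMap<>();
        rules.put("k1a", "k1b");
        // prints [foo, k1ak1b, bar]
        System.out.println(combine(Arrays.asList("foo", "k1a", "k1b", "bar"), rules));
    }
}
```

In an actual TokenFilter the same idea would presumably be expressed with the filter's own fields plus captureState()/restoreState() to hold the look-ahead token, with CharTermAttribute used to rewrite the merged term text. Whether the position increment of the merged term should stay at the first term's value, as in this sketch, is a design choice that depends on how phrase queries are expected to behave.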


-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
