lucene-java-user mailing list archives

From "Embry, Clay" <Clay.Em...@vignette.com>
Subject phrase search with custom TokenFilter
Date Mon, 10 Mar 2008 18:28:50 GMT
Hi, I have written a TokenFilter that breaks up words containing internal dot characters and adds
the whole word plus its pieces as tokens in the stream. I index my documents with an analyzer
(MyAnalyzer below) that uses this TokenFilter together with the StandardAnalyzer, and I search
using the plain StandardAnalyzer. Everything works well except for some phrase searches. Here's an example:

Document string
---------------
entity-cache.size-limit

StandardAnalyzer token - position increment
-------------------------------------------

(entity,0,6,type=<alphanum>) - 1
(cache.size,7,17,type=<host>) - 1
(limit,18,23,type=<alphanum>) - 1


MyAnalyzer token - position increment
-------------------------------------

(entity,0,6,type=<alphanum>) - 1
(cache.size,7,17,type=<host>) - 1
(limit,18,23,type=<alphanum>) - 1
(cache,7,12,type=<alphanum>) - 1
(size,13,17,type=<alphanum>) - 1



Search string (StandardAnalyzer)
--------------------------------
"cache.size limit"



The search finds the doc if I index with the StandardAnalyzer, but not if I index with MyAnalyzer.
Can anyone see why that would be? The first three Tokens of each TokenStream are exactly the same,
so it looks like the search phrase should match in both cases. Do I need to change the position
increments on my extra Tokens or something?
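
To make the question concrete, here is a minimal sketch of how I understand positions to be derived from position increments. It does not use Lucene at all: PositionSketch, positions() and adjacent() are hypothetical names made up for illustration, and the sketch assumes that each token's position is the running sum of the increments and that an exact two-term phrase needs the terms at consecutive positions. The second and third cases assume an emission order different from the listing above (pieces emitted right after the whole word), which may or may not be what my filter really does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: models positions as a running sum of increments.
class PositionSketch {

    // Starting from -1 means the first token (increment 1) lands on position 0.
    // Terms are assumed unique here, so a map from term to position is enough.
    static Map<String, Integer> positions(String[] terms, int[] increments) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = -1;
        for (int i = 0; i < terms.length; i++) {
            position += increments[i];
            result.put(terms[i], position);
        }
        return result;
    }

    // Assumption: an exact two-term phrase matches only when the second term
    // sits exactly one position after the first.
    static boolean adjacent(Map<String, Integer> pos, String first, String second) {
        return pos.get(second) - pos.get(first) == 1;
    }

    public static void main(String[] args) {
        // Order exactly as listed for MyAnalyzer: pieces come last, all increments 1.
        Map<String, Integer> listed = positions(
            new String[] {"entity", "cache.size", "limit", "cache", "size"},
            new int[] {1, 1, 1, 1, 1});

        // Assumed alternative order: pieces right after the whole word, all increments 1.
        Map<String, Integer> inline = positions(
            new String[] {"entity", "cache.size", "cache", "size", "limit"},
            new int[] {1, 1, 1, 1, 1});

        // Same assumed order, but the pieces use increment 0 so they share
        // the position of the whole word.
        Map<String, Integer> stacked = positions(
            new String[] {"entity", "cache.size", "cache", "size", "limit"},
            new int[] {1, 1, 0, 0, 1});

        System.out.println(adjacent(listed, "cache.size", "limit"));
        System.out.println(adjacent(inline, "cache.size", "limit"));
        System.out.println(adjacent(stacked, "cache.size", "limit"));
    }
}
```

Under these assumptions the phrase terms stay adjacent in the first and third cases but not in the second, where the two extra tokens push "limit" two positions further away from "cache.size".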



Thanks for any help.

==

Clay Embry