lucene-java-user mailing list archives

From Antony Bowesman <>
Subject Start/end offsets in analyzers
Date Wed, 28 Mar 2007 07:51:33 GMT
I'm fiddling with custom analyzers that analyze email addresses so that both the
full email address and its component parts are stored.  It's based on Solr's
analyzer framework, so I have a StandardTokenizerFactory followed by an
EmailFilterFactory.  It produces

Analyzing "<>"

1: [>31:<EMAIL>]
2: [humphrey:1->9:<EMAIL>]
3: [bogart:10->16:<EMAIL>]
4: [casablanca:17->27:<EMAIL>]
5: [com:28->31:<EMAIL>]

I set each part's start/end offsets to the part's actual position in the
original text, so they reflect its own offset and length.  Listing 4.6 in the
LIA book, however, gives the synonyms the same start/end offsets as the
original token.

LIA says the offsets are not used in Lucene.  Is that still the case for 2.1,
and does the difference matter?
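
To make the per-part offsets concrete, here is a minimal sketch in plain Java,
with no Lucene or Solr classes involved.  The class name ComponentOffsets, the
Part holder and the sample address are all made up for illustration (the address
was only picked so that the offsets line up with the listing above), and
splitting on anything that is not a letter or digit is an assumption, not
necessarily what the real EmailFilterFactory does.

```java
import java.util.ArrayList;
import java.util.List;

public class ComponentOffsets {

    /** Hypothetical holder: a part's text plus its start/end character offsets in the input. */
    public static final class Part {
        public final String text;
        public final int start; // index of the first character of the part
        public final int end;   // index one past the last character of the part

        public Part(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + ":" + start + "->" + end;
        }
    }

    /**
     * Splits the input on every character that is not a letter or digit
     * (an assumed rule) and records where each part sits in the original text.
     */
    public static List<Part> split(String input) {
        List<Part> parts = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a part"
        for (int i = 0; i <= input.length(); i++) {
            boolean inPart = i < input.length()
                    && Character.isLetterOrDigit(input.charAt(i));
            if (inPart && start < 0) {
                start = i; // a new part begins here
            } else if (!inPart && start >= 0) {
                parts.add(new Part(input.substring(start, i), start, i));
                start = -1; // the part ended just before index i
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up sample address, chosen only to match the offsets in the listing.
        for (Part p : split("<humphrey.bogart@casablanca.com>")) {
            System.out.println(p);
        }
    }
}
```

With this approach each part carries its own position in the original text.  The
alternative shown in LIA listing 4.6 would instead give every part the start/end
of the token it was derived from.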

