lucene-java-user mailing list archives

From Jamie <ja...@mailarchiva.com>
Subject Lucene 4.7 intermittently not applying query filter
Date Fri, 28 Mar 2014 11:00:20 GMT
Greetings

We have a problem whereby Lucene 4.7 occasionally does not apply a
filter query during searching. The problem is intermittent: roughly one
in thirty searches returns what appears to be an unfiltered result set.
No exceptions or errors occur, just incorrect results. We are using
real-time search with multiple index readers. Our software had been
working fine with earlier versions of Lucene. I've double-checked the
query submitted to Lucene and it appears to be correct. The query looks
as follows:

2014-03-28 21:16:38 t.c.s.a.s.StandardSearch [DEBUG] start search 
{searchquery='',query='*:*',filterQuery='QueryWrapperFilter(+archivedate:[201002280000 
TO 201403282115] +cat:email +(to:"john.douglas@mycompany.com.au 
john.douglas mycompany.com.au john douglas mycompany com au com.au" 
to:"john.doe@mycompany.com.au john.doe mycompany.com.au john doe 
mycompany com au com.au" from:"john.douglas@mycompany.com.au 
john.douglas mycompany.com.au john douglas mycompany com au com.au" 
from:"john.doe@mycompany.com.au john.doe mycompany.com.au john doe 
mycompany com au com.au" cc:"john.douglas@mycompany.com.au john.douglas 
mycompany.com.au john douglas mycompany com au com.au" 
cc:"john.doe@mycompany.com.au john.doe mycompany.com.au john doe 
mycompany com au com.au"))',sort='<long: "mydate">!'}

The string "john.doe@mycompany.com.au john.doe mycompany.com.au john doe
mycompany com au com.au" is the required expansion for the
UAX29URLEmailTokenizer. By using quotes, I am aiming for an exact match.
This works most of the time, but not every time, as it should.
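
For readers unfamiliar with the expansion, the string quoted above can be reproduced with a small stand-alone sketch. This is only an illustration and not the EmailFilter used by the analyzer (its source is not shown in this message). The class name EmailExpansion and the method expand are hypothetical, and the rule for the trailing "com.au" token (the domain minus its first label) is an assumption inferred from this single example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rebuilds the expansion string quoted in the message.
// It does not use Lucene and is not the poster's EmailFilter.
final class EmailExpansion {

    private EmailExpansion() {}

    /** Returns the space-separated expansion for one email address. */
    static String expand(String address) {
        int at = address.indexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            throw new IllegalArgumentException("not an email address: " + address);
        }
        String local = address.substring(0, at);
        String domain = address.substring(at + 1);

        List<String> tokens = new ArrayList<>();
        tokens.add(address);   // john.doe@mycompany.com.au
        tokens.add(local);     // john.doe
        tokens.add(domain);    // mycompany.com.au

        // Individual parts of the local part, only if it contains a dot.
        String[] localParts = local.split("\\.");
        if (localParts.length > 1) {
            for (String part : localParts) {
                tokens.add(part);
            }
        }

        // Individual labels of the domain, only if it contains a dot.
        String[] domainParts = domain.split("\\.");
        if (domainParts.length > 1) {
            for (String part : domainParts) {
                tokens.add(part);
            }
        }

        // Assumption: the final token is the domain without its first label,
        // emitted only when at least two labels remain (e.g. "com.au").
        if (domainParts.length > 2) {
            tokens.add(domain.substring(domain.indexOf('.') + 1));
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(expand("john.doe@mycompany.com.au"));
    }
}
```

The quoted phrase in the filter is then simply this string wrapped in double quotes for each of the to, from and cc fields.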

I came across https://issues.apache.org/jira/browse/LUCENE-5502 and
applied it, but it makes no difference. I tried to downgrade Lucene, but
it won't read the 4.6 indexes. Can anyone suggest a way forward?

Thanks for your recommendations

Jamie

-------------------------

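import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
import org.apache.lucene.util.Version;

// Note: the source of EmailFilter is not included in this message.
public final class EmailAnalyzer extends StopwordAnalyzerBase {

   public static final int DEFAULT_MAX_TOKEN_LENGTH = StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH;
   public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

   private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;

   public EmailAnalyzer(Version matchVersion, CharArraySet stopWords) {
     super(matchVersion, stopWords);
   }

   public EmailAnalyzer(Version matchVersion) {
     this(matchVersion, STOP_WORDS_SET);
   }

   public EmailAnalyzer(Version matchVersion, Reader stopwords) throws IOException {
     this(matchVersion, loadStopwordSet(stopwords, matchVersion));
   }

   public void setMaxTokenLength(int length) {
     maxTokenLength = length;
   }

   public int getMaxTokenLength() {
     return maxTokenLength;
   }

   @Override
   protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
     // Tokenize with the URL/email-aware tokenizer, then expand and lowercase.
     final UAX29URLEmailTokenizer src = new UAX29URLEmailTokenizer(matchVersion, reader);
     src.setMaxTokenLength(maxTokenLength);
     TokenStream tok = new EmailFilter(src);
     tok = new LowerCaseFilter(matchVersion, tok);
     return new TokenStreamComponents(src, tok) {
       @Override
       protected void setReader(final Reader reader) throws IOException {
         // Re-apply the current limit each time the components are reused.
         src.setMaxTokenLength(EmailAnalyzer.this.maxTokenLength);
         super.setReader(reader);
       }
     };
   }
}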


