lucene-java-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: Lucene 4.7 intermittently not applying query filter
Date Fri, 28 Mar 2014 15:28:06 GMT
Jamie,

UAX29URLEmailTokenizer does not emit email components as tokens; “john.doe@mycompany.com.au”
will be tokenized as “john.doe@mycompany.com.au”, nothing more.  That’s why I asked
what EmailFilter does.
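
To illustrate why that matters for matching, here is a toy sketch in plain Java. It is
not Lucene code: the class SingleTokenIndexDemo and its methods buildIndex and lookup are
hypothetical names, and the map stands in for an inverted index under the assumption that
each address is stored as exactly one term.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SingleTokenIndexDemo {

    // Toy inverted index (hypothetical, not a Lucene API): every address is
    // stored as ONE term, mirroring the single-token behaviour described above.
    static Map<String, Set<Integer>> buildIndex(String... addresses) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < addresses.length; docId++) {
            index.computeIfAbsent(addresses[docId], k -> new TreeSet<>()).add(docId);
        }
        return index;
    }

    // Exact term lookup: returns the ids of documents containing the term,
    // or an empty set if the term was never indexed.
    static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index =
                buildIndex("john.doe@mycompany.com.au", "jane@example.org");

        // The whole address is a term, so an exact lookup finds document 0.
        System.out.println(lookup(index, "john.doe@mycompany.com.au"));

        // Fragments were never indexed as separate terms, so they find nothing.
        System.out.println(lookup(index, "john.doe"));
        System.out.println(lookup(index, "com.au"));
    }
}
```

In this model only the full address matches; the components match nothing unless some
other filter in the analysis chain adds them as extra terms.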

If the filter really is ignored by Lucene, that would be a bug in Lucene.  I think something
else is likely going on, though, which is why I asked you for an example query matching too
many docs and a doc it improperly matches. 

Steve

On Mar 28, 2014, at 10:54 AM, Jamie <jamie@mailarchiva.com> wrote:

> Steve
> 
> Thanks for the contact. I believe UAX29URLEmailTokenizer tokenizes email addresses as
> follows: john.doe@mycompany.com.au john.doe mycompany.com.au john doe mycompany com au com.au. We
> have an overridden query parser that swaps out anyaddress: with to, from, cc, bcc, etc. Inside
> the overridden query parser, we call getFieldQuery() to build the clauses...
> 
> Query q = super.getFieldQuery(field, emailAddress, true);
> if (slop != -1) {
>     applySlop(q, slop);
> }
> clauses.add(new BooleanClause(q, BooleanClause.Occur.SHOULD));
> 
> The query is outputted below. Sometimes when it is executed by Lucene, the filter is
> ignored.
> 
> I am busy trying to isolate the issue, since the code is running in a wider system among
> other complexities.
> 
> Jamie
> 
> On 2014/03/28, 4:08 PM, Steve Rowe wrote:
>> Hi Jamie,
>> 
>> What does EmailFilter do?
>> 
>> Why is the expanded form "required for the UAX29URLEmailTokenizer"?  Seems like an
>> exact match would work on the email address alone, without the expanded components?
>> 
>> Do you have an example of a query that reproducibly matches more documents than it
>> should, and a document that matched but shouldn’t have?
>> 
>> Steve  	
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

