lucene-java-user mailing list archives

From Steve Rowe <>
Subject Re: Lucene 4.7 intermittently not applying query filter
Date Fri, 28 Mar 2014 15:28:06 GMT

UAX29URLEmailTokenizer does not emit email components as tokens; “”
will be tokenized as “”, nothing more.  That’s why I asked
what EmailFilter does.
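
To make the distinction concrete, here is a minimal plain-Java sketch. It is not Lucene code and does not call the tokenizer: the class, the method names and the address `jane.roe@example.com` are all hypothetical stand-ins. It only models the two token streams under discussion and shows which exact-term lookups would hit in each case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class EmailTokenSketch {

    // Models the behaviour described above: the whole address is kept as one token.
    public static Set<String> singleToken(String address) {
        return Collections.singleton(address);
    }

    // Models the expanded form: local-part pieces and domain labels as separate tokens.
    // (Assumption for illustration only; splits on '@' and '.'.)
    public static Set<String> componentTokens(String address) {
        return new HashSet<>(Arrays.asList(address.split("[@.]")));
    }

    // An exact term lookup hits only if that term is among the indexed tokens.
    public static boolean matches(Set<String> indexedTokens, String queryTerm) {
        return indexedTokens.contains(queryTerm);
    }

    public static void main(String[] args) {
        String address = "jane.roe@example.com"; // hypothetical address
        Set<String> single = singleToken(address);
        Set<String> expanded = componentTokens(address);

        System.out.println(matches(single, address)); // whole address against single token
        System.out.println(matches(single, "roe"));   // component against single token
        System.out.println(matches(expanded, "roe")); // component against expanded tokens
    }
}
```

Under this model, a lookup for the full address hits the single-token form, while a lookup for a component such as `roe` hits only when the components were indexed as separate tokens.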

If the filter really is ignored by Lucene, that would be a bug in Lucene.  I think something
else is likely going on, though, which is why I asked you for an example query matching too
many docs and a doc it improperly matches. 


On Mar 28, 2014, at 10:54 AM, Jamie <> wrote:

> Steve
> Thanks for the contact. I believe UAX29URLEmailTokenizer tokenizes email addresses as
> follows: john.doe john doe mycompany com au
> We have an overridden query parser that swaps out anyaddress: with to, from, cc, bcc, etc. Inside
> the overridden query parser, we call getFieldQuery() to build the clauses...
> Query q = super.getFieldQuery(field, emailAddress, true);
> if (slop != -1) {
>     applySlop(q, slop);
> }
> clauses.add(new BooleanClause(q, BooleanClause.Occur.SHOULD));
> The query is outputted below. Sometimes when it is executed by Lucene, the filter is ignored.
> I am busy trying to isolate the issue, since the code is running in a wider system among
> other complexities.
> Jamie
> On 2014/03/28, 4:08 PM, Steve Rowe wrote:
>> Hi Jamie,
>> What does EmailFilter do?
>> Why is the expanded form "required for the UAX29URLEmailTokenizer"?  Seems like an
>> exact match would work on the email address alone, without the expanded components?
>> Do you have an example of a query that reproducibly matches more documents than it
>> should, and a document that matched but shouldn’t have?
>> Steve