lucene-java-user mailing list archives

From "Rob Staveley (Tom)" <rstave...@seseit.com>
Subject RE: BooleanQuery.TooManyClauses on MultiSearcher
Date Thu, 15 Jun 2006 17:43:34 GMT
It is a good point that you raise, Chris. I'm already treating To, Cc, From,
MAIL-FROM, and RCPT-TO as separate fields (the latter fields being from
SMTP). I'd like a "fast and loose" query on james, to find anything relevant
to James. I guess to avoid getting too many Boolean terms, I should have
another field which is a soup of the sender and recipient fields and
tokenise e-mail addresses in it as you suggest.
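
A minimal sketch of the "soup" field idea in plain Java, without any Lucene classes. The class name AddressSoup and the Map-based message representation are hypothetical and exist only for illustration; the field names are the ones used in the query further down the thread. It gathers the addresses from all sender and recipient fields into one set of terms, so that a lookup for "james" is a single exact term match rather than a prefix expansion.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: builds the terms for one catch-all address field.
class AddressSoup {
    // Field names taken from the query in this thread.
    static final List<String> ADDRESS_FIELDS =
            List.of("to", "cc", "from", "smtp-mailfrom", "smtp-rcptto");

    // message maps a field name to the addresses found in that field (an assumption).
    static Set<String> build(Map<String, List<String>> message) {
        Set<String> soup = new TreeSet<>();
        for (String field : ADDRESS_FIELDS) {
            for (String address : message.getOrDefault(field, List.of())) {
                String addr = address.trim().toLowerCase(Locale.ROOT);
                if (addr.isEmpty()) {
                    continue;
                }
                int at = addr.lastIndexOf('@');
                if (at > 0 && at < addr.length() - 1) {
                    soup.add(addr.substring(0, at));   // local part, e.g. "james"
                    soup.add(addr.substring(at + 1));  // domain, e.g. "domain.com"
                }
                soup.add(addr);                        // whole address
            }
        }
        return soup;
    }

    public static void main(String[] args) {
        Map<String, List<String>> message = Map.of(
                "to", List.of("james@domain.com"),
                "cc", List.of("Anna@Example.org"));
        System.out.println(build(message));
    }
}
```

With terms like these in a single field, the five prefix clauses could in principle be replaced by one term clause on that field. Whether that suits a given index depends on how the other fields are queried.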

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: 15 June 2006 18:28
To: java-user@lucene.apache.org
Subject: RE: BooleanQuery.TooManyClauses on MultiSearcher


: I'd quite like to avoid tokenising james from  james@domain.com, because I
: like the way PrefixQuery (when it works) matches james.dean@holywood.com

well sure ... but if you say that because you want "." and "-" to be treated
specially you could write an EmailAnalyzer that produces the token
stream: "james", "dean", "james.dean", "holywood.com",
"james.dean@holywood.com" ... the real question is do you really want a
search for "jam" to match "james.dean@holywood.com" while searches for
"dean", "james dean" and "holywood.com" don't?

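The token stream described above can be sketched in plain Java. This is not a Lucene Analyzer; the class EmailTokens and its tokenize method are hypothetical names, and in a real index this logic would need to be wrapped in an analyzer or token stream for the Lucene version in use. The sketch assumes that "." and "-" are the only separators inside the local part.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: splits one e-mail address into searchable terms.
class EmailTokens {
    static List<String> tokenize(String address) {
        String addr = address.trim().toLowerCase(Locale.ROOT);
        // LinkedHashSet keeps insertion order and drops duplicate terms.
        Set<String> tokens = new LinkedHashSet<>();
        int at = addr.lastIndexOf('@');
        String local = at >= 0 ? addr.substring(0, at) : addr;
        String domain = at >= 0 ? addr.substring(at + 1) : "";
        // Pieces of the local part, split on "." and "-".
        for (String part : local.split("[.\\-]")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        if (!local.isEmpty()) {
            tokens.add(local);    // e.g. "james.dean"
        }
        if (!domain.isEmpty()) {
            tokens.add(domain);   // e.g. "holywood.com"
        }
        if (at >= 0) {
            tokens.add(addr);     // the whole address
        }
        return new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("james.dean@holywood.com"));
    }
}
```

For "james.dean@holywood.com" this yields the five terms listed above, and for "james@domain.com" it yields "james", "domain.com" and "james@domain.com". A search for "jam" would then match nothing unless prefix matching is kept as well.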

: -----Original Message-----
: From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
: Sent: 15 June 2006 16:50
: To: java-user@lucene.apache.org
: Subject: RE: BooleanQuery.TooManyClauses on MultiSearcher
:
:
: : I guess the most expensive thing I'm doing from the perspective of Boolean
: : clauses is heavily using PrefixQuery.
: :
: : I want my user to be able to find e-mail to, cc or from james@anydomain, so
: : I opted for PrefixQuery on James. Bearing in mind that this is causing me
: : grief with BooleanQuery.TooManyClauses on my MultiSearcher, is there a
: : smarter approach that I should be adopting?
:
: if the only reason you are using a PrefixQuery is so that searching for
: "james" matches "james@domain.com" then i think MDC is right, split that
: field up (or have one field, but put three terms in it: "james", "domain.com"
: and "james@domain.com") .. but if you genuinely need flexible PrefixQuery
: support, you may want to look at the ConstantScorePrefixQuery in Solr ...
: there's nothing Solr specific about it, so you could drop it into your
: Lucene installation.  I'm not entirely sure how well the
: ConstantScoreQueries work with a MultiSearcher (mainly because i don't know
: how well Filters work with MultiSearchers) but you could give it a try --
: it certainly won't have a TooManyClauses problem.
:
: :
: : -----Original Message-----
: : From: Rob Staveley (Tom) [mailto:rstaveley@seseit.com]
: : Sent: 15 June 2006 14:51
: : To: java-user@lucene.apache.org
: : Subject: BooleanQuery.TooManyClauses on MultiSearcher
: :
: : I've just added a 3rd index directory (i.e. 3rd IndexSearcher) to my
: : MultiSearcher and I'm getting BooleanQuery.TooManyClauses errors on queries
: : which were working happily on 2 indexes.
: :
: : Here's an example query, which hopefully you'll find self-explanatory from
: : the XML structure.
: : --------8<--------
: : <composite-query analyzer='1'>
: : 	<group required="true" prohibited="false">
: : 		<group required="false" prohibited="false">
: : 			<prefix field="to" required="false" prohibited="false">james</prefix>
: : 			<prefix field="cc" required="false" prohibited="false">james</prefix>
: : 			<prefix field="smtp-rcptto" required="false" prohibited="false">james</prefix>
: : 			<prefix field="from" required="false" prohibited="false">james</prefix>
: : 			<prefix field="smtp-mailfrom" required="false" prohibited="false">james</prefix>
: : 		</group>
: : 		<parse field="body" required="false" prohibited="false">james</parse>
: : 		<parse field="subject" required="false" prohibited="false">james</parse>
: : 	</group>
: : </composite-query>
: : --------8<--------
: :
: : Note that there isn't even a range in there.
: :
: : Do BooleanQueries not scale well across indexes?
: :
:
:
:
: -Hoss
:
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:



-Hoss


