lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sergiu Gordea <gser...@ifit.uni-klu.ac.at>
Subject Re: BooleanQuery - Too Many Clases on date range.
Date Mon, 04 Oct 2004 18:35:59 GMT
Chris Fraschetti wrote:

>absoultely, limiting the user's query is no problem here. I've
>currently implemented the lucene javascript to catcha lot of user
>quries that could cause issues.. blank queries, ? or * at the
>beginning of query, etc etc... but I couldn't think of a way to
>prevent the user from doing a*  but not   comment*   wanting comments
>or commentary...  any suggestions would be warmly welcomed.
>
>  
>
One cheap solution is to ask the user to enter at least 3 alfa-numerical 
chars.
What do you say about that?

  All the best,

  Sergiu

>On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher
><vauchers@cirano.qc.ca> wrote:
>  
>
>>Ok, got it, got a small comment though.
>>
>>For large wildcard queries, please note that google does not support wild
>>cards. Search hell*, and there will be no correct matches with hello.
>>
>>Is there a reason why you wish to allow such large queries? We might
>>be able to find alternative ways of helping you out. No one will use a
>>query a*. If someone does, the results would be completely meaningless
>>(many false positives for a user). However a query like program* might be
>>interesting to a user.
>>
>>The problem with hacking term expansion is that the rules of this
>>expansion might be hard to define (as is maybe one should use the
>>first, the most frequent terms or the even the least frequent, depending
>>on your app).
>>
>>sv
>>
>>On Mon, 4 Oct 2004, Chris Fraschetti wrote:
>>
>>    
>>
>>>The date portion of my code works great now.. no problems there, so
>>>      
>>>
>>    
>>
>>>let me thank you now for your date filter solution... but my current
>>>problem is in regards to a stand alone....   a*     query giving me
>>>the too many clauses exception....
>>>
>>>
>>>On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher
>>><vauchers@cirano.qc.ca> wrote:
>>>      
>>>
>>>>BTW, what's wrong with the DateFilter solution, I mentionned earlier?
>>>>
>>>>I've used it before (before lucene-1.4 though) without memory problems,
>>>>thus I always assumed that it avoided the allocation problems with prefix
>>>>queries.
>>>>
>>>>sv
>>>>
>>>>
>>>>
>>>>On Mon, 4 Oct 2004, Chris Fraschetti wrote:
>>>>
>>>>        
>>>>
>>>>>Surely some folks out there have used lucene on a large scale and have
>>>>>had to compensate for this somehow, any other solutions? Morus, thank
>>>>>you very more for your imput, and I am looking into your solution,
>>>>>just putting my feelers out there once more.
>>>>>
>>>>>The lucene API is very limited as to it's descriptions of it's
>>>>>components, short of digging into the code, is there a good doc
>>>>>somewhere out there that explains the workins of lucene?
>>>>>
>>>>>
>>>>>On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
>>>>><fraschetti@gmail.com> wrote:
>>>>>          
>>>>>
>>>>>>So before I spend a significant amount of time digging into the lucene
>>>>>>code, how does your experience with lucene give light to my
>>>>>>situation....  Our current index is pretty huge, and with each
>>>>>>increase in side i've had, i've experienced a problem like this...
>>>>>>Without taking up too much of your time.. because obviously this i
my
>>>>>>task, I thought i'd ask you if you'd had any experience with this
>>>>>>boolean clause nonsense...  of course it can be overcome, but if you
>>>>>>know a quick hack, awesome, otherwise.. no big, but off to work i
go
>>>>>>:)
>>>>>>
>>>>>>-Fraschetti
>>>>>>
>>>>>>
>>>>>>---------- Forwarded message ----------
>>>>>>From: Morus Walter <morus.walter@tanto.de>
>>>>>>Date: Mon, 4 Oct 2004 09:01:50 +0200
>>>>>>Subject: Re: BooleanQuery - Too Many Clases on date range.
>>>>>>To: Lucene Users List <lucene-user@jakarta.apache.org>, Chris
>>>>>>Fraschetti <fraschetti@gmail.com>
>>>>>>
>>>>>>Chris Fraschetti writes:
>>>>>>            
>>>>>>
>>>>>>>So i decicded to move my epoch date to the  20040608 date which
fixed
>>>>>>>my boolean query problem in regards to my current data size (approx
>>>>>>>600,000) ....
>>>>>>>
>>>>>>>but now as soon as I do a query like ...      a*
>>>>>>>I get the boolean error again. Google obviously can handle this
query,
>>>>>>>and I'm pretty sure lucene can handle it.. any ideas? With out
>>>>>>>without a date dange specified i still get the  TooManyClauses
error.
>>>>>>>              
>>>>>>>
>>>>>>            
>>>>>>
>>>>>>>I tired cranking the maxclauses up to Integer.MaxInt, but java
gave me
>>>>>>>a out of memory error. Is this b/c the boolean search tried to
>>>>>>>allocate that many clauses by default or because my query actually
>>>>>>>needed that many clauses?
>>>>>>>              
>>>>>>>
>>>>>>boolean search allocates clauses for all tokens having the prefix
or
>>>>>>matching the wildcard expression.
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>Why does it work on small indexes but not
>>>>>>>large?
>>>>>>>              
>>>>>>>
>>>>>>Because there are fewer tokens starting with a.
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>Is there any way to have the parser create as many clauses as
>>>>>>>it can and then search with what it has? w/o recompiling the source?
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>You need to create your own version of Wildcard- and Prefix-Query
>>>>>>that takes a maximum term number and ignores further clauses.
>>>>>>And you need a variant of the query parser that uses these queries.
>>>>>>
>>>>>>This can be done, even without recompiling lucene, but you will have
to
>>>>>>do some programming at the level of lucene queries.
>>>>>>Shouldn't be hard, since you can use the sources as a starting point.
>>>>>>
>>>>>>I guess this does not exist because the lucene developer decided to
prefer
>>>>>>a query error rather than uncomplete results.
>>>>>>
>>>>>>Morus
>>>>>>
>>>>>>
>>>>>>--
>>>>>>___________________________________________________
>>>>>>Chris Fraschetti, Student CompSci System Admin
>>>>>>University of San Francisco
>>>>>>e fraschetti@gmail.com | http://meteora.cs.usfca.edu
>>>>>>
>>>>>>            
>>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>
>>>>
>>>>        
>>>>
>>>
>>>
>>>      
>>>
>>    
>>
>
>
>
>  
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message