From: Chris Fraschetti
To: Lucene Users List
Date: Mon, 4 Oct 2004 11:12:31 -0700
Subject: Re: BooleanQuery - Too Many Clases on date range.

Absolutely, limiting the user's query is no problem here.
I've currently implemented the Lucene JavaScript to catch a lot of user
queries that could cause issues: blank queries, ? or * at the beginning
of a query, etc. But I couldn't think of a way to prevent the user from
doing a* without also blocking comment* (for someone wanting comments or
commentary). Any suggestions would be warmly welcomed.

On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher wrote:
> Ok, got it, got a small comment though.
>
> For large wildcard queries, please note that google does not support wild
> cards. Search hell*, and there will be no correct matches with hello.
>
> Is there a reason why you wish to allow such large queries? We might
> be able to find alternative ways of helping you out. No one will use a
> query a*. If someone does, the results would be completely meaningless
> (many false positives for a user). However a query like program* might be
> interesting to a user.
>
> The problem with hacking term expansion is that the rules of this
> expansion might be hard to define (as in, maybe one should use the
> first terms, the most frequent terms, or even the least frequent,
> depending on your app).
>
> sv
>
> On Mon, 4 Oct 2004, Chris Fraschetti wrote:
>
> > The date portion of my code works great now.. no problems there, so
> > let me thank you now for your date filter solution... but my current
> > problem is in regards to a stand alone a* query giving me
> > the too many clauses exception....
> >
> > On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher
> > wrote:
> > > BTW, what's wrong with the DateFilter solution I mentioned earlier?
> > >
> > > I've used it before (before lucene-1.4 though) without memory problems,
> > > thus I always assumed that it avoided the allocation problems with prefix
> > > queries.
> > >
> > > sv
> > >
> > > On Mon, 4 Oct 2004, Chris Fraschetti wrote:
> > >
> > > > Surely some folks out there have used lucene on a large scale and have
> > > > had to compensate for this somehow, any other solutions? Morus, thank
> > > > you very much for your input, and I am looking into your solution,
> > > > just putting my feelers out there once more.
> > > >
> > > > The lucene API is very limited as to its descriptions of its
> > > > components. Short of digging into the code, is there a good doc
> > > > somewhere out there that explains the workings of lucene?
> > > >
> > > > On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
> > > > wrote:
> > > > > So before I spend a significant amount of time digging into the lucene
> > > > > code, how does your experience with lucene shed light on my
> > > > > situation.... Our current index is pretty huge, and with each
> > > > > increase in size I've had, I've experienced a problem like this...
> > > > > Without taking up too much of your time.. because obviously this is my
> > > > > task, I thought I'd ask you if you'd had any experience with this
> > > > > boolean clause nonsense... of course it can be overcome, but if you
> > > > > know a quick hack, awesome, otherwise.. no big, but off to work I go
> > > > > :)
> > > > >
> > > > > -Fraschetti
> > > > >
> > > > > ---------- Forwarded message ----------
> > > > > From: Morus Walter
> > > > > Date: Mon, 4 Oct 2004 09:01:50 +0200
> > > > > Subject: Re: BooleanQuery - Too Many Clases on date range.
> > > > > To: Lucene Users List , Chris
> > > > > Fraschetti
> > > > >
> > > > > Chris Fraschetti writes:
> > > > > > So I decided to move my epoch date to the 20040608 date which fixed
> > > > > > my boolean query problem in regards to my current data size (approx
> > > > > > 600,000) ....
> > > > > >
> > > > > > but now as soon as I do a query like ... a*
> > > > > > I get the boolean error again.
> > > > > > Google obviously can handle this query,
> > > > > > and I'm pretty sure lucene can handle it.. any ideas? With or
> > > > > > without a date range specified I still get the TooManyClauses error.
> > > > > >
> > > > > > I tried cranking the maxclauses up to Integer.MAX_VALUE, but Java gave me
> > > > > > an out of memory error. Is this b/c the boolean search tried to
> > > > > > allocate that many clauses by default or because my query actually
> > > > > > needed that many clauses?
> > > > >
> > > > > Boolean search allocates clauses for all tokens having the prefix or
> > > > > matching the wildcard expression.
> > > > >
> > > > > > Why does it work on small indexes but not
> > > > > > large?
> > > > > Because there are fewer tokens starting with a.
> > > > >
> > > > > > Is there any way to have the parser create as many clauses as
> > > > > > it can and then search with what it has? w/o recompiling the source?
> > > > > >
> > > > > You need to create your own version of Wildcard- and Prefix-Query
> > > > > that takes a maximum term number and ignores further clauses.
> > > > > And you need a variant of the query parser that uses these queries.
> > > > >
> > > > > This can be done, even without recompiling lucene, but you will have to
> > > > > do some programming at the level of lucene queries.
> > > > > Shouldn't be hard, since you can use the sources as a starting point.
> > > > >
> > > > > I guess this does not exist because the lucene developers decided to
> > > > > prefer a query error over incomplete results.
> > > > >
> > > > > Morus
> > > > >
> > > > > --
> > > > > ___________________________________________________
> > > > > Chris Fraschetti, Student CompSci System Admin
> > > > > University of San Francisco
> > > > > e fraschetti@gmail.com | http://meteora.cs.usfca.edu
> > > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >
>

--
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e fraschetti@gmail.com | http://meteora.cs.usfca.edu

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
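
Morus's suggestion in the quoted thread (a prefix query that takes a maximum term number and ignores further clauses) can be illustrated with a minimal sketch. This is not Lucene code and uses no Lucene API: the class name CappedPrefixExpansion and the method expand are hypothetical, and a sorted set of strings stands in for the index's term dictionary. It only shows the capping idea; a real version would still have to be written against Lucene's own query classes and query parser, as Morus describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only: a sorted set of strings plays the role
// of the index's term dictionary. Nothing here is part of Lucene.
final class CappedPrefixExpansion {

    // Returns at most maxTerms terms that start with the given prefix.
    // Assumes the set uses natural String ordering, so that all terms
    // sharing a prefix sit next to each other.
    static List<String> expand(SortedSet<String> terms, String prefix, int maxTerms) {
        List<String> matched = new ArrayList<>();
        // tailSet(prefix) starts at the first term >= prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term with this prefix
            }
            if (matched.size() >= maxTerms) {
                break; // cap reached: remaining matches are ignored
            }
            matched.add(term);
        }
        return matched;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "a", "about", "apple", "comment", "commentary", "comments"));
        // A broad prefix is cut off at the cap instead of failing.
        System.out.println(expand(terms, "a", 2));        // prints [a, about]
        // A narrow prefix fits under the cap and expands fully.
        System.out.println(expand(terms, "comment", 10)); // prints [comment, commentary, comments]
    }
}
```

The cap here keeps the first terms in sorted order, which is only one possible rule. As Stephane notes in the thread, the most frequent or the least frequent terms might suit an application better, and whichever rule is used, a capped query returns incomplete results rather than an error.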