lucene-java-user mailing list archives

From Stephane James Vaucher <vauch...@cirano.qc.ca>
Subject Re: BooleanQuery - Too Many Clases on date range.
Date Mon, 04 Oct 2004 18:26:13 GMT
I've used a simple message telling the user that the request was too vague
and should be modified. I haven't had many complaints about this,
especially once I explained the reason to a client:

If one user out of many searches for a*, the whole system can grind to a
halt, because that one request can use up all of the available memory
(wildcards don't scale very well...).

Here is an example of a working system:
http://theserverside.com/search/search.tss

I don't know whether many people complain that a search for a* returns no
results, but a request for javap* returns javapro, javaplus, javapolis...
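One way to tell a* apart from javap* or comment* (the case Chris asks about in his reply) is to require a minimum number of literal characters before the first wildcard. The Java sketch below is hypothetical: the class name, the threshold of 3 and the naive whitespace split are assumptions, not part of Lucene or of the TSS search; the same rule could be ported to a client-side JavaScript check.

```java
/**
 * Hypothetical pre-parse guard: a wildcard term must have at least MIN_PREFIX
 * literal characters before its first '*' or '?'. Not part of Lucene.
 */
final class WildcardGuard {
    // Assumed threshold; tune for your index (a* is rejected, javap* is allowed).
    static final int MIN_PREFIX = 3;

    /** Returns false if any whitespace-separated term is a too-broad wildcard. */
    static boolean isAcceptable(String query) {
        for (String raw : query.trim().split("\\s+")) {
            // Drop a leading +/- operator and an optional "field:" prefix, e.g. "+title:a*" -> "a*".
            String term = raw.replaceFirst("^[+-]?([^:]*:)?", "");
            int wildcard = firstWildcardIndex(term);
            if (wildcard >= 0 && wildcard < MIN_PREFIX) {
                return false;
            }
        }
        return true;
    }

    /** Index of the first '*' or '?', or -1 if the term has no wildcard. */
    static int firstWildcardIndex(String term) {
        int star = term.indexOf('*');
        int question = term.indexOf('?');
        if (star < 0) return question;
        if (question < 0) return star;
        return Math.min(star, question);
    }

    public static void main(String[] args) {
        for (String q : new String[] {"a*", "javap*", "comment*", "+title:co?", "lucene wildcard"}) {
            System.out.println(q + " -> " + (isAcceptable(q) ? "ok" : "too vague, please refine"));
        }
    }
}
```

A query that fails the check would get the "too vague, please modify it" message instead of ever reaching the query parser, so it never gets the chance to expand into thousands of clauses.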

HTH,
sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:

> Absolutely, limiting the user's query is no problem here. I've
> currently implemented the Lucene JavaScript to catch a lot of user
> queries that could cause issues: blank queries, ? or * at the
> beginning of a query, etc. But I couldn't think of a way to
> prevent the user from searching for a* while still allowing comment*
> (wanting comments or commentary). Any suggestions would be warmly welcomed.
>
>
> On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher
> <vauchers@cirano.qc.ca> wrote:
> > OK, got it. I have a small comment, though.
> >
> > Regarding large wildcard queries, please note that Google does not support
> > wildcards. Search for hell*, and there will be no matches for hello.
> >
> > Is there a reason why you want to allow such broad queries? We might
> > be able to find other ways of helping you out. No one will really use
> > the query a*. If someone does, the results will be completely meaningless
> > (lots of false positives for that user). However, a query like program*
> > might be interesting to a user.
> >
> > The problem with hacking term expansion is that the rules for the
> > expansion can be hard to define (should you keep the first terms, the
> > most frequent ones, or even the least frequent ones? That depends on
> > your app).
> >
> > sv
> >
> > On Mon, 4 Oct 2004, Chris Fraschetti wrote:
> >
> > > The date portion of my code works great now, no problems there, so
> > > let me thank you for your date filter solution. My current
> > > problem is with a standalone a* query, which gives me
> > > the TooManyClauses exception.
> > >
> > >
> > > On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher
> > > <vauchers@cirano.qc.ca> wrote:
> > > > BTW, what's wrong with the DateFilter solution I mentioned earlier?
> > > >
> > > > I've used it before (before Lucene 1.4, though) without memory problems,
> > > > so I always assumed that it avoided the allocation problems of prefix
> > > > queries.
> > > >
> > > > sv
> > > >
> > > >
> > > >
> > > > On Mon, 4 Oct 2004, Chris Fraschetti wrote:
> > > >
> > > > > Surely some folks out there have used Lucene on a large scale and
> > > > > had to compensate for this somehow. Any other solutions? Morus, thank
> > > > > you very much for your input; I am looking into your solution, just
> > > > > putting my feelers out there once more.
> > > > >
> > > > > The Lucene API docs are very limited in their descriptions of the
> > > > > components. Short of digging into the code, is there a good doc
> > > > > somewhere out there that explains the workings of Lucene?
> > > > >
> > > > >
> > > > > On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
> > > > > <fraschetti@gmail.com> wrote:
> > > > > > So before I spend a significant amount of time digging into the
> > > > > > Lucene code: does your experience with Lucene shed any light on my
> > > > > > situation? Our current index is pretty huge, and with each
> > > > > > increase in size I've experienced a problem like this.
> > > > > > Without taking up too much of your time (this is obviously my
> > > > > > task), I thought I'd ask whether you'd had any experience with this
> > > > > > boolean clause nonsense. Of course it can be overcome, but if you
> > > > > > know a quick hack, awesome; otherwise, no big deal, off to work I go
> > > > > > :)
> > > > > >
> > > > > > -Fraschetti
> > > > > >
> > > > > >
> > > > > > ---------- Forwarded message ----------
> > > > > > From: Morus Walter <morus.walter@tanto.de>
> > > > > > Date: Mon, 4 Oct 2004 09:01:50 +0200
> > > > > > Subject: Re: BooleanQuery - Too Many Clases on date range.
> > > > > > To: Lucene Users List <lucene-user@jakarta.apache.org>,
> > > > > >     Chris Fraschetti <fraschetti@gmail.com>
> > > > > >
> > > > > > Chris Fraschetti writes:
> > > > > > > So I decided to move my epoch date to 20040608, which fixed
> > > > > > > my boolean query problem for my current data size (approx.
> > > > > > > 600,000)...
> > > > > > >
> > > > > > > But now, as soon as I do a query like a*, I get the
> > > > > > > boolean error again. Google obviously can handle this query,
> > > > > > > and I'm pretty sure Lucene can handle it. Any ideas? With or
> > > > > > > without a date range specified, I still get the TooManyClauses error.
> > > > > >
> > > > > >
> > > > > > > I tried cranking the max clause count up to Integer.MAX_VALUE, but
> > > > > > > Java gave me an out-of-memory error. Is this because the boolean
> > > > > > > search tried to allocate that many clauses by default, or because
> > > > > > > my query actually needed that many clauses?
> > > > > >
> > > > > > Boolean search allocates a clause for every token that has the
> > > > > > prefix or matches the wildcard expression.
> > > > > >
> > > > > > > Why does it work on small indexes but not
> > > > > > > large ones?
> > > > > > Because small indexes have fewer tokens starting with a.
> > > > > >
> > > > > > > Is there any way to have the parser create as many clauses as
> > > > > > > it can and then search with what it has, without recompiling
> > > > > > > the source?
> > > > > > >
> > > > > > You need to create your own versions of WildcardQuery and PrefixQuery
> > > > > > that take a maximum number of terms and ignore any further clauses.
> > > > > > You also need a variant of the query parser that uses these queries.
> > > > > >
> > > > > > This can be done, even without recompiling Lucene, but you will have
> > > > > > to do some programming at the level of Lucene queries. It shouldn't
> > > > > > be hard, since you can use the sources as a starting point.
> > > > > >
> > > > > > I guess this doesn't exist because the Lucene developers decided to
> > > > > > prefer a query error over incomplete results.
> > > > > >
> > > > > > Morus
> > > > > >
> > > > > >
> > > > > > --
> > > > > > ___________________________________________________
> > > > > > Chris Fraschetti, Student CompSci System Admin
> > > > > > University of San Francisco
> > > > > > e fraschetti@gmail.com | http://meteora.cs.usfca.edu
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > > >
> > > >
> > >
> > >
> > >
> > >
> >
> >
>
>
>
>
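Morus's suggestion, a PrefixQuery/WildcardQuery variant that stops at a maximum number of terms instead of throwing TooManyClauses, combined with Stephane's point that the selection rule is app-specific, can be sketched without any Lucene classes. In this hypothetical sketch the term dictionary is simulated as a sorted map from term to document frequency (invented sample counts); in Lucene itself the terms would come from the index's term enumeration, and the kept terms would typically become the TermQuery clauses of the rewritten BooleanQuery. The class name, the cap and the two policies are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch: expand a prefix into at most maxTerms terms instead of failing. */
final class CappedPrefixExpansion {

    /** Which matching terms survive once the cap is reached (the app-specific rule). */
    enum Policy { FIRST_IN_INDEX_ORDER, MOST_FREQUENT }

    /**
     * @param dictionary sorted term -> document frequency (stand-in for the index's terms)
     * @param prefix     literal prefix, e.g. "comment" for comment*
     * @param maxTerms   maximum number of terms (i.e. clauses) to keep
     */
    static List<String> expand(SortedMap<String, Integer> dictionary, String prefix,
                               int maxTerms, Policy policy) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        // tailMap starts at the first term >= prefix; matching terms are contiguous from there.
        for (Map.Entry<String, Integer> e : dictionary.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past the last term with this prefix
            }
            matches.add(e);
            if (policy == Policy.FIRST_IN_INDEX_ORDER && matches.size() >= maxTerms) {
                break; // cap reached: ignore further terms instead of throwing
            }
        }
        if (policy == Policy.MOST_FREQUENT) {
            // Highest document frequency first; ties broken alphabetically.
            matches.sort((a, b) -> {
                int byFreq = Integer.compare(b.getValue(), a.getValue());
                return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
            });
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < Math.min(maxTerms, matches.size()); i++) {
            kept.add(matches.get(i).getKey());
        }
        return kept;
    }

    /** Tiny simulated term dictionary with made-up document frequencies. */
    static SortedMap<String, Integer> sampleDictionary() {
        SortedMap<String, Integer> dict = new TreeMap<>();
        dict.put("comment", 40);
        dict.put("commentary", 12);
        dict.put("commentator", 3);
        dict.put("comments", 55);
        dict.put("commit", 20);
        return dict;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> dict = sampleDictionary();
        System.out.println(expand(dict, "comment", 2, Policy.FIRST_IN_INDEX_ORDER)); // [comment, commentary]
        System.out.println(expand(dict, "comment", 2, Policy.MOST_FREQUENT));        // [comments, comment]
    }
}
```

Keeping the first N terms is the cheapest policy, since the walk stops as soon as the cap is hit. Keeping the most frequent ones means visiting every matching term first; it only stores terms and counts rather than query clauses, but for a* on a large index that is still a lot of terms.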

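The DateFilter approach Stephane recommends in the thread sidesteps the clause limit because a filter marks matching documents in a bit set rather than adding one clause per date term. The self-contained sketch below illustrates that principle only; it is not Lucene's DateFilter code. The index is simulated as a sorted map from yyyyMMdd date terms (the format of Chris's 20040608) to document ids, the date bits are intersected with the hits of a pretend text query, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical illustration of a filter-style date range: one bit per document, no clauses. */
final class DateRangeBits {

    /** Sets the bit of every document whose date term lies in [from, to], inclusive. */
    static BitSet bits(NavigableMap<String, List<Integer>> postings, int maxDoc,
                       String from, String to) {
        BitSet result = new BitSet(maxDoc);
        // yyyyMMdd strings sort lexicographically in date order, so a sorted sub-map is the range.
        for (Map.Entry<String, List<Integer>> e : postings.subMap(from, true, to, true).entrySet()) {
            for (int doc : e.getValue()) {
                result.set(doc);
            }
        }
        return result;
    }

    /** Tiny simulated index: date term -> ids of the documents carrying that date. */
    static NavigableMap<String, List<Integer>> sampleIndex() {
        NavigableMap<String, List<Integer>> postings = new TreeMap<>();
        postings.put("20040601", Arrays.asList(0, 3));
        postings.put("20040608", Arrays.asList(1));
        postings.put("20040615", Arrays.asList(2, 4));
        postings.put("20040701", Arrays.asList(5));
        return postings;
    }

    public static void main(String[] args) {
        BitSet inRange = bits(sampleIndex(), 6, "20040605", "20040630");
        // Pretend a text query matched documents 1, 3 and 4.
        BitSet textHits = new BitSet(6);
        textHits.set(1);
        textHits.set(3);
        textHits.set(4);
        textHits.and(inRange); // keep only hits inside the date range
        System.out.println("in range: " + inRange + ", filtered hits: " + textHits);
    }
}
```

Memory here grows with the number of documents (one bit each), not with the number of distinct dates in the range, which is why a range over many days does not hit a clause limit the way an expanded range or prefix query can.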
