lucene-dev mailing list archives

From Peter Carlson
Subject Re: [Bug 12137] New: - Can '*' or '?' symbol be used as the first character of a search?
Date Thu, 29 Aug 2002 15:54:39 GMT

Is the rationale for calling this a "bad idea" mostly a performance 
argument? That is, if you don't have to search through every term in the 
index, the results come back much faster, right?
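
To make that cost concrete, here is a minimal sketch in plain Java. It is 
not Lucene code: a sorted in-memory set stands in for the index's term 
list, and the class and method names are made up for illustration. It 
counts how many terms each kind of wildcard has to examine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustration only: a sorted in-memory set stands in for the index's term list.
class WildcardScanSketch {
    // Number of terms examined by the most recent lookup.
    static int examined;

    // Trailing wildcard ("app*"): jump to the prefix, stop at the first non-match.
    static List<String> trailingWildcard(TreeSet<String> terms, String prefix) {
        examined = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            examined++;
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    // Leading wildcard ("*apple"): sort order is no help, so every term is checked.
    static List<String> leadingWildcard(TreeSet<String> terms, String suffix) {
        examined = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            examined++;
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("apple", "apply", "banana", "grape", "pineapple"));
        System.out.println(trailingWildcard(terms, "app") + " examined=" + examined);
        System.out.println(leadingWildcard(terms, "apple") + " examined=" + examined);
    }
}
```

On five terms the difference is trivial (3 terms examined versus 5), but 
the leading-wildcard count always equals the size of the term list, while 
the trailing-wildcard count depends only on how many terms share the prefix.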

I understand the concern, but if the feature is useful to users, then 
without some benchmark numbers we might want to explore it more. 
Or should we just say that it's a bad idea based on the inherent 
issues with the design?

I would like to have benchmarks for a few reasons:
1) To help resolve these kinds of questions
2) To give people performance numbers when they evaluate Lucene.

What would be a reasonable performance benchmark to test this against?

1) CPU speed - Pentium III/800 MHz+? Pentium 4/1.5 GHz+? UltraSPARC?
2) Index size (# terms) - 100K, 500K, 1M, 2M
   What does the index store - just the terms, or the terms and data?
3) Query - single term, 5 terms (AND), 5 terms (OR), wildcard (end), 
wildcard (start), wildcard (start and end)
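
As a starting point, the wildcard rows of that list could be timed with 
something like the sketch below. It is plain Java over a sorted array of 
made-up terms, not a real Lucene index, so it measures only term 
enumeration; the class name, method names, and the "ab" pattern are 
assumptions for illustration. A serious run would repeat each query many 
times and discard warm-up iterations.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.TreeSet;

// Illustration only: times term enumeration over a sorted array, not a real Lucene index.
class TermScanBenchmarkSketch {
    // Builds n distinct pseudo-random lowercase terms in sorted order.
    static String[] buildTerms(int n, long seed) {
        Random rnd = new Random(seed);
        TreeSet<String> set = new TreeSet<>();
        while (set.size() < n) {
            int len = 4 + rnd.nextInt(8); // term length 4..11
            StringBuilder sb = new StringBuilder(len);
            for (int i = 0; i < len; i++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            set.add(sb.toString());
        }
        return set.toArray(new String[0]);
    }

    // wildcard (end), e.g. "ab*": binary search to the prefix, then walk forward.
    static int countPrefix(String[] terms, String prefix) {
        int pos = Arrays.binarySearch(terms, prefix);
        int i = pos >= 0 ? pos : -pos - 1;
        int count = 0;
        while (i < terms.length && terms[i].startsWith(prefix)) {
            count++;
            i++;
        }
        return count;
    }

    // wildcard (start), e.g. "*ab": every term has to be checked.
    static int countSuffix(String[] terms, String suffix) {
        int count = 0;
        for (String term : terms) {
            if (term.endsWith(suffix)) {
                count++;
            }
        }
        return count;
    }

    // wildcard (start and end), e.g. "*ab*": every term has to be checked.
    static int countContains(String[] terms, String infix) {
        int count = 0;
        for (String term : terms) {
            if (term.contains(infix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Pass the sizes to test as arguments, e.g. 100000 500000 1000000 2000000.
        String[] sizes = args.length > 0 ? args : new String[] {"100000"};
        for (String size : sizes) {
            String[] terms = buildTerms(Integer.parseInt(size), 42L);
            long t0 = System.nanoTime();
            int end = countPrefix(terms, "ab");
            long t1 = System.nanoTime();
            int start = countSuffix(terms, "ab");
            long t2 = System.nanoTime();
            int both = countContains(terms, "ab");
            long t3 = System.nanoTime();
            System.out.printf(
                    "%s terms: end=%d hits/%d us, start=%d hits/%d us, both=%d hits/%d us%n",
                    size, end, (t1 - t0) / 1000, start, (t2 - t1) / 1000,
                    both, (t3 - t2) / 1000);
        }
    }
}
```

The single-term and 5-term AND/OR cases are left out because they depend 
on postings data that this sketch does not model.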

Kelvin put out something a while ago on this.



On Wednesday, August 28, 2002, at 01:03 PM, Brian Goetz wrote:

> On Wed, Aug 28, 2002 at 07:52:01PM -0000, wrote:
>> Don't get me wrong, I did read the Parser Syntax, and understand that:
>> "Note: You cannot use a * or ? symbol as the first character of a 
>> search."
>> However, it would have been nice to have this feature.  I made the 
>> following changes to QueryParser.jj, and it seems to work fine.  I am 
>> not sure if there are any side effects though.  Can someone verify this?
> I think this is a bad idea.
> First of all, the query parser is a CONVENIENCE, not the only way to
> build query objects.  If the query parser language is too restrictive,
> then build the query objects programmatically.  It's not that hard.
> There were reasons why the query language was designed this way.  If
> you think that's an error, first you need to lobby for your position
> to change the design, THEN we can think about changing the parser.
> Parsers are tricky.  Small changes can have big, unexpected effects.
> Let's make sure we want to do this first (which I think we don't), and
> then we can look at the implementation.
