lucene-dev mailing list archives

From Tatu Saloranta <>
Subject Re: literal operator?
Date Sun, 23 Feb 2003 19:12:47 GMT
On Sunday 23 February 2003 08:56, Matthew King wrote:
> I'd thought this went into the black hole of feature requests never to
> return. ;)
> I also agree that the single quote is probably a bad choice for an
> operator.  In my code i'm actually using "#lit(<term>)" to make things
> as unambiguous as possible.  (but this doesn't really follow the style
> of other Lucene query syntax operators)

I agree. For programmers the single quote might be OK (it's consistent
with the behaviour of many Unix shells), but end users would probably
expect single and double quotes either to work the same way, or the single
quote to be taken literally (to be able to search for "foobar's" in a
non-tokenized field?).

One alternative would be to use some other non-alphanumeric character, either
as a prefix or as a suffix. The first thing I can think of is an '=' prefix for
exact match, so something like:

  ="longer non-tokenizer phrase"

and then either

depending on what makes most sense for users (most intuitive), or that's 
easiest to implement.

This is of course assuming = isn't already used for something else? (I don't 
think it is but perhaps I missed something).
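
To make the idea concrete, here is a minimal sketch of how a pre-parsing step
might recognize the quoted ="..." form. This is only an illustration: the
class LiteralPrefix and the method extractLiteral are hypothetical names, not
part of Lucene, and the real change would belong in the query parser grammar.
Only the quoted form shown above is handled; unquoted forms are left to the
normal parser.

```java
import java.util.Optional;

// Hypothetical helper, NOT part of Lucene: detects the proposed '=' prefix.
final class LiteralPrefix {
    private LiteralPrefix() {}

    /**
     * Returns the raw text between the quotes if the clause looks like
     * ="some text", otherwise an empty Optional so that the caller can
     * fall back to normal (analyzed) parsing.
     */
    static Optional<String> extractLiteral(String clause) {
        if (clause == null || clause.length() < 3 || clause.charAt(0) != '=') {
            return Optional.empty();
        }
        String rest = clause.substring(1);
        if (rest.startsWith("\"") && rest.endsWith("\"")) {
            // Strip the surrounding quotes; the content is kept untouched.
            return Optional.of(rest.substring(1, rest.length() - 1));
        }
        // Unquoted forms are deliberately not handled in this sketch.
        return Optional.empty();
    }
}
```

A caller would turn a present value into a single un-analyzed term, and pass
everything else through the analyzer as before.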

> And the reason I didn't use getFieldQuery is because it is using the
> analyzer to tokenize and would cause me to lose the raw terms, no?
> Maybe i'm not understanding the code here?
> One thing to keep in mind is that literal queries will only work with
> Keyword fields.  Literal searches will not work on fields that have
> been stemmed at indexing time.  Perhaps the query parser could be made
> smart enough to do what the user wants here without them having to ask?
>   Do we know at query time what options a particular field was indexed
> with?

If that can be determined, it'd be good to do that... it makes no sense to use
an analyzer when searching a field that's not tokenized.

But even if that cannot be determined, it should be easy to implement this
feature in a derived class, for individual apps. The app knows which fields are
not tokenized, and can override getFieldQuery() to handle those fields
differently from the default implementation.
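
A rough sketch of that override pattern, just to show the shape of it. To keep
it self-contained it does not use Lucene classes: SketchParser stands in for
the query parser, a list of strings stands in for the resulting query, and the
lower-casing whitespace split stands in for the analyzer. All of these names
are made up for illustration, and the real getFieldQuery() signature differs
between Lucene versions, so check the Javadoc for the one you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Stand-in for the parser base class; NOT Lucene's QueryParser.
class SketchParser {
    /** Default behaviour: run the query text through a toy "analyzer". */
    public List<String> getFieldQuery(String field, String queryText) {
        List<String> terms = new ArrayList<>();
        for (String token : queryText.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }
}

// App-specific subclass: the app knows which fields were indexed untokenized.
class KeywordAwareParser extends SketchParser {
    private final Set<String> untokenizedFields;

    KeywordAwareParser(String... untokenizedFields) {
        this.untokenizedFields = new HashSet<>(Arrays.asList(untokenizedFields));
    }

    @Override
    public List<String> getFieldQuery(String field, String queryText) {
        if (untokenizedFields.contains(field)) {
            // Bypass analysis: the whole text is one raw term.
            return Collections.singletonList(queryText);
        }
        // Every other field keeps the default, analyzed behaviour.
        return super.getFieldQuery(field, queryText);
    }
}
```

With new KeywordAwareParser("id"), a query on "id" keeps its text as one raw
term, while a query on any other field is still split and lower-cased.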

As usual, it would be nice to have some documentation that explains how to do
it. Perhaps the FAQ is not the right place... it would be nice to have a "best
practices" page that would contain hints, ideas and suggestions on how things
can be customized, how default functionality can be overridden, and when/why
that's usually done.

-+ Tatu +-
