lucene-java-user mailing list archives

From Philip Brown
Subject Re: Phrase search using quotes -- special Tokenizer
Date Mon, 04 Sep 2006 01:48:18 GMT

Thanks for your input.  I'm sure I could do as you suggest (and maybe that
will end up being my best option), but I had hoped to use a string for
creating the query object, particularly as some of my queries are a bit


Chris Hostetter wrote:
> I haven't really been following this thread, but it's gotten so long
> i got interested.
> from what i can tell skimming the discussion so far, it seems like the
> biggest confusion is about the definition of a "phrase" and what analyzers
> do with "quote" characters and what the QueryParser does with "quote"
> characters -- when ultimately you don't seem to really care about "phrases"
> in a textual searching sense; nor do you seem to care about any of the
> "features" of the QueryParser.
> it seems that what you care about is:
>  1) making documents, and adding a list of "text chunks" to those
>     documents (what you've been calling phrases)
>  2) you then want to be able to search for "almost-exact" matches on those
>     "text chunks" ... these matches should be "exactish" because you don't
>     want partial matches based on white spaces, or splitting on hyphens,
>     but they shouldn't be truly exact because you want some simple
>     normalization...
> : actually would like to "normalize" a phrase (spaces) or a hyphenated
> : word or an underscored word to the same value -- e.g. MS-WORD or
> : ms_WORd or "MS Word" --> ms_word.
> in which case, you should:
>  a) write yourself an analyzer which does no "tokenizing" (ie: each input
>     Field value generates a single token) but does the normalization you
>     want.
>  b) use this Analyzer when you add the fields to your documents; even
>     though you don't want *real* tokenization, make the field type
>     TOKENIZED so your analyzer gets used.
>  c) when you get some text input to search on, pass it to the same
>     Analyzer, take the Token you get back and manually construct a
>     TermQuery out of it for the necessary field.
> ...that's it.  that's all she wrote -- don't even look in QueryParser's
> general direction, at all.
> -Hoss
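
The normalization described in step (a) could look roughly like the sketch below. It is plain Java with no Lucene dependency, and it is only an illustration: the class name ChunkNormalizer and the method normalize are hypothetical, and the exact rules (strip surrounding double quotes, collapse runs of whitespace, hyphens, and underscores into a single underscore, lowercase the result) are assumptions inferred from the MS-WORD example in the thread, not a confirmed implementation.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: maps a whole "text chunk"
// to one normalized value, as described in step (a).
public class ChunkNormalizer {

    public static String normalize(String raw) {
        String s = raw.trim();
        // Assumption: a chunk wrapped in double quotes means the same as the bare chunk.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1).trim();
        }
        // Collapse any run of whitespace, underscores, or hyphens into one underscore,
        // then lowercase in a locale-independent way.
        return s.replaceAll("[\\s_-]+", "_").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] inputs = {"MS-WORD", "ms_WORd", "\"MS Word\""};
        for (String in : inputs) {
            System.out.println(in + " -> " + normalize(in));
        }
    }
}
```

An analyzer that emits the whole field value as a single token could apply the same function at index time and at search time, and the resulting string would then be wrapped in a Term and a TermQuery for the relevant field, as step (c) suggests.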
