lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Phrase search using quotes -- special Tokenizer
Date Mon, 04 Sep 2006 00:26:55 GMT
Yeah, what he said <G>....

On 9/3/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
>
> I haven't really been following this thread, but it's gotten so long
> i got interested.
>
> from whta i can tell skimming the discussion so far, it seems like the
> biggest confusion is about the definition of a "phrase" and what analyzers
> do with "quote" characters and what the QueryParser does with "quote"
> charcters -- when ultimately you don't seem to really care about "phrases"
> in a textual searching sense; nor do you seem to care about any of the
> "features" of the QueryParser.
>
> it seems that what you care about is:
>
> 1) making documents, and adding a list of "text chunks" to those
>     documents (what you've been calling phrases)
> 2) you then want to be able to search for "almost-exact" matches on those
>     "text chunks" ... these matches should be "exactish" because you don't
>     want partial matches based on white spaces, or splitting on hyphens,
>     but they shouldn't be truely exact because you want some simple
>     normalization...
>
> : actually would like to "normalize" a phrase (spaces) or a hyphenated
> word or
> : an underscored word to the same value -- e.g. MS-WORD or ms_WORd or "MS
> : Word" --> ms_word.
>
> ...in which case, you should:
> a) write yourself an analyzer which does no "tokenizing" (ie: each input
>     Field value generates a single token) but does the normalization you
>     want.
> b) use this Analyzer when you add the fields to your documents, even
>     though you don't want *real* tokenization, add make the field type
>     TOKENIZED so your analyzer gets used.
> c) when you get some text input to serach on, pass it to the same
>     Analyzer, take the Token you get back and manualy construct a
>     TermQuery out of it for the neccessary field.
>
> ...that's it.  that's all she wrote -- don't even look in QueryParser's
> general direction, at all.
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message