lucene-java-user mailing list archives

From Philip Brown <...@us.ibm.com>
Subject Re: Phrase search using quotes -- special Tokenizer
Date Mon, 04 Sep 2006 01:48:18 GMT

Thanks for your input.  I'm sure I could do as you suggest (and maybe that
will end up being my best option), but I had hoped to use a string for
creating the query object, particularly as some of my queries are a bit
complex.

Thanks.


Chris Hostetter wrote:
> 
> 
> I haven't really been following this thread, but it's gotten so long
> i got interested.
> 
> from what i can tell skimming the discussion so far, it seems like the
> biggest confusion is about the definition of a "phrase" and what analyzers
> do with "quote" characters and what the QueryParser does with "quote"
> characters -- when ultimately you don't seem to really care about "phrases"
> in a textual searching sense; nor do you seem to care about any of the
> "features" of the QueryParser.
> 
> it seems that what you care about is:
> 
>  1) making documents, and adding a list of "text chunks" to those
>     documents (what you've been calling phrases)
>  2) you then want to be able to search for "almost-exact" matches on those
>     "text chunks" ... these matches should be "exactish" because you don't
>     want partial matches based on white spaces, or splitting on hyphens,
>     but they shouldn't be truly exact because you want some simple
>     normalization...
> 
> : actually would like to "normalize" a phrase (spaces) or a hyphenated
> : word or an underscored word to the same value -- e.g. MS-WORD or
> : ms_WORd or "MS Word" --> ms_word.
> 
> ...in which case, you should:
>  a) write yourself an analyzer which does no "tokenizing" (ie: each input
>     Field value generates a single token) but does the normalization you
>     want.
>  b) use this Analyzer when you add the fields to your documents, even
>     though you don't want *real* tokenization, make the field type
>     TOKENIZED so your analyzer gets used.
>  c) when you get some text input to search on, pass it to the same
>     Analyzer, take the Token you get back and manually construct a
>     TermQuery out of it for the necessary field.
> 
> ...that's it.  that's all she wrote -- don't even look in QueryParser's
> general direction, at all.
> 
> 
> 
> -Hoss
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
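The normalization at the heart of step (a) in the quoted advice can be sketched in plain Java, without any Lucene dependency. This is only an illustration: the class name ChunkNormalizer and the method normalize are hypothetical, and stripping one pair of surrounding double quotes is an assumption about how the "MS Word" example is meant to be read.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: maps a whole "text chunk" to one
// normalized value so that MS-WORD, ms_WORd and "MS Word" all compare equal.
final class ChunkNormalizer {
    private ChunkNormalizer() {}

    static String normalize(String raw) {
        String s = raw.trim();
        // Assumption: a single pair of surrounding double quotes is not part of the value.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1).trim();
        }
        // Lowercase, then collapse every run of whitespace, hyphens or
        // underscores into a single underscore.
        return s.toLowerCase(Locale.ROOT).replaceAll("[\\s_-]+", "_");
    }

    public static void main(String[] args) {
        for (String in : new String[] {"MS-WORD", "ms_WORd", "\"MS Word\""}) {
            System.out.println(in + " -> " + normalize(in));
        }
    }
}
```

To follow steps (a) to (c), a function like this would presumably sit inside a custom Analyzer that emits the whole field value as a single token. The same normalized string would then be used at search time to build a TermQuery for the field, bypassing QueryParser entirely.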

-- 
View this message in context: http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6128827
Sent from the Lucene - Java Users forum at Nabble.com.

