lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Philip Brown <...@us.ibm.com>
Subject Re: Phrase search using quotes -- special Tokenizer
Date Mon, 04 Sep 2006 12:35:43 GMT

Yeah, they are more complex than the "exactish" match -- basically, there are
more fields involved -- combined sometimes with AND and sometimes with OR,
and sometimes negated field values, sometimes groupings, etc.  These other
field values are all single words (no spaces), and a search might involve a
wildcard on them.  Hope that helps.

Thanks.


Chris Hostetter wrote:
> 
> 
> : Thanks for your input.  I'm sure I could do as you suggest (and maybe
> that
> : will end up being my best option), but I had hoped to use a string for
> : creating the query object, particularly as some of my queries are a bit
> : complex.
> 
> you have to clarify what you mean by "use a string for creating the query
> object" ... there's nothing in what i suggested that implies you can't do
> that, that's exactly what i'm suggesting you do...
> 
>    String input = ...;
>    Analyzer a = new YourCustomAnalyzer();
>    // because you know your analyzer allways produces exactly one token...
>    Token t = a.tokenStream("yourField", new StringReader(input)).next();
>    Query yourQuery = new TermQuery("yourField", t.termText());
> 
> ...if your queries are more complex then just the "exactish" matching you
> described before, then that's a seperate issue -- what you described
> didn't sound like it required any special input processing -- you said you
> had a "string" and you wanted to find exact matches on that string (with
> some normalization) ... but that you didn't want your input split on
> whitespace, or hyphens, or any of the "special" characters QueryParser
> uses.
> 
> If you want other things then that certainly makes things more
> complicated, but the basic idea is still the same ... so what exactly do
> you mean when you say it's more complicated?
> 
> 
> : > I haven't really been following this thread, but it's gotten so long
> : > i got interested.
> : >
> : > from whta i can tell skimming the discussion so far, it seems like the
> : > biggest confusion is about the definition of a "phrase" and what
> analyzers
> : > do with "quote" characters and what the QueryParser does with "quote"
> : > charcters -- when ultimately you don't seem to really care about
> "phrases"
> : > in a textual searching sense; nor do you seem to care about any of the
> : > "features" of the QueryParser.
> : >
> : > it seems that what you care about is:
> : >
> : >  1) making documents, and adding a list of "text chunks" to those
> : >     documents (what you've been calling phrases)
> : >  2) you then want to be able to search for "almost-exact" matches on
> those
> : >     "text chunks" ... these matches should be "exactish" because you
> don't
> : >     want partial matches based on white spaces, or splitting on
> hyphens,
> : >     but they shouldn't be truely exact because you want some simple
> : >     normalization...
> : >
> : > : actually would like to "normalize" a phrase (spaces) or a hyphenated
> : > word or
> : > : an underscored word to the same value -- e.g. MS-WORD or ms_WORd or
> "MS
> : > : Word" --> ms_word.
> : >
> : > ...in which case, you should:
> : >  a) write yourself an analyzer which does no "tokenizing" (ie: each
> input
> : >     Field value generates a single token) but does the normalization
> you
> : >     want.
> : >  b) use this Analyzer when you add the fields to your documents, even
> : >     though you don't want *real* tokenization, add make the field type
> : >     TOKENIZED so your analyzer gets used.
> : >  c) when you get some text input to serach on, pass it to the same
> : >     Analyzer, take the Token you get back and manualy construct a
> : >     TermQuery out of it for the neccessary field.
> : >
> : > ...that's it.  that's all she wrote -- don't even look in
> QueryParser's
> : > general direction, at all.
> : >
> : >
> : >
> : > -Hoss
> : >
> : >
> : > ---------------------------------------------------------------------
> : > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> : > For additional commands, e-mail: java-user-help@lucene.apache.org
> : >
> : >
> : >
> :
> : --
> : View this message in context:
> http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6128827
> : Sent from the Lucene - Java Users forum at Nabble.com.
> :
> :
> : ---------------------------------------------------------------------
> : To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> : For additional commands, e-mail: java-user-help@lucene.apache.org
> :
> 
> 
> 
> -Hoss
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6134864
Sent from the Lucene - Java Users forum at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message