lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthew Hall <mh...@informatics.jax.org>
Subject Re: AW: AW: Search with multiple wildcards
Date Thu, 11 Sep 2008 14:04:49 GMT
Ah.. that's a darn good point..

Though, that second bit of code you have there could be used at display 
time for him to get the functionality that he wants.  You could also 
modify it somewhat, and apply it against the displayable part of the hit 
he's getting back rather than the individual tokens.

This if of course only assuming that this functionality would be used 
after a contains type search was detected.

Considering that's the only real usecase for a technique like this, I'm 
thinking its probably more trouble than its worth for his case.



mark harwood wrote:
>>> That should give you the functionality you are looking for.
>>>       
>
> If I understand your suggestion correctly, It won't. The Highlighter uses a tokenized
version of the document text.
>
> Simplistically it does the following psuedo code:
>
> for all tokens in documentTokenStream,
>    if(queryTermsSet.contains(token))
>         output "<b>"+token+"</b">
>    else
>         output token
>
> NOT
>
> for all tokens in query string
>      fullDocumentString.replaceAll(queryStringToken, "<b>"+queryStringToken+"</b>"
>
> So in the given example while you suggest manipulating "ll" to be in the query string,
 you cannot make "ll" appear as a token in documentTokenStream.
>
> Actually the Highlighter logic is a fair bit more involved than this (especially when
using SpanQueryScorer) but the basis of it is there in the above pseudo code.
>
>
>
>
>
> ----- Original Message ----
> From: Matthew Hall <mhall@informatics.jax.org>
> To: java-user@lucene.apache.org
> Sent: Thursday, 11 September, 2008 14:40:26
> Subject: Re: AW: AW: Search with multiple wildcards
>
> Well, you could certainly manipulate your search string, removing the 
> wildcard punctuations, and then use that for what you pass to the 
> highlighter.
>
> That should give you the functionality you are looking for.
>
>
> -Matt
> mark harwood wrote:
>   
>>>> Is this possible?
>>>>      
>>>>         
>> Not currently, the highlighter works with a list of words (or words AND phrases using
the new span support) and highlights those.
>> To do anything else would require the higlighter to faithfully re-implement much
of the logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is
much more challenging/difficult to maintain.
>>
>>
>>
>> ----- Original Message ----
>> From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
>> To: java-user@lucene.apache.org
>> Sent: Thursday, 11 September, 2008 12:07:36
>> Subject: AW: AW: Search with multiple wildcards
>>
>> Ok, one final question:
>>
>> If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
>> Highligter will highlight the words "hallo" or "alle". But how can i highlight only
>> the original query, so only the "ll"? Is this possible?
>>
>> Thanks a lot
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: mark harwood [mailto:markharw00d@yahoo.co.uk] 
>> Gesendet: Donnerstag, 11. September 2008 11:20
>> An: java-user@lucene.apache.org
>> Betreff: Re: AW: Search with multiple wildcards
>>
>> You need to call rewrite on the query to expand it then give that version to the
highlighter - see the package javadocs.
>> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description
>>
>>
>> Cheers
>> Mark
>>
>>
>>
>>
>> ----- Original Message ----
>> From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
>> To: java-user@lucene.apache.org
>> Sent: Thursday, 11 September, 2008 9:34:13
>> Subject: AW: Search with multiple wildcards
>>
>> Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
>> 3ildcard queries are expanded before they are processed, and I see that i can
>> set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory,

>> but one final thing is still open: does a wildcard query work together with 
>> the Lucene Highlighter? I tried it, but I only got an empty result. Without 
>> wildcards, the highlighter works pretty smooth!
>>
>> Regards
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Erick Erickson [mailto:erickerickson@gmail.com] 
>> Gesendet: Mittwoch, 10. September 2008 18:15
>> An: java-user@lucene.apache.org
>> Betreff: Re: Search with multiple wildcards
>>
>> Of course you can construct your own BooleanQuery
>> programmatically.
>>
>> It's relatively easy, just try it.
>>
>> On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <Mirko.Sertic@bedag.ch
>>  
>>     
>>> wrote:
>>>    
>>>       
>>  
>>     
>>> Jep, this is what i have read.
>>>
>>> do I need to use the query parser, or can I create a query by the api?
>>> Is there an example available?
>>>
>>> Thanks a lot
>>> Mirko
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: Erick Erickson [mailto:erickerickson@gmail.com]
>>> Gesendet: Mittwoch, 10. September 2008 16:45
>>> An: java-user@lucene.apache.org
>>> Betreff: Re: Search with multiple wildcards
>>>
>>> Is this what you're referring to?
>>>
>>> Lucene supports single and multiple character wildcard searches within
>>> single terms (not within phrase queries).
>>> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>>>
>>> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
>>> friend here, download a copy and try it <G>. Be sure on the search tab
to
>>> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>>>
>>> The phrase is trying to point out that a phrase query does NOT respect
>>> wildcards. That is, submitting
>>> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
>>> sure that
>>>
>>> +field:ab* +field:bc* +field:cd*
>>>
>>> will work just fine. The key here is "within single terms", which I think
>>> of
>>> as
>>> "within a single term query". You can add as many TermQuerys as you want.
>>> See the query documentation for how to submit phrase queries.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
>>> <Mirko.Sertic@bedag.ch
>>>    
>>>       
>>>> wrote:
>>>>      
>>>> Hi
>>>>
>>>> Thank you for your quick response:-)
>>>>
>>>> Of course I need to use the * character :-) But I have read somewhere in
>>>> the documentation that leading wildcards are not supported, and only one
>>>> wildcard term per query. Is this limitation resolved in the current
>>>>      
>>>>         
>>> version?
>>>    
>>>       
>>>> Regards
>>>> Mirko
>>>>
>>>> -----Ursprüngliche Nachricht-----
>>>> Von: Erick Erickson [mailto:erickerickson@gmail.com]
>>>> Gesendet: Mittwoch, 10. September 2008 15:47
>>>> An: java-user@lucene.apache.org
>>>> Betreff: Re: Search with multiple wildcards
>>>>
>>>> Sure, but you'll have to set the leading wildcard option,
>>>> which I've forgotten the exact call, but it's in the docs.
>>>>
>>>> And use * rather than % <G>.
>>>>
>>>> But wildcards are tricky, especially the TooManyClauses
>>>> exception. You might want to peruse the archive for wildcard
>>>> posts...
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
>>>> <Mirko.Sertic@bedag.ch>wrote:
>>>>
>>>>      
>>>>         
>>>>> Hi@all
>>>>>
>>>>>
>>>>>
>>>>> Is it possible to do a search with multiple wildcards in one query, for
>>>>> instance "%MANAGE%" AND "CORE%"? Is there a code example available?
>>>>>
>>>>>
>>>>>
>>>>> Thanks a lot
>>>>>
>>>>> Mirko
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>        
>>>>>           
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>      
>>>>         
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>    
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>      
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>      
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>  
>>     
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>       
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message