lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <markharw...@yahoo.co.uk>
Subject Re: AW: AW: Search with multiple wildcards
Date Thu, 11 Sep 2008 13:57:28 GMT
>>That should give you the functionality you are looking for.

If I understand your suggestion correctly, It won't. The Highlighter uses a tokenized version
of the document text.

Simplistically it does the following psuedo code:

for all tokens in documentTokenStream,
   if(queryTermsSet.contains(token))
        output "<b>"+token+"</b">
   else
        output token

NOT

for all tokens in query string
     fullDocumentString.replaceAll(queryStringToken, "<b>"+queryStringToken+"</b>"

So in the given example while you suggest manipulating "ll" to be in the query string,  you
cannot make "ll" appear as a token in documentTokenStream.

Actually the Highlighter logic is a fair bit more involved than this (especially when using
SpanQueryScorer) but the basis of it is there in the above pseudo code.





----- Original Message ----
From: Matthew Hall <mhall@informatics.jax.org>
To: java-user@lucene.apache.org
Sent: Thursday, 11 September, 2008 14:40:26
Subject: Re: AW: AW: Search with multiple wildcards

Well, you could certainly manipulate your search string, removing the 
wildcard punctuations, and then use that for what you pass to the 
highlighter.

That should give you the functionality you are looking for.


-Matt
mark harwood wrote:
>>> Is this possible?
>>>      
>
> Not currently, the highlighter works with a list of words (or words AND phrases using
the new span support) and highlights those.
> To do anything else would require the higlighter to faithfully re-implement much of the
logic in all of the different query types (fuzzy, wildcard, regex etc etc) which is much more
challenging/difficult to maintain.
>
>
>
> ----- Original Message ----
> From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
> To: java-user@lucene.apache.org
> Sent: Thursday, 11 September, 2008 12:07:36
> Subject: AW: AW: Search with multiple wildcards
>
> Ok, one final question:
>
> If i query for "*ll*", the query is expanded to ("hallo" or "alle" or ...), so the
> Highligter will highlight the words "hallo" or "alle". But how can i highlight only
> the original query, so only the "ll"? Is this possible?
>
> Thanks a lot
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: mark harwood [mailto:markharw00d@yahoo.co.uk] 
> Gesendet: Donnerstag, 11. September 2008 11:20
> An: java-user@lucene.apache.org
> Betreff: Re: AW: Search with multiple wildcards
>
> You need to call rewrite on the query to expand it then give that version to the highlighter
- see the package javadocs.
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html#package_description
>
>
> Cheers
> Mark
>
>
>
>
> ----- Original Message ----
> From: "Sertic Mirko, Bedag" <Mirko.Sertic@bedag.ch>
> To: java-user@lucene.apache.org
> Sent: Thursday, 11 September, 2008 9:34:13
> Subject: AW: Search with multiple wildcards
>
> Ok, i gave it a try, but i ran into this TooManyClauses Exception. I see that
> 3ildcard queries are expanded before they are processed, and I see that i can
> set the clauses count to Integer.MAXVALUE, and queries can consume a lot of memory, 
> but one final thing is still open: does a wildcard query work together with 
> the Lucene Highlighter? I tried it, but I only got an empty result. Without 
> wildcards, the highlighter works pretty smooth!
>
> Regards
> Mirko
>
> -----Ursprüngliche Nachricht-----
> Von: Erick Erickson [mailto:erickerickson@gmail.com] 
> Gesendet: Mittwoch, 10. September 2008 18:15
> An: java-user@lucene.apache.org
> Betreff: Re: Search with multiple wildcards
>
> Of course you can construct your own BooleanQuery
> programmatically.
>
> It's relatively easy, just try it.
>
> On Wed, Sep 10, 2008 at 11:52 AM, Sertic Mirko, Bedag <Mirko.Sertic@bedag.ch
>  
>> wrote:
>>    
>
>  
>> Jep, this is what i have read.
>>
>> do I need to use the query parser, or can I create a query by the api?
>> Is there an example available?
>>
>> Thanks a lot
>> Mirko
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Erick Erickson [mailto:erickerickson@gmail.com]
>> Gesendet: Mittwoch, 10. September 2008 16:45
>> An: java-user@lucene.apache.org
>> Betreff: Re: Search with multiple wildcards
>>
>> Is this what you're referring to?
>>
>> Lucene supports single and multiple character wildcard searches within
>> single terms (not within phrase queries).
>> (from http://lucene.apache.org/java/docs/queryparsersyntax.html)
>>
>> I'm pretty sure you can have multiple *terms* with wildcards. Luke is your
>> friend here, download a copy and try it <G>. Be sure on the search tab to
>> specify StandardAnalyzer or some such, rather than keywordanalyzer.
>>
>> The phrase is trying to point out that a phrase query does NOT respect
>> wildcards. That is, submitting
>> "ab* bc* cd*" AS A PHRASE QUERY won't do what you expect. But I'm pretty
>> sure that
>>
>> +field:ab* +field:bc* +field:cd*
>>
>> will work just fine. The key here is "within single terms", which I think
>> of
>> as
>> "within a single term query". You can add as many TermQuerys as you want.
>> See the query documentation for how to submit phrase queries.
>>
>> Best
>> Erick
>>
>> On Wed, Sep 10, 2008 at 10:11 AM, Sertic Mirko, Bedag
>> <Mirko.Sertic@bedag.ch
>>    
>>> wrote:
>>>      
>>> Hi
>>>
>>> Thank you for your quick response:-)
>>>
>>> Of course I need to use the * character :-) But I have read somewhere in
>>> the documentation that leading wildcards are not supported, and only one
>>> wildcard term per query. Is this limitation resolved in the current
>>>      
>> version?
>>    
>>> Regards
>>> Mirko
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: Erick Erickson [mailto:erickerickson@gmail.com]
>>> Gesendet: Mittwoch, 10. September 2008 15:47
>>> An: java-user@lucene.apache.org
>>> Betreff: Re: Search with multiple wildcards
>>>
>>> Sure, but you'll have to set the leading wildcard option,
>>> which I've forgotten the exact call, but it's in the docs.
>>>
>>> And use * rather than % <G>.
>>>
>>> But wildcards are tricky, especially the TooManyClauses
>>> exception. You might want to peruse the archive for wildcard
>>> posts...
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Sep 10, 2008 at 9:06 AM, Sertic Mirko, Bedag
>>> <Mirko.Sertic@bedag.ch>wrote:
>>>
>>>      
>>>> Hi@all
>>>>
>>>>
>>>>
>>>> Is it possible to do a search with multiple wildcards in one query, for
>>>> instance "%MANAGE%" AND "CORE%"? Is there a code example available?
>>>>
>>>>
>>>>
>>>> Thanks a lot
>>>>
>>>> Mirko
>>>>
>>>>
>>>>
>>>>
>>>>        
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>      
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>    
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>      
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>      
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>  



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message