lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Byrne <john.by...@propylon.com>
Subject Re: Wild carded phrases
Date Fri, 09 May 2008 09:26:15 GMT
Hi,

Here's a searchable mailing list archive: 
http://www.gossamer-threads.com/lists/lucene/java-user/

As regards the wildcard phrase queries, here's one way I think you could 
do it, but it's a bit of extra work. If you're using QueryParser, you'd 
have to override the "getFieldQuery" method to use span queries instead 
of phrase queries.

A phrase query can be implemented as a span query with a span or "slop"  
factor of 1. So, once you have the PhraseQuery object, you would:

1. Extract the terms
2. For each one, check if it contains a "*" or a "?"
3. If it does, create a WildcardQuery using that term, and re-write it 
using IndexReader.rewrite method. This expands the wildcard query into 
all it's matches.
4. Create an array of SpanTermQuery objects, (one SpanTermQuery for each 
term that matched you wildcard); then add that array to a SpanOrQuery.
5. Repeat 2 to 4 for each wildcard term in the phrase.
6. Finally (!), create a SpanNearQuery, adding all the original terms in 
order, but substituting your SpanOrQuerys for the wildcard terms. Use a 
slop of 1, and set the inOrder flag to true.

So, essentially, you'd end up with: (you'll have to excuse me if I 
haven't rendered the span queries correctly as strings here - but this 
should give the general idea...)

spanNear[boiler (spanOr[replacement replacing])]

So it will accept *either* "replacement" or "replacing" adjacent to 
"boiler", which is what you want.

As you can see, it's a bit of work - but if you add this functionality 
to the QueryParser, you'll can re-use it a lot!

Hope that helps!

-JB








Jon Loken wrote:
> Hi all, 
>
> First of all, well done to the implementers of Lucene. The performance
> is incredible! We get search results within 20-40 ms on an index about
> 1.5GB. 
>
> I could not find a Lucene maillist search engine, something I am a bit
> surprised about!
>
> My question is how I can implement wild carded phrases searches like:
> "boiler replac*"
> This will pick up text:
> "boiler replacement" and "boiler replacing"
> But not:
> "boiling replacement" or "boiler user replacment"
>
> I am using the queryParser through Spring-lucene-module. 
>
>
>
> I did simply try textToSearch= "boiler replac*", but this did not work
> as anticipated. Have not analyzed properly, but it seemed to interpret
> this as: 
> boiler OR replac*
>
>
> Is there a way to implements this?
>
> Many thanks, 
> Jon
>
>
> BiP Solutions Limited is a company registered in Scotland with Company Number SC086146
and VAT number 38303966 and having its registered office at Park House, 300 Glasgow Road,
Shawfield, Glasgow, G73 1SQ ****************************************************************************
> This e-mail (and any attachment) is intended only for the attention of the addressee(s).
Its unauthorised use, disclosure, storage or copying is not permitted. If you are not the
intended recipient, please destroyall copies and inform the sender by return e-mail.
> This e-mail (whether you are the sender or the recipient) may be monitored, recorded
and retained by BiP Solutions Ltd.
> E-mail monitoring/ blocking software may be used, and e-mail content may be read at any
time. You have a responsibility to ensure laws are not broken when composing or forwarding
e-mails and their contents.
> ****************************************************************************
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message