lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Lucene Not Throwing Matches Without Spaces
Date Tue, 17 Nov 2009 17:14:25 GMT
That is what is going on.

To fix the problem you generally need to do a bit of statistics on your
corpus to discover word pairs that appear both with and without a space.
Once you have that, you have two approaches that will work.
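
As a rough illustration of the corpus statistics step, here is a minimal Python sketch. It does not use any Lucene API; the function name find_split_pairs, the whitespace tokenization, and the sample documents are all assumptions made up for this example. It counts adjacent word pairs and reports those whose concatenation also occurs as a single word.

```python
from collections import Counter

def find_split_pairs(docs):
    """Return {(left, right): (spaced_count, joined_count)} for word pairs
    that occur both as two adjacent words and as one joined word."""
    unigrams = Counter()
    bigrams = Counter()
    for text in docs:
        # Assumption: lowercase + whitespace split stands in for the real analyzer.
        words = text.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    pairs = {}
    for (left, right), spaced in bigrams.items():
        joined = unigrams.get(left + right, 0)
        if joined:
            pairs[(left, right)] = (spaced, joined)
    return pairs

# Hypothetical sample corpus
docs = ["The Mighty Duck wins", "mightyduck fans", "a duck pond"]
pair_counts = find_split_pairs(docs)
```

On a real corpus you would probably want a minimum count on both forms before trusting a pair, since a single typo is enough to create a joined form.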

The first approach is to index your text in an ambiguous fashion.  Without
the pair lexicon, your "mighty duck" text would have been indexed, as Simon
says, as two terms ["mighty"@0, "duck"@1].  With the pair lexicon, you would
instead index the text as ["mighty duck"@0, "mighty"@0, "duck"@1].  At this
point, either query will work.
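
The index-time expansion can be sketched in plain Python as follows. This is not a Lucene TokenFilter, only a model of the token list it would need to produce; expand_tokens, the (term, position) tuples, and the joiner parameter are hypothetical names chosen for the example.

```python
def expand_tokens(words, pair_lexicon, joiner=" "):
    """Return (term, position) tuples, adding a combined term at the position
    of the first word whenever (word, next_word) is in the pair lexicon."""
    tokens = []
    for pos, word in enumerate(words):
        nxt = words[pos + 1] if pos + 1 < len(words) else None
        if nxt is not None and (word, nxt) in pair_lexicon:
            # Extra term stacked at the same position as the first word
            tokens.append((word + joiner + nxt, pos))
        tokens.append((word, pos))
    return tokens

# Hypothetical pair lexicon produced by the statistics step
pair_lexicon = {("mighty", "duck")}
tokens = expand_tokens(["mighty", "duck"], pair_lexicon)
# tokens == [("mighty duck", 0), ("mighty", 0), ("duck", 1)]
```

The combined term here keeps the space, following the notation above. Whatever form you pick, the query side has to map [mightyduck] to that same term, so passing joiner="" may be the simpler choice in practice.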

Another approach, which is easier if you don't want to mess with the indexer
and analyzer chain, is to do the same transformation at query time.  If the
user types the query [mightyduck], you would rewrite this to be [mightyduck
OR phrase(mighty duck)].  Similarly, if the user types [mighty duck], you
would rewrite the query to be [mightyduck OR phrase(mighty duck) OR mighty
OR duck].
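
A minimal sketch of that query-time rewrite, again in plain Python and only as an illustration: rewrite_query and the pair lexicon are hypothetical, only one-word and two-word queries are handled, and the output uses double quotes for the phrase as the classic Lucene query parser syntax does.

```python
def rewrite_query(query, pair_lexicon):
    """Rewrite a one- or two-word query using a lexicon of (left, right) pairs.
    Queries that match nothing in the lexicon are returned unchanged."""
    words = query.lower().split()
    joined_to_pair = {left + right: (left, right) for left, right in pair_lexicon}
    if len(words) == 1 and words[0] in joined_to_pair:
        left, right = joined_to_pair[words[0]]
        return f'{words[0]} OR "{left} {right}"'
    if len(words) == 2 and tuple(words) in pair_lexicon:
        left, right = words
        return f'{left}{right} OR "{left} {right}" OR {left} OR {right}'
    return query

# Hypothetical pair lexicon produced by the statistics step
pair_lexicon = {("mighty", "duck")}
joined_rewrite = rewrite_query("mightyduck", pair_lexicon)
spaced_rewrite = rewrite_query("mighty duck", pair_lexicon)
```

Longer queries would need the same check applied to each adjacent pair of words, which this sketch leaves out.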

On Tue, Nov 17, 2009 at 8:09 AM, Simon Willnauer <
simon.willnauer@googlemail.com> wrote:

> Nishu,
>
> first you should send this question to java-users not to general :)
> When you index a doc with the content "mighty duck" your TokenStream
> most likely builds two tokens t1:"mighty" t2:"duck"
> the same happens (most likely) when you search for "mighty duck" with
> the QueryParser so the query will be a boolean TermQuery("mighty") OR
> TermQuery("duck"). This will retrieve your document. If you search for
> "mightyduck" the query will only have one boolean clause (actually
> none, it's just a term query) with TermQuery("mightyduck"). Lucene will
> not find any matches as this term is not in the index.
>
> Hope that helps for understanding what is going on.
>
> simon
>
> On Tue, Nov 17, 2009 at 2:16 PM, Nishu Soni <nishu.soni@3i-infotech.com>
> wrote:
> >
> > Lucene is not throwing matches when search string is without space and
> > data in my index file is with space. For e.g. if "Saddam Hussain" text is
> > in index file and I am searching "SaddamHussain", I am not getting any
> > matches. I am using Boolean Query for scanning.
> >
> > Any help will be highly appreciated.
> > --
> > View this message in context:
> http://old.nabble.com/Lucene-Not-Throwing-Matches-Without-Spaces-tp26389750p26389750.html
> > Sent from the Lucene - General mailing list archive at Nabble.com.
> >
>



-- 
Ted Dunning, CTO
DeepDyve
