lucene-general mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Lucene Not Throwing Matches Without Spaces
Date Tue, 17 Nov 2009 17:24:42 GMT
Solr's WordDelimiterFilter has an option, splitOnCaseChange, that I think
might work for your SaddamHussain example.
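
To make the idea concrete, here is a rough Python sketch of what splitting on
a case change does to a token. It is not Solr's code, and the function name
split_on_case_change is made up for illustration.

```python
import re

def split_on_case_change(token):
    """Split a token wherever a lowercase letter is followed by an uppercase one."""
    # Insert a space at each lower-to-upper boundary, then split on whitespace.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()

print(split_on_case_change("SaddamHussain"))
print(split_on_case_change("saddamhussain"))
```

Note that this only helps when the query actually carries the case change: an
all-lowercase "saddamhussain" has no boundary to split on.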

If you want to use Ted's first approach with Lucene, you could try the
compounds package in Lucene's analysis contrib and give it an English
wordlist (or create a very refined custom list of your own, as he suggested).
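
As a rough picture of what dictionary-based decompounding does, here is a
small Python sketch. It only illustrates the idea and is not the compounds
package itself; the function decompound and its min_subword parameter are
hypothetical names chosen for this example.

```python
def decompound(token, wordlist, min_subword=3):
    """Return the token followed by every wordlist entry found inside it."""
    lowered = token.lower()
    parts = [token]  # keep the original token, then add the subwords
    for start in range(len(lowered)):
        for end in range(start + min_subword, len(lowered) + 1):
            candidate = lowered[start:end]
            # Skip the whole token itself; only proper subwords are added.
            if candidate in wordlist and candidate != lowered:
                parts.append(candidate)
    return parts

print(decompound("mightyduck", {"mighty", "duck"}))
print(decompound("mightyduck", {"might", "mighty", "duck"}))
```

The second call shows why a refined list matters: with a broad English
wordlist, "might" is also picked up as a subword of "mightyduck".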

On Tue, Nov 17, 2009 at 12:14 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> That is what is going on.
>
> To fix the problem you generally need to do a bit of statistics on your
> corpus to discover word pairs that appear both with and without a space.
> Once you have that, you have two approaches that will work.
>
> The first approach is to index your text in an ambiguous fashion.  Where
> your "mighty duck" text would have previously been indexed, as Simon says,
> as two terms ["mighty"@0, "duck"@1], with the pair lexicon you would index
> the text as ["mighty duck"@0, "mighty"@0, "duck"@1].  At this point, either
> query will work.
>
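
A minimal sketch of this index-time expansion, written in Python rather than
against Lucene's analyzer API. The function index_terms and the known_pairs
set are hypothetical. It also assumes the extra term is stored in its joined
form ("mightyduck") so that the no-space query can match it; the message
above writes that term as "mighty duck".

```python
def index_terms(text, known_pairs):
    """Return (term, position) tuples, adding a joined term for each known pair."""
    words = text.lower().split()
    terms = []
    for pos, word in enumerate(words):
        # Assumption: the pair is indexed as one joined term at the first word's position.
        if pos + 1 < len(words) and (word, words[pos + 1]) in known_pairs:
            terms.append((word + words[pos + 1], pos))
        terms.append((word, pos))
    return terms

print(index_terms("mighty duck", {("mighty", "duck")}))
```
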
> Another approach, which is easier if you don't want to mess with the indexer
> and analyzer chain, is to do the same transformation at query time.  If the
> user types the query [mightyduck], you would rewrite this to be [mightyduck
> OR phrase(mighty duck)].  Similarly, if the user types [mighty duck], you
> would rewrite the query to be [mightyduck OR phrase(mighty duck) OR mighty
> OR duck].
>
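
The query-time rewrite described above could be sketched like this in Python.
The function rewrite_query and the pair_lexicon mapping (joined form to word
pair) are hypothetical, and the output uses a quoted phrase with OR as a
stand-in for the bracket notation in the message.

```python
def rewrite_query(query, pair_lexicon):
    """Expand a query using a lexicon that maps joined forms to word pairs."""
    words = query.lower().split()
    # Case 1: the user typed the joined form, e.g. [mightyduck].
    if len(words) == 1 and words[0] in pair_lexicon:
        first, second = pair_lexicon[words[0]]
        return f'{words[0]} OR "{first} {second}"'
    # Case 2: the user typed the two separate words, e.g. [mighty duck].
    joined = "".join(words)
    if len(words) == 2 and pair_lexicon.get(joined) == tuple(words):
        return f'{joined} OR "{words[0]} {words[1]}" OR {words[0]} OR {words[1]}'
    # Anything else is passed through unchanged.
    return query

lexicon = {"mightyduck": ("mighty", "duck")}
print(rewrite_query("mightyduck", lexicon))
print(rewrite_query("mighty duck", lexicon))
```
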
> On Tue, Nov 17, 2009 at 8:09 AM, Simon Willnauer <
> simon.willnauer@googlemail.com> wrote:
>
> > Nishu,
> >
> > First, you should send this question to java-users, not to general :)
> > When you index a doc with the content "mighty duck", your TokenStream
> > most likely builds two tokens, t1:"mighty" and t2:"duck".
> > The same happens (most likely) when you search for "mighty duck" with
> > the QueryParser, so the query will be a boolean TermQuery("mighty") OR
> > TermQuery("duck"). This will retrieve your document. If you search for
> > "mightyduck", the query will only have one boolean clause (actually
> > none, it's just a term query) with TermQuery("mightyduck"). Lucene will
> > not find any matches as this term is not in the index.
> >
> > Hope that helps for understanding what is going on.
> >
> > simon
> >
> > On Tue, Nov 17, 2009 at 2:16 PM, Nishu Soni <nishu.soni@3i-infotech.com>
> > wrote:
> > >
> > > Lucene is not returning matches when the search string has no space and
> > > the data in my index has one. For example, if the text "Saddam Hussain"
> > > is in the index and I search for "SaddamHussain", I do not get any
> > > matches. I am using a Boolean Query for scanning.
> > >
> > > Any help will be highly appreciated.
> > > --
> > > View this message in context:
> > > http://old.nabble.com/Lucene-Not-Throwing-Matches-Without-Spaces-tp26389750p26389750.html
> > > Sent from the Lucene - General mailing list archive at Nabble.com.
> > >
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>



-- 
Robert Muir
rcmuir@gmail.com
