Mailing-List: contact general-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <c7d45fc70911170914r66f57d99qd0f4f7b56ff8ccf8@mail.gmail.com>
References: <26389750.post@talk.nabble.com>
 <f18c9dde0911170809s7f8ddd2arc9c79539c7124503@mail.gmail.com>
	<c7d45fc70911170914r66f57d99qd0f4f7b56ff8ccf8@mail.gmail.com>
From: Robert Muir <rcmuir@gmail.com>
Date: Tue, 17 Nov 2009 12:24:42 -0500
Message-ID: <8f0ad1f30911170924y707923b0ra4803e63437eb719@mail.gmail.com>
Subject: Re: Lucene Not Throwing Matches Without Spaces
To: general@lucene.apache.org
Content-Type: multipart/alternative; boundary=0016e648f0ae010a5b0478946875

--0016e648f0ae010a5b0478946875
Content-Type: text/plain; charset=UTF-8

Solr's WordDelimiterFilter has a splitOnCaseChange option; I think that
might work for your SaddamHussain example.
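
To make the case-change idea concrete, here is a minimal plain-Java sketch. It assumes the only split rule is "lowercase letter followed by uppercase letter". It is not Solr's WordDelimiterFilter code, which has many more options, and the class name CaseChangeSplitter is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a plain-Java illustration, not Solr's WordDelimiterFilter.
final class CaseChangeSplitter {

    // Split wherever a lowercase letter is immediately followed by an uppercase
    // letter, e.g. "SaddamHussain" -> [Saddam, Hussain].
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            if (Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(token.charAt(i))) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        if (start < token.length()) {
            parts.add(token.substring(start)); // trailing part (or the whole token)
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("SaddamHussain")); // [Saddam, Hussain]
        System.out.println(split("saddamhussain")); // [saddamhussain]: no case change to split on
    }
}
```

Note the second call: an all-lowercase query gives this approach nothing to work with. That case needs a wordlist or pair lexicon instead.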

If you want to use Ted's first approach with Lucene, you could try the
compounds package in Lucene's analysis contrib and give it an English
wordlist (or create a very refined custom list of your own, as he suggested).
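
The following is a rough plain-Java sketch of dictionary-based decompounding. It assumes a brute-force scan of every substring between a minimum and maximum length against the wordlist. It only illustrates the idea and is not the compounds package's actual code. CompoundDecomposer and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of dictionary-based decompounding; not the code of
// Lucene's compounds package, just the same idea in plain Java.
final class CompoundDecomposer {
    private final Set<String> dictionary;
    private final int minSubword;
    private final int maxSubword;

    CompoundDecomposer(Set<String> dictionary, int minSubword, int maxSubword) {
        this.dictionary = dictionary;
        this.minSubword = minSubword;
        this.maxSubword = maxSubword;
    }

    // Returns the lowercased token followed by every dictionary word found inside it.
    List<String> decompose(String token) {
        String word = token.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        out.add(word); // keep the original so exact matches still work
        for (int i = 0; i + minSubword <= word.length(); i++) {
            for (int len = minSubword; len <= maxSubword && i + len <= word.length(); len++) {
                if (i == 0 && len == word.length()) {
                    continue; // the whole token is not a "subword"
                }
                String candidate = word.substring(i, i + len);
                if (dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> wordlist = new HashSet<>(Arrays.asList("saddam", "hussain", "mighty", "duck"));
        CompoundDecomposer decomposer = new CompoundDecomposer(wordlist, 3, 15);
        System.out.println(decomposer.decompose("SaddamHussain")); // [saddamhussain, saddam, hussain]
        System.out.println(decomposer.decompose("mightyduck"));    // [mightyduck, mighty, duck]
    }
}
```

This is also why the list needs to be refined: a general English list containing "sad" would add a spurious "sad" subword to "saddamhussain".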

On Tue, Nov 17, 2009 at 12:14 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> That is what is going on.
>
> To fix the problem you generally need to do a bit of statistics on your
> corpus to discover word pairs that appear both with and without a space.
> Once you have that, you have two approaches that will work.
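
The statistics step Ted describes could look roughly like the sketch below. It assumes a crude tokenizer (lowercase, split on non-alphanumerics) and a single minCount threshold applied to both the spaced and the joined form. PairLexiconBuilder is a hypothetical name, and a real corpus would want a much higher threshold than the 1 used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: find word pairs seen both as "a b" and as "ab" in a corpus.
final class PairLexiconBuilder {

    // Lowercase and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns every adjacent pair "a b" that occurs at least minCount times with a
    // space while the joined form "ab" occurs at least minCount times as one token.
    static SortedSet<String> build(List<String> documents, int minCount) {
        Map<String, Integer> unigrams = new HashMap<>();
        Map<String, Integer> bigrams = new HashMap<>();
        for (String doc : documents) {
            List<String> tokens = tokenize(doc);
            for (int i = 0; i < tokens.size(); i++) {
                unigrams.merge(tokens.get(i), 1, Integer::sum);
                if (i + 1 < tokens.size()) {
                    bigrams.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
                }
            }
        }
        SortedSet<String> lexicon = new TreeSet<>();
        for (Map.Entry<String, Integer> e : bigrams.entrySet()) {
            String joined = e.getKey().replace(" ", "");
            if (e.getValue() >= minCount && unigrams.getOrDefault(joined, 0) >= minCount) {
                lexicon.add(e.getKey());
            }
        }
        return lexicon;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "Saddam Hussain spoke today",
                "news about SaddamHussain",
                "the mighty duck",
                "a mightyduck fan");
        System.out.println(build(corpus, 1)); // [mighty duck, saddam hussain]
    }
}
```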
>
> The first approach is to index your text in an ambiguous fashion.  Where
> your "mighty duck" text would previously have been indexed, as Simon says,
> as two terms ["mighty"@0, "duck"@1], with the pair lexicon you would index
> the text as ["mightyduck"@0, "mighty"@0, "duck"@1].  At this point, either
> query will work.
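
Here is a small sketch of that index-time expansion, written as plain Java over a token list and printing "term@position" instead of feeding a real TokenStream. It assumes the pair lexicon is a set of "a b" strings. PairExpander is a hypothetical name. In an actual Lucene TokenFilter, stacking two terms at one position means giving the second of them a position increment of 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of index-time pair expansion, printed as "term@position".
final class PairExpander {

    // For every adjacent pair in the lexicon, emit the joined term at the position
    // of the pair's first word, ahead of that word, so both spellings get indexed.
    static List<String> expand(List<String> tokens, Set<String> pairLexicon) {
        List<String> out = new ArrayList<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            String word = tokens.get(pos);
            if (pos + 1 < tokens.size() && pairLexicon.contains(word + " " + tokens.get(pos + 1))) {
                out.add(word + tokens.get(pos + 1) + "@" + pos);
            }
            out.add(word + "@" + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> lexicon = Collections.singleton("mighty duck");
        System.out.println(expand(Arrays.asList("mighty", "duck"), lexicon));
        // [mightyduck@0, mighty@0, duck@1]
    }
}
```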
>
> Another approach, easier if you don't want to mess with the indexer and
> analyzer chain, is to do the same transformation at query time.  If the
> user types the query [mightyduck], you would rewrite it as [mightyduck
> OR phrase(mighty duck)].  Similarly, if the user types [mighty duck], you
> would rewrite the query as [mightyduck OR phrase(mighty duck) OR mighty
> OR duck].
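
A minimal sketch of the query-time rewrite follows. It only handles whole one- or two-word queries, and it produces a query string in QueryParser-style syntax (OR, quoted phrase) instead of building Query objects directly. Both choices are simplifications for illustration, and PairQueryRewriter is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of query-time rewriting; the output uses QueryParser-style
// syntax (OR, quoted phrase) purely for readability.
final class PairQueryRewriter {
    private final Map<String, String> joinedToSplit = new HashMap<>();

    PairQueryRewriter(Collection<String> pairLexicon) {
        for (String pair : pairLexicon) {
            joinedToSplit.put(pair.replace(" ", ""), pair); // "mightyduck" -> "mighty duck"
        }
    }

    // Only whole one- or two-word queries are handled in this sketch.
    String rewrite(String userQuery) {
        String q = userQuery.trim().toLowerCase(Locale.ROOT);
        String[] words = q.split("\\s+");
        if (words.length == 1 && joinedToSplit.containsKey(q)) {
            // [mightyduck] -> mightyduck OR phrase(mighty duck)
            return q + " OR \"" + joinedToSplit.get(q) + "\"";
        }
        if (words.length == 2) {
            String joined = words[0] + words[1];
            String split = words[0] + " " + words[1];
            if (split.equals(joinedToSplit.get(joined))) {
                // [mighty duck] -> mightyduck OR phrase(mighty duck) OR mighty OR duck
                return joined + " OR \"" + split + "\" OR " + words[0] + " OR " + words[1];
            }
        }
        return q; // no known pair: leave the query as typed (lowercased)
    }

    public static void main(String[] args) {
        PairQueryRewriter rewriter = new PairQueryRewriter(Arrays.asList("mighty duck"));
        System.out.println(rewriter.rewrite("mightyduck"));  // mightyduck OR "mighty duck"
        System.out.println(rewriter.rewrite("mighty duck")); // mightyduck OR "mighty duck" OR mighty OR duck
    }
}
```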
>
> On Tue, Nov 17, 2009 at 8:09 AM, Simon Willnauer <simon.willnauer@googlemail.com> wrote:
>
> > Nishu,
> >
> > First, you should send this question to java-users, not to general :)
> > When you index a doc with the content "mighty duck", your TokenStream
> > most likely builds two tokens, t1:"mighty" and t2:"duck".
> > The same (most likely) happens when you search for "mighty duck" with
> > the QueryParser, so the query will be a BooleanQuery of TermQuery("mighty")
> > OR TermQuery("duck"). This will retrieve your document. If you search for
> > "mightyduck", the query will have only one boolean clause (actually
> > none; it's just a TermQuery("mightyduck")). Lucene will not find
> > any matches, as this term is not in the index.
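
To see the mismatch Simon describes without any Lucene setup, here is a toy inverted index in plain Java. It assumes whitespace tokenization stands in for the analyzer and an "any term matches" search stands in for a BooleanQuery of SHOULD TermQuerys. ToyIndex is a hypothetical name, not a Lucene class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index; whitespace tokenization stands in for the analyzer.
final class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Like an OR of term queries: a doc matches if it contains any query term.
    Set<Integer> searchAny(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "mighty duck");                        // terms: "mighty", "duck"
        System.out.println(index.searchAny("mighty duck")); // [1]
        System.out.println(index.searchAny("mightyduck"));  // []: "mightyduck" is not a term in the index
    }
}
```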
> >
> > Hope that helps for understanding what is going on.
> >
> > simon
> >
> > On Tue, Nov 17, 2009 at 2:16 PM, Nishu Soni <nishu.soni@3i-infotech.com> wrote:
> > >
> > > Lucene is not returning matches when the search string has no space but
> > > the data in my index file does. For example, if the text "Saddam Hussain"
> > > is in the index file and I search for "SaddamHussain", I do not get any
> > > matches. I am using a BooleanQuery for searching.
> > >
> > > Any help will be highly appreciated.
> > > --
> > > View this message in context:
> > > http://old.nabble.com/Lucene-Not-Throwing-Matches-Without-Spaces-tp26389750p26389750.html
> > > Sent from the Lucene - General mailing list archive at Nabble.com.
> > >
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>


-- 
Robert Muir
rcmuir@gmail.com

--0016e648f0ae010a5b0478946875--