lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sonu Sudhakar" <son...@gmail.com>
Subject Re: Boolean Query Issue
Date Tue, 03 Jun 2008 05:04:03 GMT
Hi,

This is the code I am using for search.

public void doSearch(String userQuery, int stemFlag, int sortFlag, String[]
sortFileds) throws Exception {

            PerFieldAnalyzerWrapper analyzer;
            if (stemFlag == 1) {
                analyzer = new PerFieldAnalyzerWrapper(new
StemmingAnalyzer());
            } else {
                analyzer = new PerFieldAnalyzerWrapper(new
StandardAnalyzer());
            }

            //Stemming is not applied on country and state fields
            analyzer.addAnalyzer("IS", new StandardAnalyzer());
            analyzer.addAnalyzer("AS", new StandardAnalyzer());
            analyzer.addAnalyzer("IC", new StandardAnalyzer());
            analyzer.addAnalyzer("AC", new StandardAnalyzer());

            this.userQuery = userQuery;

            // Looking for remote indexes
            RemoteLookup rLookup = new RemoteLookup();

            // this will create Searchables object array
            Searchable[] srbl = rLookup.getSearchables();

            pmsearcher = new ParallelMultiSearcher(srbl);

            QueryParser qp = null;
            qp = new QueryParser("SPEC", analyzer);

            query = qp.parse(userQuery);
            //System.out.println(query.toString());

            //creates the sort fields
            sort = new SDFSort().getSort(sortFlag, sortFileds);

            // Search over multiple indexes
            hits = pmsearcher.search(query, sort);

    }

This is not exactly as  my code. But added almost all searching parts.

Thanks,
Sonu

On Mon, Jun 2, 2008 at 6:19 PM, Erick Erickson <erickerickson@gmail.com>
wrote:

> You need to include your code I think. This makes no sense on a quick
> look, so unless we see some code it'll be hard to know whether
> we're looking at anything relevant.
>
> Best
> Erick
>
> On Mon, Jun 2, 2008 at 1:19 AM, Sonu Sudhakar <sonusr@gmail.com> wrote:
>
> > Hi,
> >
> > I have done some more analysis on this issue. I think it is related to
> > lucene's default operator.
> > I am getting excat results, when I sets the default operator as 'OR', but
> > facing problem when setting the default operator as 'AND'.
> >
> > The following are the lucene QueryParser outputs for both cases.
> >
> > Query :* TTL:store AND TTL:data OR TTL:variable
> >
> > *1. When lucene default operator is '*OR'
> >
> > *QueryParser output  using toString method: *
> > +TTL:store +TTL:data TTL:variable
> >
> >
> > *2. When lucene default operator is '*AND'
> >
> > *QueryParser output  using toString method:
> > *+TTL:store TTL:data TTL:variable
> >
> > *The output of second case is confusing me.
> >
> > Could anybody please give me an explanation for this behavior?
> >
> > Thanks,
> > Sonu
> >
> > On Thu, May 29, 2008 at 3:49 PM, Sonu Sudhakar <sonusr@gmail.com> wrote:
> >
> > > Erick,
> > >
> > > Thanks for your reply.
> > >
> > > I am working with approximately 1 million documents. They are indexed
> in
> > 4
> > > servers. Each document has multiple fields. I am using
> > ParallelMultiSearcher
> > > for searching purpose.
> > >
> > > I tried a few queries in the title(TTL) field.
> > >
> > > i started with a simple query without boolean operators.
> > >
> > > *1. TTL:data => 3733 results (all matches had "data" in title)*
> > >
> > > Then I tried a second one with AND operator.
> > >
> > > *2. TTL:data AND TTL:store => 19 results*
> > >
> > > I analyzed the results. the results had both "data" and "store" in the
> > > title.
> > >
> > > *I then tried OR operator*
> > >
> > > *3. TTL:data AND TTL:store OR TTL:variable*
> > >
> > > I got 3,733 results., same as the query TTL:data.
> > >
> > > I even tried giving a meaningless query
> > >
> > > TTL:data AND TTL:storettttt OR TTL:variablettttt => 3,733 results (The
> > > results were same as that of TTL:data.)
> > >
> > > TTL:data AND TTL:computer OR TTL:device => 3,733 results (this also
> > showed
> > > the same results as above)
> > >
> > > The same thing repeats for other cases too. The queries below also
> > behaved
> > > the same way.
> > > i.e. -
> > >
> > > 1. TTL:store AND TTL:data OR TTL:variable => 76 results
> > > 2. TTL:store AND TTL:datatttt OR TTL:variabletttt => 76 results
> > > 3. TTL:store AND TTL:computer OR TTL:device => 76 results
> > >
> > >
> > > 1. TTL:variable AND TTL:data OR  TTL:store => 1,496 results
> > > 2. TTL:variable AND TTL:datatttt OR  TTL:storetttt => 1,496 results
> > > 3. TTL:variable AND TTL:computer OR  TTL:device => 1,496 results
> > >
> > > I hope you have a clearer picture of my issue now.
> > >
> > > Thanks,
> > > Sonu
> > >
> > >
> > > On Wed, May 28, 2008 at 7:09 PM, Erick Erickson <
> erickerickson@gmail.com
> > >
> > > wrote:
> > >
> > >> It's unclear what you *should* expect. How is your data
> > >> distributed?
> > >>
> > >> In other words, how many documents do you have? In this example,
> > >> for instance,
> > >> 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
> > >> it considered the TTL:data part only.
> > >>
> > >> it's perfecily reasonable if every document that had "variable" in the
> > >> field *also* has "data" and "store" in the field. So your numbers
> > >> don't give us much to work with.....
> > >>
> > >> Remember, though, that Lucene syntax isn't a pure boolean syntax. See
> > >>
> > >> http://wiki.apache.org/lucene-java/BooleanQuerySyntax
> > >>
> > >> And when in doubt parenthesize <G>...
> > >>
> > >> Best
> > >> Erick
> > >>
> > >> On Wed, May 28, 2008 at 7:44 AM, Sonu Sudhakar <sonusr@gmail.com>
> > wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I have some issue with boolean queries.
> > >> >
> > >> > I am using Lucene-core-2.3.1.
> > >> >
> > >> > I have done test on boolean query with 3 terms (data, store,
> variable)
> > >> in
> > >> > my
> > >> > TTL field. The TTL field is indexed and searched using
> > StandardAnalyzer.
> > >> >
> > >> > The three terms when searched individually gave the following result
> > >> >
> > >> > 1. TTL:data  => 3,733 results
> > >> > 2. TTL:store  => 76 results
> > >> > 3. TTL:variable  => 1,496 results
> > >> >
> > >> > But found issue when combining these terms with boolean operators.
> > >> >
> > >> > e.g.
> > >> > 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
> > >> > it considered the TTL:data part only.
> > >> >
> > >> > 2. TTL:store AND TTL:data OR TTL:variable => 76 results
> > >> > it considered  the TTL:store part only.
> > >> >
> > >> > 3. TTL:variable AND TTL:data OR  TTL:store => 1,496 results
> > >> > it considered  the TTL:variable part only.
> > >> >
> > >> > But I am getting correct result when combining terms with 'AND'
> > >> operator. I
> > >> > think the issue is with 'OR' operator.
> > >> >
> > >> >
> > >> > Could anybody give an explanation for this behavior of lucene?
> > >> > Could you give suggestions to rectify this?
> > >> >
> > >> > Thanks,
> > >> > Sonu
> > >> >
> > >>
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message