lucene-java-user mailing list archives

From "Sonu Sudhakar" <son...@gmail.com>
Subject Re: Boolean Query Issue
Date Tue, 03 Jun 2008 05:30:41 GMT
Sorry, in my first mail I forgot to include the line that sets the default
QueryParser operator. I have included it this time.


> This is the code I am using for search.
>
> public void doSearch(String userQuery, int stemFlag, int sortFlag, String[]
> sortFileds) throws Exception {
>
>             PerFieldAnalyzerWrapper analyzer;
>             if (stemFlag == 1) {
>                 analyzer = new PerFieldAnalyzerWrapper(new
> StemmingAnalyzer());
>             } else {
>                 analyzer = new PerFieldAnalyzerWrapper(new
> StandardAnalyzer());
>             }
>
>             //Stemming is not applied on country and state fields
>             analyzer.addAnalyzer("IS", new StandardAnalyzer());
>             analyzer.addAnalyzer("AS", new StandardAnalyzer());
>             analyzer.addAnalyzer("IC", new StandardAnalyzer());
>             analyzer.addAnalyzer("AC", new StandardAnalyzer());
>
>             this.userQuery = userQuery;
>
>             // Looking for remote indexes
>             RemoteLookup rLookup = new RemoteLookup();
>
>             // this will create Searchables object array
>             Searchable[] srbl = rLookup.getSearchables();
>
>             pmsearcher = new ParallelMultiSearcher(srbl);
>
>             QueryParser qp = null;
>             qp = new QueryParser("SPEC", analyzer);
>             qp.setDefaultOperator(QueryParser.AND_OPERATOR);
>
>             query = qp.parse(userQuery);
>             //System.out.println(query.toString());
>
>             //creates the sort fields
>             sort = new SDFSort().getSort(sortFlag, sortFileds);
>
>             // Search over multiple indexes
>             hits = pmsearcher.search(query, sort);
>
>     }
>
> This is not exactly my code, but I have included almost all of the
> search-related parts.
>
> Thanks,
> Sonu
>
>
> On Mon, Jun 2, 2008 at 6:19 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> You need to include your code I think. This makes no sense on a quick
>> look, so unless we see some code it'll be hard to know whether
>> we're looking at anything relevant.
>>
>> Best
>> Erick
>>
>> On Mon, Jun 2, 2008 at 1:19 AM, Sonu Sudhakar <sonusr@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I have done some more analysis on this issue. I think it is related to
>> > lucene's default operator.
>> > I get the expected results when I set the default operator to 'OR',
>> > but I run into the problem when I set the default operator to 'AND'.
>> >
>> > The following are the lucene QueryParser outputs for both cases.
>> >
>> > Query: TTL:store AND TTL:data OR TTL:variable
>> >
>> > 1. When the Lucene default operator is 'OR'
>> >
>> > QueryParser output using the toString method:
>> > +TTL:store +TTL:data TTL:variable
>> >
>> >
>> > 2. When the Lucene default operator is 'AND'
>> >
>> > QueryParser output using the toString method:
>> > +TTL:store TTL:data TTL:variable
>> >
>> > The output of the second case is confusing me.
>> >
>> > Could anybody please give me an explanation for this behavior?
>> >
>> > Thanks,
>> > Sonu
>> >
>> > On Thu, May 29, 2008 at 3:49 PM, Sonu Sudhakar <sonusr@gmail.com>
>> wrote:
>> >
>> > > Erick,
>> > >
>> > > Thanks for your reply.
>> > >
>> > > I am working with approximately 1 million documents. They are indexed
>> > > across 4 servers. Each document has multiple fields. I am using
>> > > ParallelMultiSearcher for searching.
>> > >
>> > > I tried a few queries in the title(TTL) field.
>> > >
>> > > I started with a simple query without boolean operators.
>> > >
>> > > 1. TTL:data => 3,733 results (all matches had "data" in the title)
>> > >
>> > > Then I tried a second one with the AND operator.
>> > >
>> > > 2. TTL:data AND TTL:store => 19 results
>> > >
>> > > I analyzed the results. They all had both "data" and "store" in the
>> > > title.
>> > >
>> > > I then added the OR operator.
>> > >
>> > > 3. TTL:data AND TTL:store OR TTL:variable
>> > >
>> > > I got 3,733 results, the same as for the query TTL:data.
>> > >
>> > > I even tried a meaningless query:
>> > >
>> > > TTL:data AND TTL:storettttt OR TTL:variablettttt => 3,733 results (the
>> > > results were the same as for TTL:data)
>> > >
>> > > TTL:data AND TTL:computer OR TTL:device => 3,733 results (this also
>> > > showed the same results as above)
>> > >
>> > > The same thing happens in other cases too. The queries below
>> > > behaved the same way:
>> > >
>> > > 1. TTL:store AND TTL:data OR TTL:variable => 76 results
>> > > 2. TTL:store AND TTL:datatttt OR TTL:variabletttt => 76 results
>> > > 3. TTL:store AND TTL:computer OR TTL:device => 76 results
>> > >
>> > >
>> > > 1. TTL:variable AND TTL:data OR  TTL:store => 1,496 results
>> > > 2. TTL:variable AND TTL:datatttt OR  TTL:storetttt => 1,496 results
>> > > 3. TTL:variable AND TTL:computer OR  TTL:device => 1,496 results
>> > >
>> > > I hope you have a clearer picture of my issue now.
>> > >
>> > > Thanks,
>> > > Sonu
>> > >
>> > >
>> > > On Wed, May 28, 2008 at 7:09 PM, Erick Erickson <
>> erickerickson@gmail.com
>> > >
>> > > wrote:
>> > >
>> > >> It's unclear what you *should* expect. How is your data
>> > >> distributed?
>> > >>
>> > >> In other words, how many documents do you have? In this example,
>> > >> for instance,
>> > >> 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
>> > >> it considered the TTL:data part only.
>> > >>
>> > >> It's perfectly reasonable if every document that has "variable" in the
>> > >> field *also* has "data" and "store" in the field. So your numbers
>> > >> don't give us much to work with...
>> > >>
>> > >> Remember, though, that Lucene syntax isn't a pure boolean syntax. See
>> > >>
>> > >> http://wiki.apache.org/lucene-java/BooleanQuerySyntax
>> > >>
>> > >> And when in doubt parenthesize <G>...
>> > >>
>> > >> Best
>> > >> Erick
>> > >>
>> > >> On Wed, May 28, 2008 at 7:44 AM, Sonu Sudhakar <sonusr@gmail.com>
>> > wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> > I have some issue with boolean queries.
>> > >> >
>> > >> > I am using Lucene-core-2.3.1.
>> > >> >
>> > >> > I have tested boolean queries with 3 terms (data, store, variable)
>> > >> > in my TTL field. The TTL field is indexed and searched using
>> > >> > StandardAnalyzer.
>> > >> >
>> > >> > The three terms, when searched individually, gave the following
>> > >> > results:
>> > >> >
>> > >> > 1. TTL:data  => 3,733 results
>> > >> > 2. TTL:store  => 76 results
>> > >> > 3. TTL:variable  => 1,496 results
>> > >> >
>> > >> > But I found an issue when combining these terms with boolean operators.
>> > >> >
>> > >> > e.g.
>> > >> > 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
>> > >> > it considered the TTL:data part only.
>> > >> >
>> > >> > 2. TTL:store AND TTL:data OR TTL:variable => 76 results
>> > >> > it considered  the TTL:store part only.
>> > >> >
>> > >> > 3. TTL:variable AND TTL:data OR  TTL:store => 1,496 results
>> > >> > it considered  the TTL:variable part only.
>> > >> >
>> > >> > But I get correct results when combining terms with only the 'AND'
>> > >> > operator. I think the issue is with the 'OR' operator.
>> > >> >
>> > >> >
>> > >> > Could anybody give an explanation for this behavior of Lucene?
>> > >> > Could you give suggestions to rectify this?
>> > >> >
>> > >> > Thanks,
>> > >> > Sonu
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>
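
For anyone reading this thread later, here is a minimal sketch of why the
two default operators can produce different toString output for the same
query. It is a simplified model in plain Java, not Lucene's source. The class
OccurModel and its methods parse, render and matches are hypothetical names.
The flagging rules are an assumption, reconstructed to reproduce the two
outputs quoted above: clauses are kept in one flat list, AND makes the
previous clause required, and OR under the AND default makes the previous
clause optional. It handles only terms joined by AND/OR, with no parentheses,
NOT, or +/- modifiers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Simplified model (an assumption, not Lucene source) of flat clause flagging. */
final class OccurModel {
    enum Occur { MUST, SHOULD }

    /** One term plus its flag; mutable because a later AND/OR rewrites the previous clause. */
    static final class Clause {
        final String term;
        Occur occur;
        Clause(String term, Occur occur) { this.term = term; this.occur = occur; }
    }

    /** Flags each term of an "a AND b OR c" style query, left to right. */
    static List<Clause> parse(String query, boolean defaultIsAnd) {
        List<Clause> clauses = new ArrayList<Clause>();
        String conj = "";  // conjunction seen just before the next term, if any
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) { conj = tok; continue; }
            if (!clauses.isEmpty()) {
                Clause prev = clauses.get(clauses.size() - 1);
                // AND promotes the previous clause to required.
                if (conj.equals("AND")) prev.occur = Occur.MUST;
                // Under the AND default, OR demotes the previous clause to optional.
                if (conj.equals("OR") && defaultIsAnd) prev.occur = Occur.SHOULD;
            }
            // AND default: required unless introduced by OR.
            // OR default: required only if introduced by AND.
            boolean required = defaultIsAnd ? !conj.equals("OR") : conj.equals("AND");
            clauses.add(new Clause(tok, required ? Occur.MUST : Occur.SHOULD));
            conj = "";
        }
        return clauses;
    }

    /** Renders clauses in the "+required optional" style used in the thread. */
    static String render(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.occur == Occur.MUST ? "+" : "").append(c.term);
        }
        return sb.toString();
    }

    /** A document matches if every MUST term is present; with no MUST terms, any SHOULD term suffices. */
    static boolean matches(List<Clause> clauses, Set<String> docTerms) {
        boolean anyMust = false;
        boolean anyShouldHit = false;
        for (Clause c : clauses) {
            boolean hit = docTerms.contains(c.term);
            if (c.occur == Occur.MUST) {
                anyMust = true;
                if (!hit) return false;
            } else if (hit) {
                anyShouldHit = true;
            }
        }
        return anyMust || anyShouldHit;
    }

    public static void main(String[] args) {
        String q = "TTL:store AND TTL:data OR TTL:variable";
        System.out.println(render(parse(q, false)));  // +TTL:store +TTL:data TTL:variable
        System.out.println(render(parse(q, true)));   // +TTL:store TTL:data TTL:variable
        // A document containing only "store" still matches the AND-default form.
        System.out.println(matches(parse(q, true), Collections.singleton("TTL:store")));  // true
    }
}
```

Under this model, the AND-default form +TTL:store TTL:data TTL:variable
requires only TTL:store, which would be consistent with the 76 hits reported
for that query. Grouping the clauses explicitly, for example
(TTL:store AND TTL:data) OR TTL:variable, should avoid depending on these
rules, as Erick suggested.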
