lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sonu Sudhakar" <son...@gmail.com>
Subject Re: Boolean Query Issue
Date Mon, 02 Jun 2008 05:19:44 GMT
Hi,

I have done some more analysis on this issue. I think it is related to
lucene's default operator.
I am getting excat results, when I sets the default operator as 'OR', but
facing problem when setting the default operator as 'AND'.

The following are the lucene QueryParser outputs for both cases.

Query :* TTL:store AND TTL:data OR TTL:variable

*1. When lucene default operator is '*OR'

*QueryParser output  using toString method: *
+TTL:store +TTL:data TTL:variable


*2. When lucene default operator is '*AND'

*QueryParser output  using toString method:
*+TTL:store TTL:data TTL:variable

*The output of second case is confusing me.

Could anybody please give me an explanation for this behavior?

Thanks,
Sonu

On Thu, May 29, 2008 at 3:49 PM, Sonu Sudhakar <sonusr@gmail.com> wrote:

> Erick,
>
> Thanks for your reply.
>
> I am working with approximately 1 million documents. They are indexed in 4
> servers. Each document has multiple fields. I am using ParallelMultiSearcher
> for searching purpose.
>
> I tried a few queries in the title(TTL) field.
>
> i started with a simple query without boolean operators.
>
> *1. TTL:data => 3733 results (all matches had "data" in title)*
>
> Then I tried a second one with AND operator.
>
> *2. TTL:data AND TTL:store => 19 results*
>
> I analyzed the results. the results had both "data" and "store" in the
> title.
>
> *I then tried OR operator*
>
> *3. TTL:data AND TTL:store OR TTL:variable*
>
> I got 3,733 results., same as the query TTL:data.
>
> I even tried giving a meaningless query
>
> TTL:data AND TTL:storettttt OR TTL:variablettttt => 3,733 results (The
> results were same as that of TTL:data.)
>
> TTL:data AND TTL:computer OR TTL:device => 3,733 results (this also showed
> the same results as above)
>
> The same thing repeats for other cases too. The queries below also behaved
> the same way.
> i.e. -
>
> 1. TTL:store AND TTL:data OR TTL:variable => 76 results
> 2. TTL:store AND TTL:datatttt OR TTL:variabletttt => 76 results
> 3. TTL:store AND TTL:computer OR TTL:device => 76 results
>
>
> 1. TTL:variable AND TTL:data OR  TTL:store => 1,496 results
> 2. TTL:variable AND TTL:datatttt OR  TTL:storetttt => 1,496 results
> 3. TTL:variable AND TTL:computer OR  TTL:device => 1,496 results
>
> I hope you have a clearer picture of my issue now.
>
> Thanks,
> Sonu
>
>
> On Wed, May 28, 2008 at 7:09 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> It's unclear what you *should* expect. How is your data
>> distributed?
>>
>> In other words, how many documents do you have? In this example,
>> for instance,
>> 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
>> it considered the TTL:data part only.
>>
>> it's perfecily reasonable if every document that had "variable" in the
>> field *also* has "data" and "store" in the field. So your numbers
>> don't give us much to work with.....
>>
>> Remember, though, that Lucene syntax isn't a pure boolean syntax. See
>>
>> http://wiki.apache.org/lucene-java/BooleanQuerySyntax
>>
>> And when in doubt parenthesize <G>...
>>
>> Best
>> Erick
>>
>> On Wed, May 28, 2008 at 7:44 AM, Sonu Sudhakar <sonusr@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I have some issue with boolean queries.
>> >
>> > I am using Lucene-core-2.3.1.
>> >
>> > I have done test on boolean query with 3 terms (data, store, variable)
>> in
>> > my
>> > TTL field. The TTL field is indexed and searched using StandardAnalyzer.
>> >
>> > The three terms when searched individually gave the following result
>> >
>> > 1. TTL:data  => 3,733 results
>> > 2. TTL:store  => 76 results
>> > 3. TTL:variable  => 1,496 results
>> >
>> > But found issue when combining these terms with boolean operators.
>> >
>> > e.g.
>> > 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results
>> > it considered the TTL:data part only.
>> >
>> > 2. TTL:store AND TTL:data OR TTL:variable => 76 results
>> > it considered  the TTL:store part only.
>> >
>> > 3. TTL:variable AND TTL:data OR  TTL:store => 1,496 results
>> > it considered  the TTL:variable part only.
>> >
>> > But I am getting correct result when combining terms with 'AND'
>> operator. I
>> > think the issue is with 'OR' operator.
>> >
>> >
>> > Could anybody give an explanation for this behavior of lucene?
>> > Could you give suggestions to rectify this?
>> >
>> > Thanks,
>> > Sonu
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message