lucene-general mailing list archives

From "Deepa Paranjpe" <dee...@yahoo-inc.com>
Subject Problems with "AND" queries
Date Tue, 13 Feb 2007 20:11:23 GMT

I have indexed a set of small documents.
When I query the index using a BooleanQuery containing {why,is,the,sky,blue},
with every clause marked as MUST, I do not retrieve any
results.
However, when I use only {why,sky,blue}, I get several results, such as
"Why is the sky blue?".

What is going wrong? Please help. 
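
One possible explanation, which nothing in this message confirms, is that the analyzer used at index time dropped common words such as "is" and "the", while the query terms were passed in unanalyzed. A MUST clause on a term that was never indexed can match nothing, so the whole conjunction comes back empty. The sketch below illustrates that effect with a tiny in-memory inverted index. It does not use Lucene at all: the class name, the stop-word list, and the methods are assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StopWordAndDemo {
    // Hypothetical stop-word list, chosen only for this illustration.
    static final Set<String> STOP_WORDS = Set.of("is", "the", "a", "an", "and", "of");

    // Term -> ids of the documents that contain the term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index time: lowercase, split on non-letters, drop stop words.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Query time: every term is required, and no analysis is applied to the terms.
    Set<Integer> searchAll(List<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        StopWordAndDemo index = new StopWordAndDemo();
        index.add(0, "Why is the sky blue?");
        index.add(1, "The sky is grey");

        List<String> full = new ArrayList<>(List.of("why", "is", "the", "sky", "blue"));
        List<String> reduced = new ArrayList<>(List.of("why", "sky", "blue"));

        // "is" and "the" have no postings, so the five-term conjunction is empty.
        System.out.println("all five terms: " + index.searchAll(full));
        // Without the stop words, document 0 matches.
        System.out.println("three terms:    " + index.searchAll(reduced));
    }
}
```

If this is the cause, running the query terms through the same analysis as the documents (or leaving out terms the analyzer would discard) should bring the results back.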


-----Original Message-----
From: Stefan Groschupf [mailto:sg@101tec.com] 
Sent: Monday, November 06, 2006 5:18 AM
To: general@lucene.apache.org
Subject: Re: [PROPOSAL] index server project

Hi,

Do people think we are already at a stage where we can set up some
basic infrastructure, like a mailing list and wiki, and move the
discussion to the new mailing list? Maybe set up an incubator project?

I would be happy to help with such basic tasks.

Stefan



On 31.10.2006 at 22:03, Yonik Seeley wrote:

> On 10/30/06, Doug Cutting <cutting@apache.org> wrote:
>> Yonik Seeley wrote:
>> > On 10/18/06, Doug Cutting <cutting@apache.org> wrote:
>> >> We assume that, within an index, a file with a given name is written
>> >> only once.
>> >
>> > Is this necessary, and will we need the lockless patch (that avoids
>> > renaming or rewriting *any* files), or is Lucene's current index
>> > behavior sufficient?
>>
>> It's not strictly required, but it would make index synchronization a
>> lot simpler. Yes, I was assuming the lockless patch would be committed
>> to Lucene before this project gets very far.  Something more than that
>> would be required in order to keep old versions, but this could be as
>> simple as a Directory subclass that refuses to remove files for a time.
>
> Or a snapshot (hard links) mechanism.
> Lucene would also need a way to open a specific index version (rather
> than just the latest), but I guess that could also be hacked into
> Directory by hiding later "segments" files (assumes lockless is
> committed).
>
>> > It's unfortunate the master needs to be involved on every document add.
>>
>> That should not normally be the case.
>
> Ahh... I had assumed that "id" in the following method was document id:
>  IndexLocation getUpdateableIndex(String id);
>
> I see now it's index id.
>
> But what is index id exactly?  Looking at the example API you laid
> down, it must be a single physical index (as opposed to a logical
> index).  In which case, is it entirely up to the client to manage
> multi-shard indices?  For example, if we had a "photo" index broken
> up into 3 shards, each shard would have a separate index id and it
> would be up to the client to know this, and to query across the
> different "photo0", "photo1", "photo2" indices.  The master would
> have no clue those indices were related.  Hmmm, that doesn't work
> very well for deletes though.
>
> It seems like there should be the concept of a logical index, that is
> composed of multiple shards, and each shard has multiple copies.
>
> Or were you thinking that a cluster would only contain a single
> logical index, and hence all different index ids are simply different
> shards of that single logical index?  That would seem to be consistent
> with ClientToMasterProtocol.getSearchableIndexes() lacking an id
> argument.
>
>> I was not imagining a real-time system, where the next query after a
>> document is added would always include that document.  Is that a
>> requirement?  That's harder.
>
> Not real-time, but it would be nice if we kept it close to what Lucene
> can currently provide.
> Most people seem fine with a latency of minutes.
>
>> At this point I'm mostly trying to see if this functionality would meet
>> the needs of Solr, Nutch and others.
>>
>
> It depends on the project scope and how extensible things are.
> It seems like the master would be a WAR, capable of running standalone.
> What about index servers (slaves)?  Would this project include just
> the interfaces to be implemented by Solr/Nutch nodes, some common
> implementation code behind the interfaces in the form of a library, or
> also complete standalone WARs?
>
> I'd need to be able to extend the ClientToSlave protocol to add
> additional methods for Solr (for passing in extra parameters and
> returning various extra data such as facets, highlighting, etc).
>
>> Must we include a notion of document identity and/or document version in
>> the mechanism? Would that facilitate updates and coherency?
>
> It doesn't need to be in the interfaces I don't think, so it depends
> on the scope of the index server implementations.
>
> -Yonik
>
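
The quoted message suggests a logical index made up of several shards, each shard held as multiple copies. A minimal sketch of that data structure is below. It is not part of the proposed API: the class name, the method names, and the hash-based routing rule are all assumptions made for illustration. It shows why the grouping helps with deletes: the delete can be routed to every copy of the one shard that owns the document, while a search needs one copy of each shard.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogicalIndexSketch {
    // Logical index name -> shard number -> hosts holding a copy of that shard.
    private final Map<String, List<List<String>>> shards = new HashMap<>();

    // Declare a logical index with a fixed number of (initially empty) shards.
    void define(String logicalIndex, int shardCount) {
        List<List<String>> list = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            list.add(new ArrayList<>());
        }
        shards.put(logicalIndex, list);
    }

    // Register one more copy of a shard on the given host.
    void addCopy(String logicalIndex, int shard, String host) {
        shards.get(logicalIndex).get(shard).add(host);
    }

    // Assumed routing rule: hash of the document id modulo the shard count.
    int shardFor(String logicalIndex, String docId) {
        return Math.floorMod(docId.hashCode(), shards.get(logicalIndex).size());
    }

    // A delete must reach every copy of the single shard that owns the document.
    List<String> deleteTargets(String logicalIndex, String docId) {
        return shards.get(logicalIndex).get(shardFor(logicalIndex, docId));
    }

    // A search needs one copy of every shard; this sketch simply picks the first.
    List<String> searchTargets(String logicalIndex) {
        List<String> targets = new ArrayList<>();
        for (List<String> copies : shards.get(logicalIndex)) {
            if (!copies.isEmpty()) {
                targets.add(copies.get(0));
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        LogicalIndexSketch master = new LogicalIndexSketch();
        master.define("photo", 3);
        for (int shard = 0; shard < 3; shard++) {
            // Hypothetical host names, two copies per shard.
            master.addCopy("photo", shard, "host" + shard + "a");
            master.addCopy("photo", shard, "host" + shard + "b");
        }
        System.out.println("search: " + master.searchTargets("photo"));
        System.out.println("delete: " + master.deleteTargets("photo", "doc-42"));
    }
}
```

With this shape the client only needs to know the name "photo"; the mapping to "photo0", "photo1", "photo2" and their copies stays in one place.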

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com




