From: Erik Hatcher <erik@ehatchersolutions.com>
Subject: Re: identifier field as keyword or unindexed
Date: Wed, 9 Mar 2005 14:43:58 -0500
To: java-user@lucene.apache.org


On Mar 9, 2005, at 10:09 AM, javier muguruza wrote:
> (I sent this to the old list; I don't know whether it reached the
> list, so I am reposting it just in case.)
>
> Hi all,
>
> We index our documents in the following way:
>
>         doc = new Document();
>         // mailid
>         doc.add(Field.UnIndexed("mid",mid));
>         //body
>         doc.add(Field.UnStored("body", textb));
>
> mid is a unique identifier, and body contains long pieces of text to 
> be indexed.
>
> Later we run searches on the body field; the mid lets us find a
> file on the filesystem with a compressed (and digitally signed)
> version of the original indexed body.
> A query in our app works like this:
> 1. first we make a search in a db (for many different reasons) that
> returns a number (from 0 to thousands) of mid
> 2. we use lucene to search for some text in many indexes, this returns
> a second list of mid
> 3. we return the result as the intersection of both lists.
>
> This is working fine right now, but I wonder whether we are using
> Lucene to the fullest, because we could also store mid as a keyword
> (instead of unindexed) and add the condition (AND mid==[any mid from
> our step 1]) to the Lucene query we run. My questions are:
>
> 1. Is there a limit on the number of conditions I can add to a query?
> Sometimes we have 10 mids, other times we have thousands of them, so we
> would have to add: AND (mid:mid1 OR mid:mid2 ... OR mid:mid10000).
> Probably there is a limit, and we could apply the mid conditions only
> when the number of mids returned by step 1 is smaller than that limit?

BooleanQuery has a built-in limit of 1,024 clauses by default (it can be
raised with BooleanQuery.setMaxClauseCount), so OR-ing mids together is
only practical when there is a small number of them.  Consider using a
Filter instead.  There are some built-in ones, but a custom one may be
best here.
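
To make the clause limit concrete, here is a minimal sketch in plain Java
(no Lucene dependency) of the check described above. The class name, the
method name and the MAX_CLAUSES constant are made up for illustration; the
constant only mirrors the default limit. The string it builds is purely
illustrative: it assumes mids contain no query-syntax characters, and
whether a query parser would analyze the mid terms depends on the analyzer
in use.

```java
import java.util.Arrays;
import java.util.List;

class MidClauseSketch {
    // Assumption: mirrors BooleanQuery's default maximum clause count.
    static final int MAX_CLAUSES = 1024;

    /**
     * Builds "(mid:a OR mid:b ...)" when the mids fit under the limit.
     * Returns null when there are no mids or too many of them, which
     * signals the caller to use another approach (a Filter, or the
     * existing intersection of the two result lists).
     */
    static String midClause(List<String> mids) {
        if (mids.isEmpty() || mids.size() > MAX_CLAUSES) {
            return null;
        }
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < mids.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append("mid:").append(mids.get(i));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // Small demonstration with two hypothetical mids.
        System.out.println(midClause(Arrays.asList("m1", "m2")));
    }
}
```

With ten mids this yields a usable clause; with thousands it returns null
and the caller falls back to something else.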

> 2. As the mid is a unique identifier (I guess Lucene does not care
> about that, right?)

Right, Lucene doesn't care about field/term uniqueness.

> , and the condition on the mid would be ANDed to the
> text query conditions, will it be faster for Lucene to look first in
> the mid field and skip the text lookup if the mid condition is not
> fulfilled? I don't know whether I am clear enough... Will I get some
> benefit on the queries by adding some additional conditions, or will
> the cost of adding another field to the index not pay off? Maybe it
> depends on the number of documents? Maybe it would be best to set mid
> as a keyword just in case, and add it as a condition later if the
> searches take too long?

I doubt you'd even notice the difference.  There is little cost to 
adding the additional field, and it looks like you'd benefit from having 
mid as a Keyword.

Also, a custom Filter could go out to your relational database and 
constrain results to the set of mids it returns.  Filters are 
designed to be reused across multiple queries and cached - keep that in mind 
and maybe it'll work out well in your scenario.
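
A rough sketch of the idea behind such a filter, in plain Java rather than
against the Lucene API: a Filter boils down to a java.util.BitSet with one
bit per document number, set when the document is allowed. Everything named
here (the class, the methods, the midByDoc array standing in for the mid
stored with each document, the allowedMids set standing in for the database
result) is hypothetical and only illustrates the technique.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

class MidBitSetSketch {
    /**
     * Builds one bit per document number; bit i is set when document i's
     * mid is among the mids returned by the database (allowedMids).
     */
    static BitSet allowedDocs(String[] midByDoc, Set<String> allowedMids) {
        BitSet bits = new BitSet(midByDoc.length);
        for (int doc = 0; doc < midByDoc.length; doc++) {
            if (allowedMids.contains(midByDoc[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Keeps only the text matches that are also allowed by the mid bits. */
    static BitSet restrict(BitSet textMatches, BitSet allowed) {
        BitSet result = (BitSet) textMatches.clone(); // leave inputs untouched
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        String[] midByDoc = {"a", "b", "c", "d"};
        Set<String> fromDb = new HashSet<String>(Arrays.asList("b", "d"));
        BitSet allowed = allowedDocs(midByDoc, fromDb);

        BitSet textMatches = new BitSet();
        textMatches.set(0);
        textMatches.set(1);

        System.out.println(restrict(textMatches, allowed));
    }
}
```

The bit set built from one database result can be kept and reused for every
text query run against the same index, which is where the caching pays off.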

	Erik


>
> thanks for any thoughts on that
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

