lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From javier muguruza <jmugur...@gmail.com>
Subject Re: identifier field as keyword or unindexed
Date Thu, 10 Mar 2005 09:41:33 GMT
Thanks Erik,

I will investigate Filters and I'll see then.


On Wed, 9 Mar 2005 14:43:58 -0500, Erik Hatcher
<erik@ehatchersolutions.com> wrote:
> 
> On Mar 9, 2005, at 10:09 AM, javier muguruza wrote:
> > (I sent this to the old list, I dont know wether it reached the
> > list...just in case I repost it)
> >
> > Hi all,
> >
> > We index our documents in the following way:
> >
> >         doc = new Document();
> >         // mailid
> >         doc.add(Field.UnIndexed("mid",mid));
> >         //body
> >         doc.add(Field.UnStored("body", textb));
> >
> > mid is a unique identifier, and body contains long pieces of text to
> > be indexed.
> >
> > And later make searches on the body field, the mid allows us to find a
> > file on the filesystem with a compressed (and digitally signed)
> > version of the original body indexed.
> > Our way to work in a query in our app is this:
> > 1. first we make a search in a db (for many different reasons) that
> > returns a number (from 0 to thousands) of mid
> > 2. we use lucene to search for some text in many indexes, this returns
> > a second list of mid
> > 3. we return the result as the intersection of both lists.
> >
> > This is working fine right now, but wonder wether we are not using
> > lucene to the fullest, cause we could also store mid as a keyword
> > (instead of unindexed), and add the condition (AND mid==[any mid from
> > our step 1]) to the lucene query we run. My questions are:
> >
> > 1. Is there a limit in the number of conditions I can add to a query??
> > Sometimes we have 10 mids, other times we have thousands of them so we
> > would have to add: AND (mid:mid1 OR mid:mid2 ... OR mid:mid10000).
> > Probably there is a limit, and we could only apply the mid conditions
> > when the number or mids returned by step 1 is smaller than that limit?
> 
> BooleanQuery has a built-in limit of 1,024 clauses so it would only be
> useful when there is a small number of mids.  Consider using a Filter
> though.  There are some built-in ones, but maybe a custom one is best.
> 
> > 2. As the mid is a unique identifier (I guest lucene does not care
> > about that right?)
> 
> Right, Lucene doesn't care about field/term uniqueness.
> 
> > , and the condition on the mid woudl be ANDed to the
> > text query conditions, will it be faster for lucene to look first in
> > the mid field and dont do the text lookup if the mid condition is not
> > fullfilled? I dont know wether I am clear enough...Will I get some
> > benefit on the queries by adding some additional conditions or the
> > cost of adding another field to index will not pay off? Maybe it
> > depends on the number of documents? Maybe it would be best to set mid
> > as a keyword just in case, and add it as conditions later if the
> > searches take too long?
> 
> I doubt you'd even notice the difference.  There is little cost to
> adding the additional field, and looks like you'd benefit from having
> mid as a Keyword.
> 
> Also, with a Filter,  you could use it to bounce to your relational
> database to constrain results based on a set of mids.  Filters are
> designed to be used for multiple queries and cached - keep that in mind
> and maybe it'll work out well in your scenario.
> 
>         Erik
> 
> 
> >
> > thanks for any though on that
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message