lucene-dev mailing list archives

From "Jochen" <>
Subject New Query Type(s)
Date Wed, 07 Jan 2004 16:48:03 GMT
Lucene Gurus:

	After looking at and trying out Lucene for quite some time (and
liking it), I would like to create some advanced queries to speed up our
system. The first one I need is as follows:

	(+"a b c" +"d e")~10

	In other words, I need to run a query in which two phrases (for
now, an exact match will be fine) occur within some defined proximity of
each other (in this example, I need "a b c" somewhere close to "d e").
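
	To make the intended semantics concrete, here is a minimal sketch in
plain Java. It does not use any Lucene API; the class and method names are
hypothetical, and the gap definition (number of tokens between the two
phrases, in either order, with overlapping matches rejected) is my own
assumption rather than Lucene's definition of slop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: checks whether two exact
// phrases occur within a given token distance of each other.
final class PhraseProximitySketch {

    /** Returns every start position at which `phrase` occurs exactly in `tokens`. */
    static List<Integer> phraseStarts(List<String> tokens, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        if (phrase.isEmpty()) {
            return starts;
        }
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                starts.add(i);
            }
        }
        return starts;
    }

    /**
     * True if some occurrence of `first` and some occurrence of `second`
     * are separated by at most `maxGap` tokens, in either order.
     */
    static boolean phrasesNear(List<String> tokens, List<String> first,
                               List<String> second, int maxGap) {
        for (int s1 : phraseStarts(tokens, first)) {
            int e1 = s1 + first.size();          // exclusive end of first phrase
            for (int s2 : phraseStarts(tokens, second)) {
                int e2 = s2 + second.size();     // exclusive end of second phrase
                // Tokens lying strictly between the two matches;
                // a negative value means the matches overlap.
                int gap = Math.max(s1, s2) - Math.min(e1, e2);
                if (gap >= 0 && gap <= maxGap) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a", "b", "c", "x", "y", "d", "e");
        List<String> abc = Arrays.asList("a", "b", "c");
        List<String> de = Arrays.asList("d", "e");
        System.out.println(phrasesNear(doc, abc, de, 10));
        System.out.println(phrasesNear(doc, abc, de, 1));
    }
}
```

	In the sample document the two phrases are two tokens apart, so the
first call matches and the second does not. A real implementation would
walk the term positions stored in the index instead of scanning a token
list.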

	The indexes, as created, nicely support this kind of functionality, and
the pieces are all implemented (PhraseQuery, BooleanQuery, PhraseQuery
with slop). However, I believe that they cannot be strung together in
the current Lucene version to give me what I need.

	I have studied the code and I will write the code to create this
type of query (and make it available, if I get it working), but I would very
much appreciate a high-level roadmap from more experienced people (e.g.
create a new Query object, change this and that object to do such and such


> -----Original Message-----
> From: Robert Engels []
> Sent: Tuesday, January 06, 2004 1:17 PM
> To: Lucene-Dev
> Subject: normalization BAD DESIGN ?
> The design & implementation of the document/field normalization is very
> poor.
> It requires a byte[] with (number of documents * number of fields)
> elements!
> With a document store of 100 million documents, with multiple fields, the
> memory required is staggering.
> IndexReader has the following method definition,
> public abstract byte[] norms(String field) throws IOException;
> which is the source of the problem.
> Even returning null from this method does not help, as the PhraseScorer
> and derived classes maintain a reference and do not perform a null check.
> I have modified line 105 of PhraseScorer to be
> if(norms!=null)
>     score *= Similarity.decodeNorm(norms[first.doc]); // normalize
> Would it not be a better design to define a method in IndexReader
> float getNorm(String fieldname,int docnum);
> so an implementation could cache this information in some fashion, or
> always return 1.0 if it didn't care?
> Robert Engels
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
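
	For reference, the per-document accessor suggested in the quoted
message could look roughly like the sketch below. This is only an
illustration of the idea: NormSource, ConstantNorms, SparseNorms and
NormSketch are hypothetical names, none of them exist in Lucene, and the
sketch stores decoded floats rather than Lucene's encoded norm bytes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-document norm lookup, shaped like the proposed getNorm(fieldname, docnum). */
interface NormSource {
    float getNorm(String fieldName, int docNum);
}

/** Ignores normalization entirely: every document gets a factor of 1.0 and no array is held. */
final class ConstantNorms implements NormSource {
    public float getNorm(String fieldName, int docNum) {
        return 1.0f;
    }
}

/** Holds norms only for the fields that were registered; any other field falls back to 1.0. */
final class SparseNorms implements NormSource {
    private final Map<String, float[]> normsByField = new HashMap<>();

    void put(String fieldName, float[] norms) {
        normsByField.put(fieldName, norms);
    }

    public float getNorm(String fieldName, int docNum) {
        float[] norms = normsByField.get(fieldName);
        return norms == null ? 1.0f : norms[docNum];
    }
}

final class NormSketch {
    /** A scorer would multiply its raw score by the norm, with no null check needed. */
    static float applyNorm(float rawScore, NormSource norms, String field, int doc) {
        return rawScore * norms.getNorm(field, doc);
    }

    public static void main(String[] args) {
        SparseNorms sparse = new SparseNorms();
        sparse.put("title", new float[] {0.5f, 0.25f});
        System.out.println(applyNorm(4.0f, sparse, "title", 1));
        System.out.println(applyNorm(4.0f, sparse, "body", 1));
        System.out.println(applyNorm(2.0f, new ConstantNorms(), "body", 7));
    }
}
```

	With an accessor of this shape the caller never sees the backing
array, so an implementation is free to cache, load lazily, or skip norms
for fields it does not care about.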
