lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: New Query Type(s)
Date Wed, 07 Jan 2004 18:11:43 GMT
Jochen wrote:
> Please disregard my prior post. I see that I outed myself as stupid.

This was not a stupid request, but a reasonable one.  In fact I 
currently have a contract to implement such a feature.  It should show 
up in the next month or so.

Doug

>>-----Original Message-----
>>From: Jochen [mailto:lucenelist@quontis.com]
>>Sent: Wednesday, January 07, 2004 8:48 AM
>>To: 'Lucene Developers List'
>>Subject: New Query Type(s)
>>
>>Lucene Gurus:
>>
>>	After looking at and trying out Lucene for quite some time (and
>>liking it), I would like to create some advanced queries to speed up our
>>system. The first one I need is as follows:
>>
>>	(+"a b c" +"d e")~10
>>
>>	In other words, I need to run a query in which two phrases (for
>>right now an exact match will be fine) are in some defined proximity (in
>>this example, I need "a b c" somewhere close to "d e").
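A minimal sketch of the matching rule requested above, written against a plain token list rather than a Lucene index. The class and method names are hypothetical and are not part of Lucene. The post does not say how proximity should be measured, so this sketch assumes that "~10" means at most 10 tokens may lie between the two phrase occurrences, in either order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: checks on a plain token list
// whether two exact phrases occur within `slop` tokens of each other.
class PhraseProximitySketch {

    // Start offsets of every exact occurrence of `phrase` in `tokens`.
    static List<Integer> occurrences(List<String> tokens, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                starts.add(i);
            }
        }
        return starts;
    }

    // True if some occurrence of `a` and some occurrence of `b` do not overlap
    // and have at most `slop` tokens between them, in either order.
    // (Assumed semantics; the original request leaves the measure open.)
    static boolean near(List<String> tokens, List<String> a, List<String> b, int slop) {
        for (int sa : occurrences(tokens, a)) {
            for (int sb : occurrences(tokens, b)) {
                int gap = sa < sb ? sb - (sa + a.size()) : sa - (sb + b.size());
                if (gap >= 0 && gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a b c x y d e".split(" "));
        List<String> abc = Arrays.asList("a", "b", "c");
        List<String> de = Arrays.asList("d", "e");
        System.out.println(near(doc, abc, de, 10)); // two tokens lie between the phrases
        System.out.println(near(doc, abc, de, 1));
    }
}
```

A real implementation would read term positions from the index instead of scanning tokens; the sketch only pins down which documents should match.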
>>
>>	The indexes created nicely support this kind of functionality, and
>>the pieces are all implemented (PhraseQuery, BooleanQuery, PhraseQuery
>>with slop). However, I believe that they cannot be strung together with
>>the current Lucene version to give me what I need.
>>
>>	I have studied the code and I will write the code to create this
>>type of query (and make it available, if I get it working), but I would
>>very much appreciate a high-level roadmap from more experienced people
>>(i.e. create a new Query object, change this and that object to do such
>>and such ...).
>>
>>	Cheers!
>>		Jochen
>>
>>
>>>-----Original Message-----
>>>From: Robert Engels [mailto:rengels@ix.netcom.com]
>>>Sent: Tuesday, January 06, 2004 1:17 PM
>>>To: Lucene-Dev
>>>Subject: normalization BAD DESIGN ?
>>>
>>>The design & implementation of the document/field normalization is very
>>>poor.
>>>
>>>It requires a byte[] with (number of documents * number of fields)
>>>elements!
>>>
>>>With a document store of 100 million documents, with multiple fields,
>>>the memory required is staggering.
>>>
>>>IndexReader has the following method definition,
>>>
>>>public abstract byte[] norms(String field) throws IOException;
>>>
>>>which is the source of the problem.
>>>
>>>Even returning null from this method does not help, as the PhraseScorer
>>>and derived classes maintain a reference and do not perform a null check.
>>>
>>>I have modified line 105 of PhraseScorer to be
>>>
>>>if(norms!=null)
>>>    score *= Similarity.decodeNorm(norms[first.doc]); // normalize
>>>
>>>Would it not be a better design, to define a method in IndexReader
>>>
>>>float getNorm(String fieldname,int docnum);
>>>
>>>so an implementation could cache this information in some fashion, or
>>>always return 1.0 if it didn't care?
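A rough sketch of the per-document accessor proposed above, assuming a standalone interface rather than a change to IndexReader itself. NormSource, ConstantNorms, ArrayNorms, and NormSketch are hypothetical names introduced only for illustration; none of them exist in Lucene, and the float[] storage stands in for whatever caching an implementation might choose.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface, not Lucene's API: callers ask for one norm at a time,
// so an implementation is free to choose how (or whether) to store norms.
interface NormSource {
    float getNorm(String field, int doc);
}

// Implementation that does not care about normalization: no per-document storage.
class ConstantNorms implements NormSource {
    public float getNorm(String field, int doc) {
        return 1.0f;
    }
}

// Implementation backed by one float[] per field, falling back to 1.0
// for fields that have no norms loaded.
class ArrayNorms implements NormSource {
    private final Map<String, float[]> normsByField = new HashMap<>();

    void put(String field, float[] norms) {
        normsByField.put(field, norms);
    }

    public float getNorm(String field, int doc) {
        float[] norms = normsByField.get(field);
        return norms == null ? 1.0f : norms[doc];
    }
}

class NormSketch {
    // A scorer multiplies by the norm without ever seeing an array or a null.
    static float normalize(float rawScore, NormSource source, String field, int doc) {
        return rawScore * source.getNorm(field, doc);
    }

    public static void main(String[] args) {
        ArrayNorms stored = new ArrayNorms();
        stored.put("body", new float[] {0.5f, 0.25f});
        System.out.println(normalize(2.0f, stored, "body", 1));
        System.out.println(normalize(2.0f, stored, "title", 0));
        System.out.println(normalize(2.0f, new ConstantNorms(), "body", 7));
    }
}
```

With this shape the null check moves out of the scorer and into the implementation, at the cost of one method call per scored document.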
>>>
>>>Robert Engels
>>>
>>>
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org