lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dion Almaer" <d...@almaer.com>
Subject RE: Dates and others
Date Sun, 23 Nov 2003 23:33:35 GMT
Thanks for responding Ype.

> -----Original Message-----
> From: Ype Kingma [mailto:ykingma@xs4all.nl] 
> Sent: Sunday, November 23, 2003 2:03 PM
> To: Lucene Users List
> Subject: Re: Dates and others
> 
> On Saturday 22 November 2003 18:33, Dion Almaer wrote:
> ...
> >
> > 1. The power of dates:
> >
> >    I am fairly happy with the results of queries on my index.  The 
> > only issue I have is that at the moment the date of the 
> content isn't 
> > considered (since lucene doesn't know about it).  Is there 
> a good way 
> > in which the date of the content could be used to help with the 
> > scoring?  So more recent content shows up higher in the 
> stack.  I have 
> > a date keyword field, but it isn't part of the query 
> itself.  Are there any patterns to help with this?
> 
> You can use the Lucene date field, or use a keyword field eg. 
> in yyyymmdd format. However, Lucene's scoring is not based on 
> the value of a matching term, it's based on term frequencies 
> in documents, on the number of documents in the index 
> containing the term, and on the distance between terms (for 
> proximity queries.) You cannot make the document score depend 
> directly on the value of a (date) field in the document.
> Btw, how big would you want the date influence to be in the score?
> 
> Sorting results by date has been discussed in the past,  see 
> the archives.
> You lose the document scores in this case.


Yeah this is tough.  I don't want to sort by date as then something that was a really low
score but
was recent would show up at the top.
I think I will stick with giving the user the ability to say "between Jan 1 2003 and ..."
instead.

This leads me to another issue actually.  On certain range queries I get exceptions:

Query: modifieddate:[1/1/03 TO 12/31/03]

org.apache.lucene.search.BooleanQuery$TooManyClauses
        at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:109)
        at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:101)
        at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:137)
        at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:188)
        at org.apache.lucene.search.Query.weight(Query.java:120)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:128)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:93)
        at org.apache.lucene.search.Hits.<init>(Hits.java:80)
        at org.apache.lucene.search.Searcher.search(Searcher.java:71)
        at org.apache.lucene.search.Searcher.search(Searcher.java:65)
        at com.portal.util.search.IndexSearch.search(Unknown Source)
        at com.portal.util.search.IndexSearch.main(Unknown Source)

Has anyone run into this problem?


> > 2. +field:foo and the QueryParser:
> >
> >    I ran into some problems where using +field:foo was 
> giving strange 
> > results.  When I changed the queries to "... AND field:foo" 
> everything 
> > was fine.
> >    Am I missing something there?
> 
> Which version of Lucene are you using? There have been some 
> fixes in the query parser of Lucene 1.2, but I don't know 
> precisely which.


I am using 1.3 RC 2.  The AND workaround is fine... just caught me
by surprise.


> > 3. I have some fields suck as title, owner, etc as well as 
> the content 
> > blob which I index and use as the default search field.  Is 
> there an 
> > easy way to extend the QueryParser to merge it with a 
> MultiTermQuery 
> > which can also search this meta data and give them certain 
> weights?  
> > Or, if you go down
> 
> You can provide field weights at document indexing time 
> (norms) and use a MultiTermQuery for searching multiple 
> fields. At query time you can again use field weights.
> I don't know how the scoring of the MultiTermQuery is done, 
> it might use the max. score over the fields of a document, or 
> combine the scores in the fields of a document.


Yeah I will play with this.  For now adding the title to the main body does seem to work pretty
well,
so it may be good enough!

 
> > this path do you have to leave the QueryParser behind and 
> build your 
> > own queries?  Any best practices would be great.
> 
> You have some options:
> - create the MultiTermQuery from the query text, or
> - index the default search field as a single field, eg. by 
> concatenation, and evt. by inserting empty tokens in between 
> to avoid proximity matches.
> This has also been discussed recently, see eg. the discussion 
> on indexing of sentences.
> 
> Searching mutliple fields is normally a little slower than 
> searching a concatenated field. The actual difference depends 
> on you data, so you might experiment a bit. You might eg. 
> index all fields seperately, and also index a default 
> concatenated field.
> 
> Kind regards,
> Ype Kingma


Thanks a lot for the great ideas.

Dion


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message