lucene-java-user mailing list archives

From "Mordo, Aviran (EXP N-NANNATEK)" <>
Subject RE: Changing the scoring (newest doc date first)
Date Wed, 17 May 2006 17:35:01 GMT
When you write your query, you can add a date range with a boost factor
for this field, i.e. boost by a factor x the documents that have a date
of today, boost by x-1 the documents from the past week, boost by x-2 the
documents from the past two weeks, etc.

This will not be a perfect sort on the dates, but it will boost newer
documents depending on your date ranges.
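A minimal sketch of the suggestion above, assuming the date is indexed as an untokenized yyyyMMdd string in a field called "date" (the field name, the date format and the DateBoostQuery helper are hypothetical). It appends range clauses to the user's query in QueryParser syntax: today's documents get boost x, the previous seven days x-1, the seven days before that x-2, and so on. The ranges are deliberately non-overlapping, so a document matches at most one of them. The user query is required (+) and the date clauses are optional, so they only raise the score of newer matches and filter nothing out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: adds tiered date-range boosts to a user query, written in
// Lucene QueryParser syntax. Assumes an untokenized "date" field holding yyyyMMdd.
final class DateBoostQuery {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Appends one inclusive range clause: date:[from TO to]^boost
    private static void appendRange(StringBuilder q, LocalDate from, LocalDate to, int boost) {
        q.append(" date:[").append(from.format(DAY))
         .append(" TO ").append(to.format(DAY))
         .append("]^").append(boost);
    }

    // maxBoost is "x" from the suggestion; weeks is how many weekly tiers follow today.
    static String build(String userQuery, LocalDate today, int maxBoost, int weeks) {
        // The user's query is required; the date clauses are optional and only add score.
        StringBuilder q = new StringBuilder("+(").append(userQuery).append(')');
        appendRange(q, today, today, maxBoost);               // today: boost x
        for (int w = 1; w <= weeks; w++) {
            int boost = maxBoost - w;                         // x-1, x-2, ...
            if (boost <= 0) break;                            // stop once the boost would no longer be positive
            LocalDate from = today.minusDays(7L * w);         // non-overlapping weekly ranges:
            LocalDate to = today.minusDays(7L * (w - 1) + 1); // [today-7, today-1], [today-14, today-8], ...
            appendRange(q, from, to, boost);
        }
        return q.toString();
    }
}
```

For example, build("title:lucene", LocalDate.of(2006, 5, 17), 5, 2) yields
+(title:lucene) date:[20060517 TO 20060517]^5 date:[20060510 TO 20060516]^4 date:[20060503 TO 20060509]^3
With a day-granularity field, each weekly range expands to at most seven terms when Lucene 1.4 rewrites the RangeQuery into a BooleanQuery, well under the default limit of 1024 clauses; a finer-grained date field (for example one that includes the time of day) could run into TooManyClauses. If your QueryParser version does not accept a boost on a range clause, the same query can be built programmatically with RangeQuery and Query.setBoost.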



-----Original Message-----
From: Marcus Falck [] 
Sent: Tuesday, May 16, 2006 2:43 PM
Subject: Changing the scoring (newest doc date first)

I'm working on a very large implementation of a search engine based on
the Lucene API (1.4.3). We have also been investigating enterprise
search companies such as FAST and Verity, but have come to the
conclusion that we might as well save ourselves 1 million dollars by
doing our own implementation on Lucene.
What we are talking about here is indexing data from a lot of
different systems, all containing A LOT of documents. This index will be
distributed by range (date) and scaled with one or more machines
containing the same index per range (load balanced using round robin).
Currently the total size of all documents we need to index is around
2 TB (200 million documents), but this is growing by approximately
200,000 documents per day.
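A minimal sketch of the distribution described above, assuming each date interval is identified by its start date and owns a list of replica machines holding identical copies of that interval's index (DateRangeRouter and the machine names are hypothetical, and the SOAP transport is left out). A document is indexed on every replica of the interval that covers its date; a search against an interval goes to one replica, chosen round robin.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router: date interval (keyed by its start date) -> replica machines.
// Each interval runs from its start date up to the next interval's start date;
// the newest interval is open-ended.
final class DateRangeRouter {
    private final TreeMap<LocalDate, List<String>> replicasByStart = new TreeMap<>();
    private final Map<LocalDate, AtomicInteger> nextReplica = new HashMap<>();

    void addInterval(LocalDate start, List<String> replicas) {
        if (replicas.isEmpty()) throw new IllegalArgumentException("interval needs at least one machine");
        replicasByStart.put(start, List.copyOf(replicas));
        nextReplica.put(start, new AtomicInteger());
    }

    // The interval covering a date is the one with the latest start on or before it.
    LocalDate intervalFor(LocalDate docDate) {
        Map.Entry<LocalDate, List<String>> e = replicasByStart.floorEntry(docDate);
        if (e == null) throw new IllegalArgumentException("no interval covers " + docDate);
        return e.getKey();
    }

    // Indexing: every replica of the interval gets the document, so the copies stay identical.
    List<String> machinesToIndex(LocalDate docDate) {
        return replicasByStart.get(intervalFor(docDate));
    }

    // Searching: any replica can answer, so rotate through them round robin.
    String machineToSearch(LocalDate intervalStart) {
        List<String> replicas = replicasByStart.get(intervalStart);
        if (replicas == null) throw new IllegalArgumentException("unknown interval " + intervalStart);
        int i = nextReplica.get(intervalStart).getAndIncrement();
        return replicas.get(Math.floorMod(i, replicas.size()));
    }
}
```

A TreeMap keyed by interval start turns the lookup into a single floorEntry call. Because the newest interval is open-ended, it keeps receiving the daily stream of new documents until a newer interval is added.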
I have already written code for a prototype that contains a fetcher
application (for fetching data from the original systems' storage and
distributing the documents using SOAP over TCP to the correct data
interval and that interval's machines), SearchMachineHost (the actual
index/search per machine), a Search/Index API (that adds transparency to
the whole clustering part), AlertHost (for time-sensitive alerts) and
demo applications. Everything looks very good, and we are very satisfied
with the performance.
There is, however, one LARGE problem that we have run into. All search
results should be displayed sorted with the newest document at the top.
We tried to accomplish this using Lucene's sort capabilities but quickly
ran into large performance bottlenecks. So I figured that, since the
default sort is by relevance, I would like to change the relevance so
that we don't even need to sort the documents. I guess a lot of people
on this mailing list can give me valuable hints about how to accomplish
this!
(Since I know about the ability to sort by index ID, which I haven't
tried, I should add that I can't guarantee that all documents will be
added in correct date order. Remember the several systems; the future
plan is to buy content from different actors on the market and index
it.)
Please help me in my fight against FAST and Verity =D
/ Regards
Marcus Falck, Stockholm, Sweden. 
I would also like to thank all the people who have been involved in
Lucene development.
Very nice work!