lucene-java-user mailing list archives

From Che Dong <ched...@chedong.com>
Subject Re: sorting on "dates" a little fuzzy...
Date Fri, 22 Apr 2005 06:18:51 GMT
James wrote:
> Hi Che-
> 
> The presort method was our first approach but this doesn't work
> in practice because we update the index incrementally and insertion order
> doesn't match date ordering as we add updates.
> 
> I don't think sorting top hits only will deliver what the user is
> expecting -- that is, results listed in most-recent-first order.  
> Is there a better way to do it?
I don't think a single index can meet both your speed and capacity
requirements.
How about splitting the database into multiple levels (daily / weekly /
monthly)? The searcher can then stop early if it gets enough top hits from
the top (most recent) database.
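
A rough sketch of the multi-level idea, in plain Java with no Lucene calls. The class name TieredSearch and the modelling of each level as a function from query to hits are assumptions made for illustration only. It assumes each level returns its hits already sorted most-recent-first and that the levels cover non-overlapping date ranges, newest level first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TieredSearch {
    /**
     * Queries the levels in order (for example daily, weekly, monthly) and
     * stops as soon as `wanted` hits have been collected, so older and
     * larger levels are only searched when the newer ones fall short.
     * Each level is a hypothetical stand-in for a search against one index.
     */
    public static List<String> search(List<Function<String, List<String>>> levels,
                                      String query, int wanted) {
        List<String> results = new ArrayList<>();
        for (Function<String, List<String>> level : levels) {
            for (String hit : level.apply(query)) {
                if (results.size() < wanted) {
                    results.add(hit);
                }
            }
            if (results.size() >= wanted) {
                return results; // enough hits: skip the remaining levels
            }
        }
        return results; // every level searched, fewer than `wanted` hits exist
    }
}
```

The trade-off is that documents have to be moved between levels as they age, which this sketch does not cover.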

> 
> BTW, we're not seeing a ridiculous performance degradation arising
> out of sorts on large result sets.  But on the other hand, sort
> doesn't seem to be working very well, so far...
I checked the HitCollector: it seems Lucene pre-fetches the first 100 results
by default. Lucene will pre-fetch up to 200 results only if you walk to result 101.
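
To make the pre-fetch behaviour concrete, here is a simplified model of it in plain Java. This is not Lucene's code: the class LazyHits and the topN callback (which stands in for re-running the search and returning the best n hits) are hypothetical names, and the doubling rule is only an assumption based on the numbers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class LazyHits {
    // Hypothetical callback: re-runs the search and returns the best n hits.
    private final IntFunction<List<String>> topN;
    private List<String> cache = new ArrayList<>();
    private int searches = 0;

    public LazyHits(IntFunction<List<String>> topN) {
        this.topN = topN;
        fetch(100); // the first 100 results are fetched up front
    }

    private void fetch(int n) {
        cache = topN.apply(n);
        searches++;
    }

    /** Returns hit i, re-running the search with a doubled window if i is not cached. */
    public String get(int i) {
        if (i >= cache.size()) {
            fetch(Math.max(i, cache.size()) * 2);
        }
        // Throws IndexOutOfBoundsException if fewer than i + 1 hits exist.
        return cache.get(i);
    }

    public int cachedCount() { return cache.size(); }

    public int searchCount() { return searches; }
}
```

In this model, reading hits 0 to 99 costs one search, and asking for hit 100 (the 101st result) triggers a second search that fetches 200.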

BTW: I feel Lucene is getting too complex after adding too many functions
for customized sorting and fuzzy queries.

> 
> Regards,
> James 
> 
Che Dong
http://www.chedong.com/


> --- Che Dong <chedong@chedong.com> wrote:
> 
>>As Google has said, a full-text search service is not a traditional
>>database application. Lucene is not a database either: if you want to sort
>>on some field, such as date, you had better pre-sort the documents before
>>indexing them, then get results by doc id.
>>
>>With Lucene you can only sort results within the top hits. If you sort
>>400k result hits by date, you lose the speed of Lucene.
>>
>>
>>Thanks
>>
>>Che Dong
>>http://www.chedong.com/
>>
>>Erik Hatcher wrote:
>>
>>>On Apr 21, 2005, at 5:22 PM, James Levine wrote:
>>>
>>>
>>>>I have an index of around 3 million records, and typical queries
>>>>can result in result sets of between 1 and 400,000 results.
>>>>
>>>>We have indexed "dateTime" fields in the form 20050415142, that is, to
>>>>10-minute precision.
>>>>
>>>>When I try to sort queries I get something back that is roughly sorted
>>>>on index, but not quite. Stuff is out of order just a bit. The
>>>>size of the result set does not seem to be related to the occurrence of
>>>>this problem.
>>>>
>>>>We've tried lucene 1.4-final and 1.4.3.
>>>>
>>>>my code looks like this
>>>>
>>>>s = new Sort( new SortField[] {
>>>>    new SortField( "dateTime", SortField.STRING, true ),
>>>>    SortField.FIELD_SCORE } );
>>>>
>>>>...
>>>>
>>>>hits = searcher.search( qry, s );
>>>>
>>>>
>>>>Any help is appreciated, I'm so far baffled by this problem.
>>>
>>>
>>>I don't have a solution, but rather some questions to check.... are all 
>>>dateTime's the same width, zero padded on the right?  Does every 
>>>document have a dateTime field?
>>>
>>>I recommend you sort with type INT instead of STRING if it fits, or 
>>>maybe LONG.  STRING will use the most resources for sorting.
>>>
>>>    Erik
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>