lucene-java-user mailing list archives

From dizh <>
Subject how do I paginate Lucene search results deeply
Date Thu, 14 Mar 2013 03:11:00 GMT
Hi, I am asking a long-standing question again. It has been asked many times, and I have
googled a lot, but I found no good solution:

I asked once before, but I do not think the answer was very good, so:

The scenario:

I have indexed 5000000 documents; of course it could be more: 10000000, 20000000, ...

Each document has a timestamp that identifies when it was indexed. I want to search the documents
with a sort, and the sort field
is the timestamp. Of course search(query, filter, 20) is very fast.

But when you do paging, for example in a web app where the user wants to go to the last page (results 4999980-5000000),
it is slow.

I skimmed the Lucene source; when sorting, it uses FieldCache. What can I do
to improve performance?

Please do not say that going to result 4999990 is unreasonable. When you use Google or another search
engine, of course you will not go to result 4999990,

but my scenario is:
         I have a large number of Log4J logs, and I want to index them and present them in a
web UI.

Each entry has a format like this:

timestamp field1 field2 field3

I only want to sort by timestamp. Can anyone give a hint?

I thought of one approach. MySQL supports partitioning, and we often partition by date, so I want to
partition by time, e.g. one index per hour, to reduce the number of documents searched.

But the problem is: if a single hour contains 5000000 log entries (generated by Log4J trace logging),
it is still very slow.
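
To make the partition idea concrete, here is a minimal sketch in plain Java (no Lucene calls) of how a global result offset could be mapped to one hourly index plus a local offset inside it. The class name PartitionPager, the method locate, and the counts array are hypothetical names for illustration only; the sketch assumes the number of matching documents per hourly index is already known and that the indexes are ordered by time.

```java
/** Hypothetical helper: maps a global result offset to (partition, local offset). */
final class PartitionPager {

    /** Which partition holds the requested hit, and its offset inside that partition. */
    static final class Position {
        final int partition;
        final long localOffset;

        Position(int partition, long localOffset) {
            this.partition = partition;
            this.localOffset = localOffset;
        }
    }

    /**
     * counts[i] is the assumed number of matching documents in partition i,
     * with partitions ordered by time. Whole partitions that lie before the
     * requested offset are skipped using their counts alone.
     */
    static Position locate(long[] counts, long globalOffset) {
        if (globalOffset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        long skipped = 0;
        for (int i = 0; i < counts.length; i++) {
            if (globalOffset < skipped + counts[i]) {
                return new Position(i, globalOffset - skipped);
            }
            skipped += counts[i];
        }
        throw new IllegalArgumentException("offset is beyond the total number of hits");
    }

    public static void main(String[] args) {
        // Three hourly partitions; the second one is a busy hour with 5000000 entries.
        long[] counts = {1_000_000L, 5_000_000L, 250_000L};
        Position p = locate(counts, 5_999_980L);
        System.out.println("partition=" + p.partition + " localOffset=" + p.localOffset);
    }
}
```

In this example the requested hit falls in the busy hour at local offset 4999980, so skipping whole partitions by count does not help in that case: the search inside that one index is still a deep page.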