lucene-java-user mailing list archives

From Toke Eskildsen
Subject Re: External sort
Date Fri, 18 Dec 2009 13:40:28 GMT
On Fri, 2009-12-18 at 12:47 +0100, Ganesh wrote:
> I am using Integer for datetime. As the data grows, I am hitting the 
> upper limit.

Could you give us some numbers? Document count, index size in GB, amount
of RAM available for Lucene?

> One option most of them in the group discussed about using EHcache. 
> Let consider the below data get indexed. unique_id is the id 
> generated for every record. unique_id,  field1,  field2,  date_time
> In Ehcache, Consider I am storing
> unique_id, date_time
> How could i merge the results from Lucene and Ehcache? Do I need to 
> fetch all the search results and compare it against the EHcache
> results and decide (using FieldComparatorSource).

As you are storing the date_time in the index, you don't win anything by
caching the values externally: Reading the unique_id needed for lookup
in the Ehcache takes just as long as reading the date_time directly.

> (future thought / research) One more thought, Is there any way to 
> write the index in sorted order, May be while merging. Assign docid
> by sorting the selected field.

You cannot control the docID that way, but Lucene keeps documents in
index order, so you could do this by sorting your data before indexing.
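
A minimal sketch of the presorting idea, in plain Java and without any
Lucene calls. The class PresortedIndexing, the Row type and the
indexingOrder method are hypothetical names used only for illustration.
It assumes the index is built in one pass, so that the position of a row
in the sorted list matches the order in which documents are added; docIDs
can still shift later when deleted documents are merged away.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PresortedIndexing {

    /** Hypothetical row holding the two fields named in the thread. */
    static final class Row {
        final String uniqueId;
        final int dateTime;

        Row(String uniqueId, int dateTime) {
            this.uniqueId = uniqueId;
            this.dateTime = dateTime;
        }
    }

    /**
     * Returns a copy of the rows ordered by dateTime, ascending.
     * The list position stands in for the order of addition to the index;
     * it is an assumption of this sketch, not a value Lucene guarantees.
     */
    static List<Row> indexingOrder(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        // List.sort is stable, so rows with equal dateTime keep input order.
        sorted.sort(Comparator.comparingInt((Row r) -> r.dateTime));
        return sorted;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("c", 20091218));
        rows.add(new Row("a", 20091101));
        rows.add(new Row("b", 20091205));

        List<Row> ordered = indexingOrder(rows);
        for (int position = 0; position < ordered.size(); position++) {
            // In real code, each row would be added to the index writer here.
            Row row = ordered.get(position);
            System.out.println(position + " " + row.uniqueId + " " + row.dateTime);
        }
    }
}
```

With the data arranged this way, a search that returns hits in index
order is already ordered by date_time, as long as the assumptions above
hold.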

You're touching on a recurring theme here, as coordination with external
data-handling could be done very efficiently if Lucene provided a
persistent id as a first-class attribute for documents. The problem is
that it would require a lot of changes to the API and that it would mean
an additional non-optional overhead for all Lucene users. I haven't kept
track of enough threads on the developer list, so a better solution to
the problem may already have been found.

Toke Eskildsen
