lucene-dev mailing list archives

From: Erik Hatcher
Subject: Fwd: lucene indexing and merge process
Date: Thu, 18 Oct 2007 14:38:38 GMT
Forwarding this to java-dev per request.  Seems like the best place  
to discuss this topic.


Begin forwarded message:

> From: "John Wang"
> Date: October 17, 2007 5:43:29 PM EDT
> To:
> Subject: lucene indexing and merge process
> Hi Erik:
>     We are revamping our search system here at LinkedIn, and we are
> using Lucene.
>     One issue we ran across is that we store a UID in Lucene, which
> we map to the DB storage. So given a docid, to look up its UID, we
> have the following options:
> 1) Index it as a Stored field and get it from reader.document (very  
> slow if recall is large)
> 2) Load/Warmup the FieldCache (for large corpus, loading up the  
> indexreader can be slow)
> 3) Construct it using the FieldCache and persist it on disk
> every time the index changes (not suitable for real-time indexing,
> because this process degrades as the number of documents grows)
>     None of the above solutions turn out to be adequate for our  
> requirements.
>      What we ended up doing is modifying the Lucene code: we changed
> the SegmentReader, DocumentWriter, and FieldWriter classes to take
> advantage of the Lucene segment/merge process. E.g.:
>      For each segment, we store a .udt file, which is an int[]
> array (by changing the FieldWriter class).
>      SegmentReader then loads the .udt file into an array.
>      And the merge happens seamlessly.
>      Because of the tight encapsulation around these classes, e.g.
> private and final methods, it is very difficult to extend Lucene
> without branching into our own version. Is there a way we can
> open up these classes and make them extensible? We'd be happy to
> contribute what we have done.
>      To tackle the problem from a different angle: is there
> a way to incorporate FieldCache into the segments (it is strictly
> in memory now) and build on-disk versions while indexing?
>      Hope I am making sense.
>     I did not send this out to the mailing list because I wasn't
> sure if this is a dev question or a user question. Feel free to
> either forward it to the right mailing list or let me know and I
> can forward it.
> Thanks
> -John
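
For readers following the thread, the per-segment .udt idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not John's patch and not a Lucene API: the class name UidSegmentFile, its write/load/merge methods, and the length-prefixed file layout are all hypothetical. The merge step further assumes that merged docids are assigned by concatenating the segments in order and dropping deleted documents.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical per-segment UID store (illustration only, not part of Lucene). */
public final class UidSegmentFile {
    private UidSegmentFile() {}

    /** Writes uids[docid] as a length-prefixed array of ints (assumed layout). */
    public static void write(Path file, int[] uids) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(uids.length);
            for (int uid : uids) {
                out.writeInt(uid);
            }
        }
    }

    /** Loads the whole file into an array, so a lookup is just uids[docid]. */
    public static int[] load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int[] uids = new int[in.readInt()];
            for (int i = 0; i < uids.length; i++) {
                uids[i] = in.readInt();
            }
            return uids;
        }
    }

    /**
     * Merges per-segment arrays into one array for the merged segment.
     * Assumption: segments are concatenated in order and deleted documents
     * are skipped, so index i of the result is the merged docid i.
     */
    public static int[] merge(int[][] segments, BitSet[] deleted) {
        int total = 0;
        for (int[] segment : segments) {
            total += segment.length;
        }
        int[] merged = new int[total];
        int next = 0;
        for (int s = 0; s < segments.length; s++) {
            for (int doc = 0; doc < segments[s].length; doc++) {
                if (!deleted[s].get(doc)) {
                    merged[next++] = segments[s][doc];
                }
            }
        }
        // Trim the slots that deleted documents would have occupied.
        return Arrays.copyOf(merged, next);
    }
}
```

In this sketch the lookup cost after loading is a single array access per docid, and the load cost is one sequential read per segment, which is the property the message is after. Whether a real implementation should hook into SegmentReader the same way is left open here.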
