lucene-dev mailing list archives

From Marvin Humphrey
Subject Re: Flexible index format / Payloads Cont'd
Date Thu, 03 Aug 2006 19:49:39 GMT

On Jul 31, 2006, at 8:25 AM, Nicolas Lalevée wrote:
> That looks good, but there is one restriction : it have to be per  
> document.

Yes, what I laid out was per-document - for each document, the fdx  
file would keep a file pointer and an integer mapping to a codec.
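To make the per-document idea concrete, here is a rough sketch of what one such index record might look like. Everything in it is an assumption for illustration: the class name FdxEntrySketch, the fixed 12-byte layout (an 8-byte file pointer followed by a 4-byte codec id), and the use of an in-memory byte array in place of a real file. It is not the current .fdx format.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: one fixed-width index record per document.
// The layout (long pointer + int codec id) is assumed, not Lucene's format.
final class FdxEntrySketch {
    static final int ENTRY_BYTES = Long.BYTES + Integer.BYTES; // 12 bytes

    final long filePointer; // where the document's bytes start in the data file
    final int codecId;      // which codec knows how to deserialize those bytes

    FdxEntrySketch(long filePointer, int codecId) {
        this.filePointer = filePointer;
        this.codecId = codecId;
    }

    // Serialize the entries back to back, in document order.
    static byte[] writeAll(FdxEntrySketch... entries) {
        ByteBuffer buf = ByteBuffer.allocate(entries.length * ENTRY_BYTES);
        for (FdxEntrySketch e : entries) {
            buf.putLong(e.filePointer);
            buf.putInt(e.codecId);
        }
        return buf.array();
    }

    // Fixed-width records let us seek straight to a document's entry.
    static FdxEntrySketch read(byte[] index, int docNum) {
        ByteBuffer buf = ByteBuffer.wrap(index);
        int offset = docNum * ENTRY_BYTES;
        return new FdxEntrySketch(buf.getLong(offset),
                                  buf.getInt(offset + Long.BYTES));
    }
}
```

Because every record has the same width, looking up document N is a single multiplication and one read, which is the property the existing fdx file already relies on.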

> In fact I was thinking about a more generic version that will allow  
> the format
> compatibility, keeping .fdx as is :
> FieldData (.fdt) -->  <DocFieldData>SegSize
> DocFieldData --> FieldCount, <FieldNum, RawData>FieldCount
> And a default FieldsDataWriter will be the actual one, it will read  
> the
> RawData as Bits, Value, with Value -->  String | BinaryValue,....
> Then, for my app, I will provide some custom FieldsDataWriter that  
> will do
> exactly what I want.

OK, that's quite similar, but with the info specifying how to  
deserialize the document stored in fdt rather than fdx.  However, I  
don't think what you're describing makes the field storage in Lucene  
arbitrarily extensible, since you're just going to override  
FieldsWriter/FieldsReader rather than modify them so that they can  
use arbitrary codecs.

I think what I want to do is turn Lucene into an Object-Oriented  
Database, or at least have Lucene adopt some characteristics of an  
ODBMS.  However, I haven't used a real ODBMS and I'm not up on the  
theory, so I can't say for sure.  I've been doing a little reading  
here and there on object databases, but I've been extraordinarily  
busy the last few weeks and haven't been able to study it in depth.

The main point is this:

Lucene users have diverse needs for what gets stored in the document/ 
field storage.  We've been meeting those needs by assigning more and  
more bit flags.  That can't continue ad infinitum.  However, we  
*can* meet everyone's needs by applying a variant of the "Replace  
Conditionals With Polymorphism" refactoring technique... (Link to

Think of those bit flags as an if-else chain.  Instead of all those  
conditionals describing all the attributes of the Lucene Document you  
want to store at that file pointer, we allow you to put whatever kind  
of serialized object you desire there.  Maybe it's a Lucene  
Document.  Maybe it's a FrenchDocument.  Maybe it's a  
RussianDocument.  Maybe it's a wrapped-up jpg.  You choose.
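A minimal sketch of that polymorphic dispatch might look like the following. All of the names here (DocCodec, Utf8TextCodec, RawBytesCodec, CodecRegistry) are hypothetical and exist only for illustration; none of them is an existing Lucene class, and a real design would need to settle how codec ids get assigned and persisted.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: each codec knows how to rebuild one kind of object.
interface DocCodec {
    Object deserialize(byte[] raw);
}

// Example codec: treats the stored bytes as UTF-8 text.
final class Utf8TextCodec implements DocCodec {
    public Object deserialize(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

// Example codec: hands back an opaque blob, e.g. a wrapped-up jpg.
final class RawBytesCodec implements DocCodec {
    public Object deserialize(byte[] raw) {
        return raw.clone();
    }
}

// Maps the integer stored alongside the file pointer to a codec.
final class CodecRegistry {
    private final Map<Integer, DocCodec> codecs = new HashMap<>();

    void register(int codecId, DocCodec codec) {
        codecs.put(codecId, codec);
    }

    // No if-else chain over bit flags: the codec id selects the behavior.
    Object load(int codecId, byte[] raw) {
        DocCodec codec = codecs.get(codecId);
        if (codec == null) {
            throw new IllegalArgumentException("unknown codec id: " + codecId);
        }
        return codec.deserialize(raw);
    }
}
```

Supporting a new kind of stored object then means registering one more codec, rather than adding another flag and another branch to the reader.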

Instead of continually adding to the complexity of the  
deserialization algorithm, we make that deserialization algorithm  

Marvin Humphrey
Rectangular Research
