lucene-dev mailing list archives

From Nicolas Lalevée
Subject Re: Flexible index format / Payloads Cont'd
Date Wed, 19 Jul 2006 17:26:49 GMT
On Wednesday 05 July 2006 13:23, Michael Busch wrote:
> Doug Cutting wrote:
> > Marvin Humphrey wrote:
> >> IMO, this should wait.  It's going to be freakishly difficult to get
> >> this stuff to work and maintain the commitments that Doug has laid
> >> out for backwards compatibility.
> >
> > Perhaps we can implement an all-new index format, in a new package.
> > An implementation of IndexReader can be provided to integrate with
> > existing search code.  And the ability to add an IndexReader to an
> > index can be provided to upgrade existing indexes to the new format.
> > So the new code would not need to be able to process an old index: the
> > old code can continue to do that.  Does that make sense?  Is that
> > "freakishly difficult"?  We'll need the ability to sniff a directory
> > and tell which version of index it contains, but that should not be
> > too hard.
> >
> > Doug
> +1. I agree that this approach would make it much easier to develop a
> new index format without the commitment of being backward-compatible. I
> would like to help working on a new index format. Who else is going to
> work on it?

I am also interested in improving Lucene. It took me a while to respond to this
thread because I am quite new to Lucene, so I first had to learn what you were
talking about, in particular what a payload is. But here I am, I get it now! :)

I have to build a web application that does faceted search. My current
workaround is to transform each query into several queries, one per
category. So I am interested in your current work.

I also had another issue with fields. A field can have a type (integer,
date, string) and/or a language; this is typically metadata about the field.
The quick workaround I used was to put that information in the field, between
square brackets. So I had to write a SkipPrefixTokenizer... dirty, but fairly
quick to implement.
Then I looked deeper into the Lucene file format, and I managed to introduce
generic field metadata without breaking file format compatibility. I just
used another bit of the "Bits" to mark whether or not the field has metadata.
The metadata is stored right after it:
DocFieldData --> FieldCount, <FieldNum, Bits, FieldMetadata, Value>^FieldCount
FieldMetadata --> ValueSize, <Byte>^ValueSize
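
To make the layout above concrete, here is a minimal sketch of how one stored
field could be written and read back. It is not the actual patch: the class
name, the flag values (0x1, 0x8) and the use of fixed-width ints and
writeUTF are all assumptions made for illustration, whereas Lucene's real
stored fields file uses its own encodings and bit assignments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of <FieldNum, Bits, FieldMetadata, Value>. */
public class FieldMetadataSketch {
    // Assumed flag values, chosen for this example only.
    public static final int FLAG_TOKENIZED = 0x1;
    public static final int FLAG_HAS_METADATA = 0x8;

    /** One decoded field; metadata is null when the flag bit is not set. */
    public static final class StoredField {
        public final int fieldNum;
        public final int bits;
        public final byte[] metadata;
        public final String value;

        StoredField(int fieldNum, int bits, byte[] metadata, String value) {
            this.fieldNum = fieldNum;
            this.bits = bits;
            this.metadata = metadata;
            this.value = value;
        }
    }

    /** Writes FieldNum, Bits, then FieldMetadata only if present, then Value. */
    public static byte[] encode(int fieldNum, int bits, byte[] metadata, String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            // The flag bit is derived from the presence of metadata.
            int flags = (metadata != null)
                    ? (bits | FLAG_HAS_METADATA)
                    : (bits & ~FLAG_HAS_METADATA);
            out.writeInt(fieldNum);
            out.writeByte(flags);
            if (metadata != null) {
                out.writeInt(metadata.length); // ValueSize
                out.write(metadata);           // <Byte>^ValueSize
            }
            out.writeUTF(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one field back, skipping the metadata block when the bit is clear. */
    public static StoredField decode(byte[] encoded) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
            int fieldNum = in.readInt();
            int bits = in.readUnsignedByte();
            byte[] metadata = null;
            if ((bits & FLAG_HAS_METADATA) != 0) {
                metadata = new byte[in.readInt()];
                in.readFully(metadata);
            }
            String value = in.readUTF();
            return new StoredField(fieldNum, bits, metadata, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a field without metadata is encoded exactly as it would be
without the extra bit, which is the property that keeps existing data
readable.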

Does this feature interest the Lucene committers? Should I provide a patch in
Jira? If not, is there a common place where patches can be shared with other
Lucene hackers (i.e. not necessarily committers)?

So, Marvin, could you provide your payload patch?
And is there a wiki page that serves as a starting point for defining the
future index format?

