lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Created: (LUCENE-2946) change file format documentation from "bit-for-bit" to highlevel
Date Wed, 02 Mar 2011 19:40:36 GMT
change file format documentation from "bit-for-bit" to highlevel

                 Key: LUCENE-2946
             Project: Lucene - Java
          Issue Type: Task
          Components: Website
            Reporter: Robert Muir
             Fix For: 4.0

While reviewing website docs in LUCENE-2924,
I noticed that the existing fileformats document is going to be pretty hopeless for 4.0.

Previously it described the format "bit-for-bit", but with flexible indexing this is
somewhat silly (and who really wants a bit-for-bit explanation of some of the new formats?)

I think it would be much better to give a high-level overview, perhaps linking to javadocs
or even source code for the low-level details.

We should probably delay this until 4.0 is really close in sight (since things are changing
so fast), but we can go ahead and start thinking about it now.

For example:
* high level explanation of what a codec is, and the various subsystems one is usually composed
of (terms index, terms data, skiplist impl, postings impl, etc). We can reiterate that you
can make your own, and hopefully this kind of documentation will actually encourage that.
* high level explanation of what StandardCodec is "composed of". For example, assume it's Variable
Terms Index, Block Terms Reader, PForDelta docs and freqs, and Simple64 positions. I think
this is really the only codec we should try to "diagram" in any way.
* high level explanation (probably with links) of some of the components. For example we could
explain what the purpose of a Terms Index is, and that this implementation uses a finite state
transducer to find the terms block for a given term. In this case maybe we could include an
image, now that Dawid made toDot useful.
* high level explanation (probably with links) of some of the compression algorithms. For
example, we could explain the basics of the available algorithms we have (vbyte/simple/for/pfor/...)
and what their advantages and disadvantages are.
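
To make the last bullet concrete, here is a minimal sketch of the simplest scheme in that list, vbyte, applied to the gaps between sorted doc IDs. It is an illustration of the general technique only and is not Lucene's actual code: the class name VByteSketch and the encode/decode methods are hypothetical names chosen for this example. Each gap is written 7 bits at a time, low-order bits first, with the high bit of a byte set when more bytes follow.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
final class VByteSketch {

    // Encodes sorted, non-negative doc IDs as gaps, each gap in vbyte form.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous; // non-negative because the input is sorted
            previous = docId;
            // Emit 7 bits per byte; the high bit marks "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap); // final byte has the high bit clear
        }
        return out.toByteArray();
    }

    // Decodes `count` doc IDs from bytes produced by encode().
    static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap; // undo the delta encoding
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 130, 131, 20000};
        byte[] encoded = encode(docIds);
        System.out.println(encoded.length + " bytes, versus "
                + (docIds.length * 4) + " bytes as raw ints");
        System.out.println("round trip ok: "
                + Arrays.equals(docIds, decode(encoded, docIds.length)));
    }
}
```

The trade-off this shows is the kind the documentation could spell out: vbyte is simple and compact for small gaps, but it decodes one byte at a time with a branch per byte, which is the cost that block-oriented schemes such as simple/for/pfor try to avoid.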

Some of the things I mentioned here are probably optional. For instance, I think it's "enough"
to give a high-level overview of StandardCodec, but I can't help but think that explaining
some of the architecture will be useful for new developers.

This message is automatically generated by JIRA.