lucene-java-user mailing list archives

From "Johannes.Lichtenberger" <>
Subject Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Date Fri, 30 Nov 2012 15:15:32 GMT
On 11/28/2012 01:11 AM, Michael McCandless wrote:
> Flexible indexing is the ability to make your own codec, which
> controls the reading and writing of all index parts (postings, stored
> fields, term vectors, deleted docs, etc.).
> So for example if you want to store some postings as a bit set instead
> of the block format that's the default coming up in 4.1, that's easy
> to do.
> But what is less easy (as I described below) is changing what is
> actually stored in the postings, eg adding a new per-position
> attribute.
> The original goal was to allow arbitrary attributes beyond the known
> docs/freqs/positions/offsets that Lucene supports today, so that you
> could easily make new application-dependent per-term, per-doc,
> per-position things, pull them from the analyzer, save them to the
> index, and access them from an IndexReader / query, but while some
> APIs do expose this, it's not very well explored yet (eg, you'd have
> to make a custom indexing chain to get the attributes "through"
> IndexWriter down to your codec).  It would be great to make progress
> making this easier, so ideas are very welcome :)
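To make the bit-set example in the quoted text concrete, here is a minimal, self-contained sketch of the two representations of one term's document list. It deliberately does not use Lucene's codec API; the class and method names are hypothetical and only illustrate the data layout a custom postings format could choose between.

```java
import java.util.BitSet;

/** Hypothetical illustration only; not part of Lucene. */
final class BitSetPostingsSketch {

    /** Dense representation: bit i is set if document i contains the term. */
    static BitSet toBitSet(int[] sortedDocIds) {
        BitSet bits = new BitSet();
        for (int docId : sortedDocIds) {
            bits.set(docId);
        }
        return bits;
    }

    /** Sparse representation: gaps between consecutive sorted document IDs. */
    static int[] toDeltas(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int previous = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            deltas[i] = sortedDocIds[i] - previous;
            previous = sortedDocIds[i];
        }
        return deltas;
    }

    /** Recover the sorted document IDs from the bit set. */
    static int[] fromBitSet(BitSet bits) {
        int[] docIds = new int[bits.cardinality()];
        int i = 0;
        for (int d = bits.nextSetBit(0); d >= 0; d = bits.nextSetBit(d + 1)) {
            docIds[i++] = d;
        }
        return docIds;
    }

    /** Recover the sorted document IDs from the delta list. */
    static int[] fromDeltas(int[] deltas) {
        int[] docIds = new int[deltas.length];
        int current = 0;
        for (int i = 0; i < deltas.length; i++) {
            current += deltas[i];
            docIds[i] = current;
        }
        return docIds;
    }
}
```

A bit set tends to pay off for terms that occur in a large fraction of documents, while the delta list stays small for rare terms.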

Regarding my question/thread, is it also possible to change the backend
system? I'd like to use Lucene for a versioned DBMS, so I would need
the ability to serialize/deserialize the bytes in my own backend, where
keys/values are stored in pages (for instance in an upcoming B+-tree, or
in simple "unordered" pages via a record-ID/record mapping). But as no
one has suggested anything so far, and I also asked about a year ago,
after implementing the B+-tree I will probably have to implement my own
data structure and parser/tokenizer/stemmer... :-(
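For reference, Lucene reads and writes its index files as named byte streams through the org.apache.lucene.store.Directory abstraction, so a custom Directory that delegates to a page-based store might be one place to plug in such a backend. The following is only a rough sketch of the underlying idea, assuming a fixed page size and a simple record-ID/page mapping; it does not use any Lucene classes, and every name in it is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a page-based backend; not a Lucene class. */
final class PageStoreSketch {
    static final int PAGE_SIZE = 4096; // assumed page size

    // "Unordered" pages addressed by record ID.
    private final Map<Long, byte[]> pages = new HashMap<>();
    // For each named file, the ordered list of record IDs holding its bytes.
    private final Map<String, List<Long>> files = new HashMap<>();
    private long nextRecordId = 0;

    /** Split the bytes of one named file into pages and store them. */
    void writeFile(String name, byte[] data) {
        List<Long> recordIds = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += PAGE_SIZE) {
            int length = Math.min(PAGE_SIZE, data.length - offset);
            long recordId = nextRecordId++;
            pages.put(recordId, Arrays.copyOfRange(data, offset, offset + length));
            recordIds.add(recordId);
        }
        // Pages of an earlier version are left in place, not overwritten.
        files.put(name, recordIds);
    }

    /** Reassemble a named file from its pages, in order. */
    byte[] readFile(String name) {
        List<Long> recordIds = files.get(name);
        if (recordIds == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long recordId : recordIds) {
            byte[] page = pages.get(recordId);
            out.write(page, 0, page.length);
        }
        return out.toByteArray();
    }

    /** Number of pages currently backing the named file. */
    int pageCount(String name) {
        List<Long> recordIds = files.get(name);
        return recordIds == null ? 0 : recordIds.size();
    }
}
```

With something like this behind a custom Directory, the existing analyzers, tokenizers and stemmers could in principle be reused unchanged, since they sit above the storage layer.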

kind regards,
