lucene-dev mailing list archives

From Michael Busch
Subject Re: Attributes, DocConsumer, Flexible Indexing, etc.
Date Wed, 05 Aug 2009 20:35:10 GMT
On 8/5/09 1:07 PM, Grant Ingersoll wrote:
> Hmmm, OK.
> Random, somewhat uneducated thought:  Why not just define the codecs 
> to create byte arrays?  Then we can use the existing payload 
> capability much like I do with the DelimitedPayloadTokenFilter.   We'd 
> probably have to make sure this still worked with Similarity, but it 
> seems like it could.  Thinking on this some more, seems like this 
> could work already with an AttributePayloadEncoder or something like 
> an AttributeToPayloadTokenFilter (I know, horrible name).  Then, on 
> the Query side, the AttributeTermQuery is just a glorified 
> BoostingTermQuery with some callback hooks for dealing with the 
> Attribute (but maybe that isn't even needed), either that or we just 
> provide helper methods to the Similarity class so that people can 
> easily decode the byte array into an Attribute.  In fact, maybe all 
> that needs to happen is the Attributes need to define encode/decode 
> methods that (de)serialize a byte array.
> Seems like this approach would require very little in the way of 
> changes to Lucene, but I admit it isn't fully baked in my mind just 
> yet.  It also has the nice benefit that all the work we did on 
> Payloads isn't wasted.
> This is resonating more and more with me.  What do you think?
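
To make the suggestion above concrete, here is a minimal sketch of an
attribute value that can (de)serialize itself to a byte array suitable
for a payload. PayloadCodec, PosTag and POS_CODEC are hypothetical names
invented for this illustration; none of them is an existing Lucene
class, and the 8-byte layout is just an assumption.

```java
import java.nio.ByteBuffer;

public class AttributeCodecSketch {
    /** Hypothetical contract for illustration; not an existing Lucene interface. */
    public interface PayloadCodec<T> {
        byte[] encode(T value);
        T decode(byte[] data);
    }

    /** Hypothetical attribute value: a tag id plus a confidence score. */
    public static final class PosTag {
        public final int tagId;
        public final float confidence;

        public PosTag(int tagId, float confidence) {
            this.tagId = tagId;
            this.confidence = confidence;
        }
    }

    /** Assumed layout: 4 bytes for the tag id followed by 4 bytes for the confidence. */
    public static final PayloadCodec<PosTag> POS_CODEC = new PayloadCodec<PosTag>() {
        @Override
        public byte[] encode(PosTag value) {
            return ByteBuffer.allocate(8)
                    .putInt(value.tagId)
                    .putFloat(value.confidence)
                    .array();
        }

        @Override
        public PosTag decode(byte[] data) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            return new PosTag(buf.getInt(), buf.getFloat());
        }
    };

    public static void main(String[] args) {
        byte[] payload = POS_CODEC.encode(new PosTag(7, 0.25f));
        PosTag roundTrip = POS_CODEC.decode(payload);
        System.out.println(roundTrip.tagId + " " + roundTrip.confidence);
    }
}
```

A token filter could call encode() at index time and a Similarity
helper could call decode() at search time, which is roughly the split
described in the quoted text.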

Well, I think this would be a nice way of making better use of payloads.

However, the idea behind flexible indexing is that you can customize the 
on-disk encoding so that it is as efficient as possible for your 
particular use case. For example, for payloads we currently have to 
encode the length. An application might not have to do that if it knows 
exactly what is stored.
Also, the Payload API only returns a byte array. It basically copies 
the contents of the IndexInput (usually a BufferedIndexInput, which 
means an array copy from the byte buffer to the payload byte array). If 
the application knows exactly what is stored, it can read and decode it 
more efficiently.
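
As a rough illustration of the length overhead, the sketch below
compares a generic length-prefixed encoding with a fixed-width one for
a single float per position. The one-byte length prefix and the method
names are assumptions made for this example; this is not Lucene's
actual file format.

```java
import java.nio.ByteBuffer;

public class PayloadEncodingSketch {
    /** Generic case: the reader does not know the size, so a length byte is stored first. */
    public static byte[] encodeWithLength(byte[] payload) {
        if (payload.length > 255) {
            throw new IllegalArgumentException("sketch supports at most 255 bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + payload.length);
        buf.put((byte) payload.length); // length prefix (assumed to be one byte here)
        buf.put(payload);
        return buf.array();
    }

    /** Application-specific case: every entry is exactly one float, so no length is needed. */
    public static byte[] encodeFixedFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    /** Decodes the fixed-width form; the size is known in advance. */
    public static float decodeFixedFloat(byte[] stored) {
        return ByteBuffer.wrap(stored).getFloat();
    }

    public static void main(String[] args) {
        byte[] fixed = encodeFixedFloat(1.5f);
        byte[] generic = encodeWithLength(fixed);
        System.out.println("generic: " + generic.length + " bytes"); // generic: 5 bytes
        System.out.println("fixed:   " + fixed.length + " bytes");   // fixed:   4 bytes
    }
}
```

One byte per position looks small, but it is paid for every occurrence
of every term that carries a payload.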

The latter inefficiency we could solve by improving the payloads API: it 
could return an IndexInput instead of the byte array and the caller 
could consume it more efficiently.
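
The difference between the two access styles can be sketched as
follows. A java.nio.ByteBuffer stands in for the IndexInput here, purely
as an assumption to keep the example self-contained; the method names
are hypothetical and this is not the Lucene API.

```java
import java.nio.ByteBuffer;

public class PayloadReadSketch {
    /** Byte-array style: copy the payload out first, then decode the copy. */
    public static float readViaCopy(ByteBuffer input) {
        byte[] copy = new byte[4];
        input.get(copy); // the extra array copy
        return ByteBuffer.wrap(copy).getFloat();
    }

    /** Stream style: decode straight from the input, with no intermediate array. */
    public static float readDirect(ByteBuffer input) {
        return input.getFloat();
    }

    public static void main(String[] args) {
        byte[] stored = ByteBuffer.allocate(4).putFloat(2.5f).array();
        System.out.println(readViaCopy(ByteBuffer.wrap(stored))); // 2.5
        System.out.println(readDirect(ByteBuffer.wrap(stored)));  // 2.5
    }
}
```

Both methods return the same value; the second one skips the
allocation and the copy.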

So I agree that we could use Attributes to make the payloads feature 
more usable, but I don't think it will be a replacement for flexible 

