lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers
Date Thu, 11 Feb 2010 10:04:28 GMT


Uwe Schindler updated LUCENE-2154:

    Attachment: LUCENE-2154.patch

Here is a first patch for cglib-generated proxy attributes.

In IRC yesterday we found that the proposed idea of sharing the attributes across all
Multi*Enums would cause problems: a call to next() on any sub-enum would overwrite the
attribute contents of the previous sub-enum, which would break TermsEnum. For example, the
merged TermsEnum looks ahead by calling next() on all sub-enums and then returns the lowest
term. Once every sub-enum's next() has been called, the attributes of the earlier sub-enums
cannot be restored without captureState & co., because the next() call on the last sub-enum
has overwritten them.
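A minimal sketch of that failure mode, assuming hypothetical SubEnum and SharedTermAttr classes (not the flex API): a merging enum has to advance every sub-enum before it can choose the lowest term, and when all sub-enums write into one shared attribute, only the last write survives.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a term attribute: a single mutable slot. */
final class SharedTermAttr {
  String term;
}

/** Hypothetical sub-enum that copies its current term into the attribute it was given. */
final class SubEnum {
  private final String[] terms;
  private final SharedTermAttr attr;
  private int pos = -1;

  SubEnum(SharedTermAttr attr, String... terms) {
    this.attr = attr;
    this.terms = terms;
  }

  /** Advances and writes the new term into the attribute; returns false when exhausted. */
  boolean next() {
    if (++pos >= terms.length) return false;
    attr.term = terms[pos];
    return true;
  }
}

final class SharedAttrDemo {
  /** Positions both sub-enums (as a merging enum would) and reports what each attribute holds. */
  static List<String> positionAll(boolean shareOneAttr) {
    SharedTermAttr shared = new SharedTermAttr();
    SharedTermAttr a = shareOneAttr ? shared : new SharedTermAttr();
    SharedTermAttr b = shareOneAttr ? shared : new SharedTermAttr();
    SubEnum e1 = new SubEnum(a, "apple", "cherry");
    SubEnum e2 = new SubEnum(b, "banana", "date");
    e1.next();
    e2.next(); // with a shared attribute, this overwrites e1's current term
    return Arrays.asList(a.term, b.term);
  }
}
```

With separate attributes, positionAll(false) yields [apple, banana]; with one shared attribute, positionAll(true) yields [banana, banana], so the first sub-enum's term is lost.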

This patch needs cglib-nodep-2.2.jar to be placed in the lib folder of the checkout [].

It contains a test that shows how to use it. The central part is cglib's Enhancer, which
creates a dynamic class that extends ProxyAttributeImpl (which defines the general AttributeImpl
methods, delegating them to the delegate) and implements the requested Attribute interface using
a MethodInterceptor.
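For readers without cglib at hand, the delegation idea can be approximated with the JDK's own java.lang.reflect.Proxy. This is a swapped-in technique, not the patch's code: a JDK proxy can only implement interfaces (it cannot extend a base class such as ProxyAttributeImpl), and every call goes through Method.invoke, i.e. exactly the reflection the cglib version avoids. TermAttr, TermAttrImpl and DelegatingHandler are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Hypothetical attribute interface (a stand-in for something like TermAttribute). */
interface TermAttr {
  String term();
  void setTerm(String term);
}

final class TermAttrImpl implements TermAttr {
  private String term = "";
  public String term() { return term; }
  public void setTerm(String term) { this.term = term; }
}

/** Hypothetical handler: forwards every interface call to a delegate that can be swapped. */
final class DelegatingHandler implements InvocationHandler {
  private volatile Object delegate;

  DelegatingHandler(Object delegate) { this.delegate = delegate; }

  void setDelegate(Object delegate) { this.delegate = delegate; }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    try {
      // Reflective call: this is the overhead the cglib MethodProxy path does not have.
      return method.invoke(delegate, args);
    } catch (InvocationTargetException e) {
      throw e.getCause(); // rethrow the delegate's own exception unwrapped
    }
  }

  /** Creates a proxy implementing iface whose calls are routed through the handler. */
  static <T> T newProxy(Class<T> iface, DelegatingHandler handler) {
    return iface.cast(Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler));
  }
}
```

Swapping the delegate with setDelegate changes what every holder of the proxy sees, without copying any attribute state.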

Please note: this uses no reflection (except during the in-memory class file creation, which
runs only once, when the proxy class is "loaded"). The proxy implements MethodInterceptor and
uses the fast MethodProxy class (which cglib also generates for each proxied method), so it
can invoke the delegated method directly on the delegate, without reflection.

The test verifies that everything works and also compares the speed of using a TermAttribute
natively and through a proxy. The proxy is slower (not because of reflection, but because the
MethodInterceptor creates an array of parameters and boxes/unboxes primitive parameters into
the Object[]), but in the test case I measured only about 50% more time.
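To make both cost sources concrete, here is a hand-written imitation of the generated dispatch. LengthAttr, FastLengthInvoker and InterceptedLengthAttr are hypothetical; this is not cglib's actual FastClass/MethodProxy API. The switch on a method index calls the target directly with no reflection, but the uniform Object[] signature still allocates an array and boxes the int on each call.

```java
/** Hypothetical attribute with a primitive-typed property. */
interface LengthAttr {
  int length();
  void setLength(int length);
}

final class LengthAttrImpl implements LengthAttr {
  private int length;
  public int length() { return length; }
  public void setLength(int length) { this.length = length; }
}

/**
 * Hypothetical "fast" invoker: dispatches on a method index with a switch, so no
 * java.lang.reflect.Method is involved. The uniform Object[] signature is what
 * forces boxing of primitive arguments and results.
 */
final class FastLengthInvoker {
  static final int LENGTH = 0;
  static final int SET_LENGTH = 1;

  static Object invoke(int index, LengthAttr target, Object[] args) {
    switch (index) {
      case LENGTH:
        return target.length(); // int result is boxed to Integer
      case SET_LENGTH:
        target.setLength((Integer) args[0]); // Integer argument is unboxed
        return null;
      default:
        throw new IllegalArgumentException("unknown method index " + index);
    }
  }
}

/** Proxy that routes every call through the uniform, boxing invoker. */
final class InterceptedLengthAttr implements LengthAttr {
  private final LengthAttr delegate;

  InterceptedLengthAttr(LengthAttr delegate) { this.delegate = delegate; }

  public int length() {
    return (Integer) FastLengthInvoker.invoke(FastLengthInvoker.LENGTH, delegate, new Object[0]);
  }

  public void setLength(int length) {
    // A fresh Object[] and an Integer box are allocated on every call.
    FastLengthInvoker.invoke(FastLengthInvoker.SET_LENGTH, delegate, new Object[] {length});
  }
}
```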

The generated classes are cached and reused (like DEFAULT_ATTRIBUTE_FACTORY does).
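The per-interface caching can be sketched with java.lang.ClassValue, which stores one computed value per Class object. Assumptions: the cached value here is a factory built around a JDK dynamic proxy rather than a cglib-generated class, and ProxyFactoryCache, OffsetAttr, FixedOffset and the GENERATIONS counter are hypothetical names added only to make cache misses observable.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical attribute interface used to exercise the cache. */
interface OffsetAttr {
  int startOffset();
}

final class FixedOffset implements OffsetAttr {
  private final int start;
  FixedOffset(int start) { this.start = start; }
  public int startOffset() { return start; }
}

final class ProxyFactoryCache {
  /** Counts how often a factory had to be built, i.e. cache misses. */
  static final AtomicInteger GENERATIONS = new AtomicInteger();

  /** One proxy factory per attribute interface, computed lazily on first request. */
  private static final ClassValue<Function<Object, Object>> FACTORIES =
      new ClassValue<Function<Object, Object>>() {
        @Override
        protected Function<Object, Object> computeValue(Class<?> iface) {
          GENERATIONS.incrementAndGet(); // stands in for the expensive one-time generation step
          Class<?>[] ifaces = {iface};
          ClassLoader loader = iface.getClassLoader();
          return delegate -> Proxy.newProxyInstance(loader, ifaces, (proxy, method, args) -> {
            try {
              return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
              throw e.getCause(); // surface the delegate's own exception
            }
          });
        }
      };

  static <T> T proxyFor(Class<T> iface, T delegate) {
    return iface.cast(FACTORIES.get(iface).apply(delegate));
  }
}
```

Requesting proxies for two different delegates of the same interface builds the factory only once.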

To get maximum speed and avoid external libraries, what Enhancer does could be reimplemented
natively, using the source code of Apache Harmony's java.lang.reflect.Proxy implementation as
a basis. The hardest part of generating bytecode is the constant pool in class files. But as
the proxy methods simply delegate and need no magic such as boxing/unboxing, the generated
bytecode is rather simple.
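For illustration, the class that such a generator would emit for a small interface corresponds to the following hand-written Java (PosIncAttr, PosIncAttrImpl and PosIncAttrProxy are hypothetical names). Every method is a one-line forward to the current delegate, with no Object[] and no boxing, which is why the bytecode stays simple.

```java
/** Hypothetical attribute interface with a primitive property. */
interface PosIncAttr {
  int getPositionIncrement();
  void setPositionIncrement(int increment);
}

final class PosIncAttrImpl implements PosIncAttr {
  private int increment = 1;
  public int getPositionIncrement() { return increment; }
  public void setPositionIncrement(int increment) { this.increment = increment; }
}

/**
 * What a generated proxy boils down to: one field holding the current delegate and
 * one forwarding method per interface method. Primitives pass through unboxed.
 */
final class PosIncAttrProxy implements PosIncAttr {
  private PosIncAttr delegate;

  PosIncAttrProxy(PosIncAttr delegate) { this.delegate = delegate; }

  void setDelegate(PosIncAttr delegate) { this.delegate = delegate; }

  public int getPositionIncrement() { return delegate.getPositionIncrement(); }

  public void setPositionIncrement(int increment) { delegate.setPositionIncrement(increment); }
}
```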

One other use case for these proxies is AppendingTokenStream, which has not been possible
since 3.0 without captureState (in the old TokenStream API it was possible, because you could
reuse the same Token instance even across the appended streams). In the new TokenStream API,
the appending stream must have a "view" on the attributes of the sub-stream currently being
consumed.
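A minimal sketch of the "view" idea, assuming hypothetical TextAttr, SubStream and AppendingStream types rather than Lucene's TokenStream/AttributeSource API: the consumer obtains the view attribute once, and the appending stream simply retargets it whenever it moves on to the next sub-stream, so no state has to be captured or copied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical single-value attribute. */
interface TextAttr {
  String text();
}

/** Hypothetical sub-stream that owns its own attribute state. */
final class SubStream implements TextAttr {
  private final Iterator<String> tokens;
  private String current;

  SubStream(String... tokens) { this.tokens = Arrays.asList(tokens).iterator(); }

  boolean incrementToken() {
    if (!tokens.hasNext()) return false;
    current = tokens.next();
    return true;
  }

  public String text() { return current; }
}

/** Appends sub-streams; consumers read through a view and never see which one is active. */
final class AppendingStream {
  private final List<SubStream> subs;
  private int index = 0;
  private TextAttr active; // null until the first successful incrementToken()

  /** The "view" handed to consumers: forwards to whichever sub-stream is active. */
  final TextAttr view = () -> active.text();

  AppendingStream(SubStream... subs) { this.subs = Arrays.asList(subs); }

  boolean incrementToken() {
    while (index < subs.size()) {
      SubStream sub = subs.get(index);
      if (sub.incrementToken()) {
        active = sub; // retarget the view; nothing is copied
        return true;
      }
      index++; // current sub-stream exhausted, move to the next one
    }
    return false;
  }

  /** Consumes the whole stream through the view obtained up front. */
  static List<String> drain(AppendingStream stream) {
    List<String> out = new ArrayList<>();
    TextAttr attr = stream.view;
    while (stream.incrementToken()) out.add(attr.text());
    return out;
  }
}
```

Draining a stream built from the sub-streams ("a", "b"), (), ("c") yields [a, b, c], skipping the empty one.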

> Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers
> ---------------------------------------------------------------------------------------
>                 Key: LUCENE-2154
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>             Fix For: Flex Branch
>         Attachments: LUCENE-2154.patch
> The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum levels, for
> a codec to set custom attrs.
> But, it's currently broken for Dir/MultiReader, which must somehow share attrs across
> all the sub-readers.  Somehow we must make a single attr source, and tell each sub-reader's
> enum to use that instead of creating its own.  Hopefully Uwe can work some magic here :)

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
