lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Date Wed, 03 Dec 2008 11:01:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652740#action_12652740 ]

Michael McCandless commented on LUCENE-1473:
--------------------------------------------


{quote}
> At the risk of pissing off the Lucene powerhouse, I feel I have to express some candor.
I am growing more and more frustrated with the lack of the open source nature of this project
and its unwillingness to work with the developer community. This is a rather trivial issue,
and it is taking 7 back-and-forth's to reiterate some standard Java behavior that has been
around for years.
{quote}
Whoa!  I'm sorry if my questions are giving this impression.  I don't
intend to.

But I do have real questions, still, because I don't think
Serialization is actually so simple.  I too was surprised: what
started as a simple patch uncovered some real challenges once I dug
into it.

{quote}
>Use case: deploying lucene in a distributed environment, we have a broker/server architecture.
(standard stuff), we want roll out search servers with lucene 2.4 instance by instance. The
problem is that the broker is sending a Query object to the searcher via java serialization
at the server level, and the broker is running 2.3. And because of specifically this problem,
2.3 brokers cannot to talk to 2.4 search servers even when the Query object was not changed.
{quote}
OK that is a great use case -- thanks.  That helps focus the many
questions here.

{quote}
> It is a known good java programming practice to include a suid to the class (as a static
variable) when the object declares itself to be Serializable.
{quote}

But that alone gives a too-fragile back-compat solution because it's
too coarse.  If we add field X to a class implementing Serializable,
and must bump the SUID, that's a hard break on back compat.  So really
we need to override read/writeObject() or read/writeExternal() to do
our own versioning.

Consider this actual example: RangeQuery, in 2.9, now separately
stores "boolean includeLower" and "boolean includeUpper".  In versions
<= 2.4, it only stores "boolean inclusive".  This means we can't rely
on the JVM's default versioning for serialization.
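
To make the RangeQuery point concrete, here is a minimal sketch of
doing our own versioning in read/writeExternal().  RangeSketch, the
format constants and RangeSketchDemo are hypothetical names invented
for illustration; this is not Lucene's RangeQuery and not the attached
patch.  It assumes the old layout stored a single boolean and the new
one stores two.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical stand-in for RangeQuery; not Lucene's actual class.
class RangeSketch implements Externalizable {
    // Pinned explicitly so that adding fields, methods or constructors
    // does not change it.
    private static final long serialVersionUID = 1L;

    static final byte FORMAT_INCLUSIVE = 1;   // assumed "<= 2.4" layout: one boolean
    static final byte FORMAT_LOWER_UPPER = 2; // assumed "2.9" layout: two booleans

    boolean includeLower;
    boolean includeUpper;

    // Externalizable requires a public no-arg constructor.
    public RangeSketch() {}

    RangeSketch(boolean includeLower, boolean includeUpper) {
        this.includeLower = includeLower;
        this.includeUpper = includeUpper;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(FORMAT_LOWER_UPPER); // always write the newest format
        out.writeBoolean(includeLower);
        out.writeBoolean(includeUpper);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        byte format = in.readByte();
        switch (format) {
            case FORMAT_INCLUSIVE: {
                // Old layout: one flag applies to both ends of the range.
                boolean inclusive = in.readBoolean();
                includeLower = inclusive;
                includeUpper = inclusive;
                break;
            }
            case FORMAT_LOWER_UPPER:
                includeLower = in.readBoolean();
                includeUpper = in.readBoolean();
                break;
            default:
                throw new IOException("unknown format version: " + format);
        }
    }
}

class RangeSketchDemo {
    // Serializes and deserializes through the standard object streams.
    static RangeSketch roundTrip(RangeSketch q) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(q);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (RangeSketch) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        RangeSketch copy = roundTrip(new RangeSketch(true, false));
        System.out.println(copy.includeLower + " " + copy.includeUpper);
    }
}
```

Note this only lets a newer reader accept an older stream.  An older
reader would still reject a format byte it has never seen, so it does
not by itself settle the back-compat policy.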

{quote}
> The serialVersionUID (suid) is a long because it is a java thing.
{quote}

But, that's only if you rely on the JVM's default serialization.  If
we implement our own (overriding read/writeObject or
read/writeExternal) we don't have to use "long SUID".

{quote}
> The problem was two different people did the release with different compilers.
{quote}

I think it's more likely the addition of a new ctor to Term (that
takes only String field), that changed the SUID.
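
For anyone who wants to check this kind of thing, the JDK can report
the SUID it would use for a class.  A small sketch, with hypothetical
classes (neither is Lucene's Term): Pinned declares its SUID, while
Computed leaves it to the JVM, whose default computation takes
non-private constructors and methods into account.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SuidInspect {
    // Hypothetical class with an explicitly declared SUID.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 42L;
        String field;
    }

    // Hypothetical class with no declared SUID; the JVM derives one
    // from the class's shape, including its non-private constructors.
    static class Computed implements Serializable {
        String field;
        Computed() {}
        Computed(String field) { this.field = field; }
    }

    // Returns the SUID the serialization runtime associates with cls.
    static long suidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Pinned:   " + suidOf(Pinned.class));
        System.out.println("Computed: " + suidOf(Computed.class));
    }
}
```

Running this against the class as compiled from two releases would
show whether the computed value actually moved between them.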

{quote}
> If it is not meant to be serialized, why did it implement Serializable.
{quote}

Because there are two different things it can "mean" when a class
implements Serializable, and I think that's the core
disconnect/challenge to this issue.

The first meaning (let's call it "live serialization") is: "within the
same version of Lucene you can serialize/deserialize this object".

The second meaning (let's call it "long-term persistence") is: "you
can serialize this object in version X of Lucene and later deserialize
it using a newer version Y of Lucene".

Lucene, today, only guarantees "live serialization", and that's the
intention when "implements Serializable" is added to a class.

But, what's now being asked for (expected) with this issue is
"long-term persistence", which is really a very different beast and a
much taller order.  With it come a number of challenges that warrant
scrutiny:

  * What's our back-compat policy for "long-term persistence"?

  * The storage protocol must have a version header, so future changes
    can switch on that and decode older formats.

  * We need strong test cases that deserialize older versions of these
    serialized classes so we don't accidentally break it.

  * We should look carefully at the protocol and not waste bytes if we
    can (1 byte vs 8 byte version header).

These issues are the same issues we face with the index file format,
because that is also long-term persistence.
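
As a rough illustration of the version-header and test-case points
above, here is a sketch using plain DataInput/DataOutput streams.
VersionHeaderSketch, its version constants and the "golden" byte array
are all hypothetical; the layouts are assumptions made up for the
example, not a proposed Lucene format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class VersionHeaderSketch {
    static final int VERSION_1 = 1; // hypothetical old layout: one "inclusive" flag
    static final int VERSION_2 = 2; // hypothetical current layout: lower and upper flags

    // Writes the current layout behind a 1-byte version header.
    static byte[] encode(boolean includeLower, boolean includeUpper) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(VERSION_2);
        out.writeBoolean(includeLower);
        out.writeBoolean(includeUpper);
        out.flush();
        return bytes.toByteArray();
    }

    // Switches on the header and returns {includeLower, includeUpper}.
    static boolean[] decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int version = in.readUnsignedByte();
        switch (version) {
            case VERSION_1: {
                boolean inclusive = in.readBoolean();
                return new boolean[] {inclusive, inclusive};
            }
            case VERSION_2: {
                boolean lower = in.readBoolean();
                boolean upper = in.readBoolean();
                return new boolean[] {lower, upper};
            }
            default:
                throw new IOException("unknown version header: " + version);
        }
    }

    public static void main(String[] args) throws IOException {
        // Bytes captured (here: hand-written) in the old layout, kept as a
        // regression fixture so later changes cannot silently break decoding.
        byte[] goldenV1 = {1, 1};
        boolean[] old = decode(goldenV1);
        System.out.println(old[0] + " " + old[1]);
        System.out.println(encode(true, false).length + " bytes");
    }
}
```

In a real test suite the golden bytes would be produced once by an
actual older release and checked in, rather than written by hand.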


> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement
Externalizable.  This will make Serialization faster due to no reflection required and maintain
backwards compatibility.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

