hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2045) Data persisted in NM should be versioned
Date Tue, 15 Jul 2014 15:40:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062212#comment-14062212

Jason Lowe commented on YARN-2045:

bq. I agree the concept is not quite the same but I tend to handle them both together as either
of change (protobuf schema or layout schema) will bring difficulty/risky for NMStateStoreService
to load old version of data.

I think lumping them together and handling them in the implementation-specific code is fine,
but if the implementation is handling all the details then why is it exposed in the interface?
 I think the most telling point is that in the proposed patch no common code actually uses
the interfaces that were added.  Each implementation does its own version setting, its own
compatibility check, and I assume its own marshaling in the future if necessary.  The interfaces
aren't called by common code.  Maybe I'm not seeing the future use case of these methods?
 I guess it could be useful for common code to do logging/reporting of the persisted/current
versions or maybe to do a very simplistic incompatibility check (e.g.: assume different major
numbers means incompatible), although arguably the implementation could simply log these numbers
as it initializes and is already doing an implementation-specific compatibility check.

However I'm particularly doubtful of the storeVersion method as it seems like the only way
to safely convert versions in the general sense is with implementation-specific code.  Using
the conversion pseudo-code above as an example, if we crash halfway through the conversion
of a series of objects then we have a mix of old and new data on the next restart but the
stored version number is still old (or vice-versa if we store the new version first then convert).
 In an implementation-specific approach it may be possible to make the conversion atomic,
e.g.: using a batch write for the entire conversion in leveldb.  Therefore it makes more sense
to me that an implementation should be responsible for deciding when and how to update the
persisted schema version.  I would expect implementations to do this sort of conversion during
initialization and potentially the old persisted version would never be seen since it would
already be converted.  Do you have an example where using the storeVersion method in the interface
via implementation-independent code would be more appropriate and therefore the storeVersion
method in the interface is necessary?

To summarize, I can see exposing the ability to get the persisted and current state store
versions in the interface for logging, etc.  However I don't see how implementation-independent
code can properly update the version via the interface.  We're lumping both interface and
implementation-specific schema changes in the same version number, and it isn't possible to
do an update of multiple store objects atomically via the current interface.

bq. Are you suggesting NMDBSchemaVersion to play as PBImpl directly to include raw protobuf
or something else?

Sort of a subset of what the PBImpl is doing.  I was thinking of having NMDBSchemaVersion
wrap the protobuf but in a read-only way (i.e.: no set methods, no builder stuff).  If one
wants to change the version number, create a new protobuf.  PBImpls tend to get into trouble
because they can be written, and it's simpler to treat the protobufs as immutable as they
were intended.  Another approach would be to simply have some static helper util methods that
take two protobufs to do the compatible checks, etc.  Although I don't think we can really
implement a useful isCompatibleTo check in implementaion-independent code since the version
numbers encodes implementation-specific schema information.

Anyway I didn't mean to drag out this change for too long.  I'm wondering about these interfaces
since I'm a strong believer that interfaces should be minimal and necessary, and I'm having
difficulty seeing how these interfaces are really going to be used.  However I'm probably
in the minority on these methods.  If people feel strongly that these interfaces are necessary
and useful then go ahead and add them.  It seems to me that these interfaces will either never
be called or only called for trivial reasons (e.g.: logging).  However I don't think having
them is going to break anything or be an unreasonable burden on an implementation, rather
just extra baggage that state store implementations have to expose.  As for the PBImpl, it's
mostly a nit.  If you really would rather keep it in I guess that's fine.  We should be able
to remove it later if we realize we don't have a use for it.  The main change I think has
to be made is the leveldb schema check should handle the original method for storing the schema.
 Two ways to handle that are either explicitly check for the "1.0" string before trying to
parse the value as a protobuf or store the new version type under a different key and check
for the old schema key if the new schema key is missing.

> Data persisted in NM should be versioned
> ----------------------------------------
>                 Key: YARN-2045
>                 URL: https://issues.apache.org/jira/browse/YARN-2045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.4.1
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-2045-v2.patch, YARN-2045.patch
> As a split task from YARN-667, we want to add version info to NM related data, include:
> - NodeManager local LevelDB state
> - NodeManager directory structure

This message was sent by Atlassian JIRA

View raw message