hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2045) Data persisted in NM should be versioned
Date Tue, 15 Jul 2014 02:22:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061559#comment-14061559
] 

Junping Du commented on YARN-2045:
----------------------------------

[~jlowe], thanks for all good comments. Please see my reply below:
bq.  for tracking protobuf schema changes for any protobufs passed via the NMStateStoreService
interface directly. That's not quite the same concept as the layout of the state store itself
(i.e.: implementation-specific schema). For example, if leveldb decided to rework the way
it uses keys in the database that would be a schema change not reflected in this version,
correct?
I agree the concept is not quite the same but I tend to handle them both together as either
of change (protobuf schema or layout schema) will bring difficulty/risky for NMStateStoreService
to load old version of data. If incompatible changes happen at protobuf schema, it cannot
deserialize object from loaded data, or it cannot search/load data correctly if changes happens
at layout. However, these "incompatible" changes can be defined with flexibility in future.
If we still want to support backward compatibility for some changes in layout, we can have
some extra code to handle special version layout - like what HDFS was doing and we don't have
to increase the major number of version. 

bq.  I just found it odd that the state store implementation code is checking this value yet
it's exposed in a state store interface which is not implementation specific. That's what
felt wrong to me.
Are you talking about the interfaces (loadVersion/storeVersion/getCurrentVersion) exposed
in NMStateStoreService? I think these interfaces are already implemented specifically in each
state store implementations and can be leveraged by version check specifically. checkVersion()
is not exposed in interface as this method is only necessary for some implementation of state
store. Do I miss something here?

bq.  For this kind of schema version, the code to marshal data between compatible versions
should be in a common place, not in each state store implementation, and either all state
stores end up supporting the compatible versions or they don't.
IMO, the code to marshal data between compatible versions (if necessary) can be implementation-specific.
It can happen in state store initialize stage with detecting version and doing translating
work. Some pseudo code example to do this in future I can imagine is like below:
{code}
version = loadVersion();
if (version == specificCompatibleOldVersion) {
  object[] o1 = loadO1WithOldLogic();
  object[] o2 = loadO2WithOldLogic()
  ...
}
persistObjectsBack(o1, o2, ...).
storeNewVersionNumber();
{code}
This may not be necessary for protobuf object compatibility as old object should have reasonable
default value after deserialize in new code. But could be necessary for layout changes that
we want to support backward compatibility. Thoughts?

bq. We could just have NMDBSchemaVersion class wrap the raw protobuf and have the same easy-to-use
interface without all the extra PBImpl code. And as a bonus, NMDBSchemaVersion's hashCode
and equals methods can delegate to the raw protobuf methods to remove even more code.
I can understand your point here. Are you suggesting NMDBSchemaVersion to play as PBImpl directly
to include raw protobuf or something else? Do we have example like this before?

> Data persisted in NM should be versioned
> ----------------------------------------
>
>                 Key: YARN-2045
>                 URL: https://issues.apache.org/jira/browse/YARN-2045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.4.1
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-2045-v2.patch, YARN-2045.patch
>
>
> As a split task from YARN-667, we want to add version info to NM related data, include:
> - NodeManager local LevelDB state
> - NodeManager directory structure



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message