hadoop-hdfs-dev mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: OSGi and classloaders
Date Mon, 09 Jul 2012 14:50:40 GMT
Hi Bobby,

Guillaume and I are working on trunk, so it makes sense to focus on
trunk for this kind of refactoring. We are working on a fork branch on
github. We can choose when to merge our changes to trunk (or to a
dedicated branch).

Regards
JB

On 07/09/2012 04:37 PM, Robert Evans wrote:
> Guillaume,
>
> I am not super familiar with OSGi.  I have used it a little in the past,
> but that was 5+ years ago.  I am in favor of something that will fix the
> CLASSPATH problems that we currently have and would allow for CLASSPATH
> isolation between Hadoop itself and the applications that use Hadoop.  If
> OSGi can do this cleanly then I am +1 for moving to OSGi.
>
> However, we are trying to maintain binary compatibility within major
> version numbers, in preparation for rolling upgrades.  Many of the things
> you have suggested like moving classes from one package to another, and
> doing some serious rework to Configuration will break not only binary
> compatibility but also API compatibility.
>
> If we do go this route, just be aware that it is most likely something that
> would have to force a major version bump, which right now means trunk (the
> 3.0 line).
>
> --Bobby Evans
>
> On 7/9/12 8:24 AM, "Guillaume Nodet" <gnodet@gmail.com> wrote:
>
>> I'm working with Jean-Baptiste to make hadoop work in OSGi.
>> OSGi works with classloaders in a very specific way, which leads to several
>> problems with hadoop.
>>
>> Let me quickly explain how OSGi works.  In OSGi, you deploy bundles, which
>> are jars with additional OSGi metadata.  This metadata is used by the OSGi
>> framework to create a classloader for the bundle.  However, the
>> classloaders are not organized in a tree like in a JEE environment, but
>> rather in some kind of graph, where each classloader has limited visibility
>> and limited exposure.  This is controlled at the package level by
>> specifying which packages are exported and which packages are imported by a
>> given bundle.  This has two main consequences:
>>   * OSGi does not support split packages well, where the same package is
>>     exported by two different bundles
>>   * a classloader does not have visibility on everything, as it would in a
>>     usual flat classloader environment or even a JEE-like env
>>
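To make the visibility rule above concrete, here is a minimal Java sketch of a toy wiring model. It is not the OSGi framework API: the `BundleVisibility` and `Bundle` names are hypothetical, and real resolution also involves versions, `Require-Bundle`, dynamic imports and more, all of which are ignored here.

```java
import java.util.List;
import java.util.Set;

// Toy model of package-level wiring; illustrative only, not the OSGi API.
final class BundleVisibility {

    // Hypothetical description of a bundle: its name plus the packages it
    // exports and the packages it imports.
    static final class Bundle {
        final String name;
        final Set<String> exports;
        final Set<String> imports;

        Bundle(String name, Set<String> exports, Set<String> imports) {
            this.name = name;
            this.exports = exports;
            this.imports = imports;
        }
    }

    // In this simplified model a bundle sees a package if it exports the
    // package itself, or if it imports it and another installed bundle
    // exports it. Everything else is invisible to its classloader.
    static boolean canSee(Bundle consumer, String pkg, List<Bundle> installed) {
        if (consumer.exports.contains(pkg)) {
            return true;
        }
        if (!consumer.imports.contains(pkg)) {
            return false;
        }
        for (Bundle other : installed) {
            if (other != consumer && other.exports.contains(pkg)) {
                return true;
            }
        }
        return false;
    }
}
```

A bundle that neither exports nor imports a package gets `false` from `canSee`, even when another installed bundle exports it. That is the "limited visibility" that code written for a flat classpath does not expect.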
>> The first problem arises, for example, with the org.apache.hadoop.fs
>> package, which is split across the hadoop-common and hadoop-hdfs jars (the
>> latter defines the Hdfs class).  There may be other cases, but I haven't
>> hit them yet.  To solve this problem, it'd be better if such classes were
>> moved into a different package.
>>
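One way to look for further split packages is to collect the packages each jar would export and report any package claimed by more than one jar. The following is a minimal Java sketch under that assumption. The `SplitPackageCheck` class is hypothetical, and the input map is supplied by hand rather than read from real jars or manifests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop or OSGi.
final class SplitPackageCheck {

    // Input: bundle (jar) name -> packages it would export.
    // Output: package name -> exporting bundles, restricted to packages
    // that are exported by two or more bundles.
    static Map<String, List<String>> findSplitPackages(
            Map<String, List<String>> exportsByBundle) {
        Map<String, List<String>> exporters = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : exportsByBundle.entrySet()) {
            for (String pkg : entry.getValue()) {
                exporters.computeIfAbsent(pkg, k -> new ArrayList<>())
                         .add(entry.getKey());
            }
        }
        // Keep only the packages with more than one exporter.
        exporters.values().removeIf(bundles -> bundles.size() < 2);
        return exporters;
    }
}
```

With hadoop-common and hadoop-hdfs both listing org.apache.hadoop.fs, that package would be the one reported, matching the case described above.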
>> The second problem is much more complicated.  I think most of the
>> classloading is done from Configuration.  However, Configuration has an
>> internal classloader which is set by the constructor to the thread context
>> classloader (defaulting to the Configuration class' classloader), and new
>> Configuration objects are created everywhere in the code.
>> In addition, creating new Configuration objects forces the configuration
>> files to be parsed several times.
>> Also, in OSGi, configuration is better done through the standard OSGi
>> ConfigurationAdmin service, so it would be nice to integrate the
>> configuration into ConfigAdmin when running in OSGi.
>> For the above reasons, I'd like to know what you would think of
>> transforming the Configuration object into a real singleton, or at least
>> replacing the "new Configuration()" calls spread everywhere with access
>> to a singleton via Configuration.getInstance().
>> This would allow the hadoop osgi layer to manage the Configuration in a
>> more osgi friendly way, allowing the use of a specific subclass which could
>> better manage the class loading in an OSGi environment and integrate with
>> ConfigAdmin.  This may also remove the need for keeping a registry of
>> existing Configuration objects and having to update them when a default
>> resource is added, for example.
>>
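The singleton idea above could look roughly like the following Java sketch. `SharedConfig` is a hypothetical stand-in and not Hadoop's Configuration class; `install`, `getInstance` and `defaultClassLoader` are assumed names for illustration. It shows the classloader fallback described above (thread context classloader, else the class's own loader) and a hook that lets an integration layer register its own subclass before first use.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a shared configuration object.
class SharedConfig {
    private static final AtomicReference<SharedConfig> INSTANCE =
            new AtomicReference<>();

    private final ClassLoader classLoader;

    protected SharedConfig(ClassLoader classLoader) {
        this.classLoader = Objects.requireNonNull(classLoader);
    }

    // Thread context classloader if one is set, otherwise the loader that
    // loaded this class.
    static ClassLoader defaultClassLoader() {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        return tccl != null ? tccl : SharedConfig.class.getClassLoader();
    }

    // Lets an integration layer (for example an OSGi one) install its own
    // subclass. Fails if an instance already exists.
    static void install(SharedConfig custom) {
        Objects.requireNonNull(custom);
        if (!INSTANCE.compareAndSet(null, custom)) {
            throw new IllegalStateException("configuration already installed");
        }
    }

    // Returns the shared instance, creating a default one on first access.
    static SharedConfig getInstance() {
        SharedConfig current = INSTANCE.get();
        if (current == null) {
            INSTANCE.compareAndSet(null, new SharedConfig(defaultClassLoader()));
            current = INSTANCE.get();
        }
        return current;
    }

    // All class lookups go through the single classloader held here.
    Class<?> loadClass(String name) throws ClassNotFoundException {
        return Class.forName(name, false, classLoader);
    }

    ClassLoader getClassLoader() {
        return classLoader;
    }
}
```

Callers would use `SharedConfig.getInstance()` where they currently construct a new object, so the files are parsed once and the classloading policy lives in one place. The trade-off is global mutable state, which is part of what the compatibility concerns in this thread are about.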
>> Some of the above problems have been addressed in some way in HADOOP-7977,
>> but the fixes I've been working on were more related to the hadoop 1.0.x
>> branch, and are not directly applicable to trunk.
>>
>> One last point: the two above problems are mainly due to the fact that I've
>> been assuming that individual hadoop jars are transformed into native
>> bundles.  This would go away if we had a single bundle containing all
>> the individual jars (as it was with hadoop-core-1.0.x), but having more
>> fine-grained jars is better imho.
>>
>> Thoughts welcomed.
>>
>> --
>> ------------------------
>> Guillaume Nodet
>> ------------------------
>> Blog: http://gnodet.blogspot.com/
>> ------------------------
>> FuseSource, Integration everywhere
>> http://fusesource.com
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


