From: Guillaume Nodet
To: hdfs-dev@hadoop.apache.org
Date: Mon, 9 Jul 2012 15:24:45 +0200
Subject: OSGi and classloaders

I'm working with Jean-Baptiste to make hadoop work in OSGi. OSGi uses classloaders in a very specific way, which leads to several problems with hadoop. Let me quickly explain how OSGi works. In OSGi, you deploy bundles, which are jars with additional OSGi metadata. This metadata is used by the OSGi framework to create a classloader for the bundle. However, the classloaders are not organized in a tree as in a JEE environment, but rather in a kind of graph, where each classloader has limited visibility and limited exposure. This is controlled at the package level by specifying which packages are exported and which packages are imported by a given bundle.
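For example, the MANIFEST.MF of a hypothetical hadoop-common bundle could contain headers along these lines (the bundle name, packages and versions below are only illustrative, not what an actual build produces):

  Bundle-ManifestVersion: 2
  Bundle-SymbolicName: org.apache.hadoop.hadoop-common
  Bundle-Version: 2.0.0
  Export-Package: org.apache.hadoop.fs;version="2.0.0",
   org.apache.hadoop.conf;version="2.0.0"
  Import-Package: org.apache.commons.logging;version="[1.0,2)"

The framework wires each Import-Package to exactly one exporting bundle, which is what gives every classloader its limited visibility.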
This has two main consequences:
* OSGi does not handle split packages well, i.e. cases where the same package is exported by two different bundles
* a classloader does not have visibility on everything, as it would in a usual flat classpath environment or even a JEE-like one

The first problem arises, for example, with the org.apache.hadoop.fs package, which is split across the hadoop-common and hadoop-hdfs jars (the latter defines the Hdfs class). There may be other cases, but I haven't hit them yet. To solve this problem, it would be better if such classes were moved into a different package.

The second problem is much more complicated. I think most of the classloading is done from Configuration. However, Configuration has an internal classloader which is set by the constructor to the thread context classloader (defaulting to the Configuration class' classloader), and new Configuration objects are created everywhere in the code. In addition, creating new Configuration objects forces the configuration files to be parsed several times. Also, in OSGi, configuration is better done through the standard OSGi ConfigurationAdmin service, so it would be nice to integrate the configuration into ConfigAdmin when running in OSGi.

For the above reasons, I'd like to know what you would think of turning the Configuration object into a real singleton, or at least replacing the "new Configuration()" calls spread everywhere with access to a singleton via Configuration.getInstance(). This would allow the hadoop OSGi layer to manage the Configuration in a more OSGi-friendly way, allowing the use of a specific subclass which could better manage class loading in an OSGi environment and integrate with ConfigAdmin. This may also remove the need for keeping a registry of existing Configuration objects and having to update them when a default resource is added, for example. A sketch of what I have in mind follows my signature below.

Some of the above problems have been addressed in some way in HADOOP-7977, but the fixes I've been working on were more related to the hadoop 1.0.x branch and are not directly applicable to trunk.

One last point: the two problems above mainly stem from my assumption that the individual hadoop jars are transformed into native bundles. They would go away if we had a single bundle containing all the individual jars (as it was with hadoop-core-1.0.x), but having more fine-grained jars is better imho.

Thoughts welcomed.

--
------------------------
Guillaume Nodet
------------------------
Blog: http://gnodet.blogspot.com/
------------------------
FuseSource, Integration everywhere
http://fusesource.com
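P.S. To make the proposal a bit more concrete, here is a rough sketch of what the singleton could look like. This is only an illustration: getInstance() and setInstance() do not exist in org.apache.hadoop.conf.Configuration today, and the names are just suggestions.

  // Sketch only: "Configuration" stands for org.apache.hadoop.conf.Configuration;
  // getInstance()/setInstance() are proposed, not existing, API.
  public class Configuration {

      // The single shared instance; the OSGi layer could install a
      // subclass that delegates class loading to the bundle's
      // classloader and reads properties from ConfigAdmin.
      private static volatile Configuration instance;

      public static Configuration getInstance() {
          if (instance == null) {
              synchronized (Configuration.class) {
                  if (instance == null) {
                      instance = new Configuration();
                  }
              }
          }
          return instance;
      }

      // Lets an OSGi activator install an OSGi-aware subclass before
      // any client code asks for the instance.
      public static void setInstance(Configuration conf) {
          synchronized (Configuration.class) {
              instance = conf;
          }
      }

      protected Configuration() {
          // parse core-default.xml, core-site.xml, ... exactly once
      }
  }

Client code would then call Configuration.getInstance() instead of new Configuration(), so the configuration files are parsed once and the OSGi layer keeps full control over which subclass is actually used.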