ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Ignite File System (re)design.
Date Mon, 02 Mar 2015 13:12:40 GMT
Hi all,

We have spent some time discussing the file system and Hadoop APIs. There
were two possible ways to improve the current non-obvious API.
The first idea was to leave the API more or less the same, with only some
cosmetic changes, mainly to class names.
The second idea was to remove all secondary file system configuration
parameters from IgfsConfiguration and move them to the Hadoop module. IGFS
could then be wired up with a Hadoop secondary file system through some
private interface that is not exposed to users.

I think the first solution is better because the secondary file system in
IGFS is currently a kind of extension point. A user is free to implement
their own secondary storage and use it in much the same way as a store is
used in a cache. I do not see any sensible reason to remove this extension
point and hide it in the Hadoop module. Therefore, I designed the new API
using the first approach and put the draft into the branch ignite-386.
Please feel free to review and comment on it.

I'll also briefly go through the new design here:

Core module:
1) o.a.i.IgniteFileSystem - the user interface for working with our native
file system. It is obtained using the Ignite.fileSystem() method.
Based on the "IgniteFs" and "Igfs" interfaces in the current implementation.

2) o.a.i.filesystem.SecondaryFileSystem - the API for creating secondary
file systems for IGFS.
Based on the "Igfs" interface in the current implementation.

Note that there is no longer a direct link between IgniteFileSystem and
SecondaryFileSystem, as these are completely different entities.

3) o.a.i.configuration.FileSystemConfiguration - the configuration bean for
IgniteFileSystem. It has the setter
"setSecondaryFileSystem(SecondaryFileSystem)".

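To illustrate the extension point described in the core module list, here is a minimal, self-contained sketch. It does not use the real Ignite classes: the SecondaryFileSystem interface, the FileSystemConfiguration bean, and the InMemoryFileSystem class below are hypothetical, heavily reduced stand-ins, and only the setter name setSecondaryFileSystem comes from the design above. The read-through and write-through behaviour is an assumption based on the comparison with a cache store.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SecondaryFsSketch {
    /** Hypothetical, heavily reduced stand-in for o.a.i.filesystem.SecondaryFileSystem. */
    public interface SecondaryFileSystem {
        /** Returns the file content, or null when the path is unknown. */
        byte[] read(String path);

        void write(String path, byte[] data);
    }

    /** Hypothetical stand-in for FileSystemConfiguration; only the setter from the mail is modelled. */
    public static class FileSystemConfiguration {
        private SecondaryFileSystem secondaryFs;

        public void setSecondaryFileSystem(SecondaryFileSystem secondaryFs) {
            this.secondaryFs = secondaryFs;
        }

        public SecondaryFileSystem getSecondaryFileSystem() {
            return secondaryFs;
        }
    }

    /** User-supplied secondary storage; a plain map plays the role of the durable store. */
    public static class MapSecondaryFileSystem implements SecondaryFileSystem {
        private final Map<String, byte[]> files = new HashMap<>();

        @Override public byte[] read(String path) {
            return files.get(path);
        }

        @Override public void write(String path, byte[] data) {
            files.put(path, data.clone());
        }
    }

    /** Toy in-memory file system that writes through to, and reads through from, the secondary one. */
    public static class InMemoryFileSystem {
        private final Map<String, byte[]> cache = new HashMap<>();
        private final SecondaryFileSystem secondary; // null when no secondary file system is configured

        public InMemoryFileSystem(FileSystemConfiguration cfg) {
            this.secondary = cfg.getSecondaryFileSystem();
        }

        public void write(String path, byte[] data) {
            cache.put(path, data.clone());
            if (secondary != null)
                secondary.write(path, data); // write-through
        }

        public byte[] read(String path) {
            byte[] data = cache.get(path);
            if (data == null && secondary != null) {
                data = secondary.read(path); // read-through on a miss
                if (data != null)
                    cache.put(path, data);
            }
            return data;
        }

        /** Drops the in-memory copy only; the secondary copy stays. */
        public void evict(String path) {
            cache.remove(path);
        }
    }

    public static void main(String[] args) {
        FileSystemConfiguration cfg = new FileSystemConfiguration();
        cfg.setSecondaryFileSystem(new MapSecondaryFileSystem());

        InMemoryFileSystem fs = new InMemoryFileSystem(cfg);
        fs.write("/dir/file.txt", "hello".getBytes(StandardCharsets.UTF_8));
        fs.evict("/dir/file.txt");

        // The in-memory copy is gone, so this read falls through to the secondary file system.
        System.out.println(new String(fs.read("/dir/file.txt"), StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is that the primary file system only sees the small interface, so any storage that implements it can be plugged in through the configuration bean.
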
Hadoop module:
1) There are 4 map-reduce classes under the o.a.i.hadoop.mapreduce package.
Their packages mirror the corresponding packages in the Hadoop API. E.g.:
org.apache.ignite.[hadoop.mapreduce.protocol.IgniteHadoopClientProtocol]
implements org.apache.[hadoop.mapreduce.protocol.ClientProtocol].

2) Two file system implementations named "IgniteHadoopFileSystem", one for
Hadoop v1 and one for Hadoop v2.

3) IgniteHadoopSecondaryFileSystem - the implementation of
SecondaryFileSystem from the core module, which delegates native IGFS calls
to the underlying Hadoop FileSystem.
It is named "IgfsHadoopFileSystemWrapper" in the current implementation.
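
The delegation performed by such a wrapper can be sketched as a small adapter. This is only an illustration under stated assumptions: SecondaryFileSystem and BackingFileSystem below are hypothetical reduced interfaces (BackingFileSystem stands in for the Hadoop FileSystem, which is not used here), and resolving IGFS-style absolute paths against a base URI is an assumed mapping, not something the draft specifies.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class DelegatingSecondaryFsSketch {
    /** Hypothetical reduced SecondaryFileSystem; paths are absolute strings such as "/data/in". */
    public interface SecondaryFileSystem {
        boolean exists(String path);

        void mkdirs(String path);

        boolean delete(String path);
    }

    /** Hypothetical stand-in for the underlying Hadoop FileSystem, addressed by full URIs. */
    public interface BackingFileSystem {
        boolean exists(URI uri);

        void mkdirs(URI uri);

        boolean delete(URI uri);
    }

    /** In-memory backing store so that the sketch runs without a Hadoop cluster. */
    public static class InMemoryBackingFileSystem implements BackingFileSystem {
        private final Set<URI> dirs = new HashSet<>();

        @Override public boolean exists(URI uri) {
            return dirs.contains(uri);
        }

        @Override public void mkdirs(URI uri) {
            dirs.add(uri);
        }

        @Override public boolean delete(URI uri) {
            return dirs.remove(uri);
        }
    }

    /** Adapter: translates each path to a URI and forwards the call to the backing file system. */
    public static class DelegatingSecondaryFileSystem implements SecondaryFileSystem {
        private final URI base;
        private final BackingFileSystem delegate;

        public DelegatingSecondaryFileSystem(String baseUri, BackingFileSystem delegate) {
            this.base = URI.create(baseUri);
            this.delegate = delegate;
        }

        /** Resolves an absolute path against the base URI (assumed mapping). */
        public URI toUri(String path) {
            return base.resolve(path);
        }

        @Override public boolean exists(String path) {
            return delegate.exists(toUri(path));
        }

        @Override public void mkdirs(String path) {
            delegate.mkdirs(toUri(path));
        }

        @Override public boolean delete(String path) {
            return delegate.delete(toUri(path));
        }
    }

    public static void main(String[] args) {
        BackingFileSystem backing = new InMemoryBackingFileSystem();
        SecondaryFileSystem sec = new DelegatingSecondaryFileSystem("hdfs://192.168.1.23", backing);

        sec.mkdirs("/data/in");

        // The directory is visible both through the adapter and directly in the backing store.
        System.out.println(sec.exists("/data/in"));
        System.out.println(backing.exists(URI.create("hdfs://192.168.1.23/data/in")));
    }
}
```
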

Let me give you an example of how a user would configure it now.

1) Ignite configuration:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="fileSystemConfiguration">
        <list>
            <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
                <!-- Delegate to real HDFS. -->
                <property name="secondaryFileSystem">
                    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopSecondaryFileSystem">
                        <constructor-arg value="hdfs://192.168.1.23"/>
                    </bean>
                </property>
            </bean>
        </list>
    </property>
</bean>

2) core-site.xml:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>igfs:///</value>
    </property>
    <property>
        <name>fs.igfs.impl</name>
        <value>org.apache.ignite.igfs.hadoop.v1.IgfsHadoopFileSystem</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.igfs.impl</name>
        <value>org.apache.ignite.igfs.hadoop.v2.IgfsHadoopFileSystem</value>
    </property>
</configuration>

Seems pretty clear and consistent to me.

Thoughts?

Vladimir.
