ignite-dev mailing list archives

From "Ivan V." <iveselovs...@gridgain.com>
Subject Re: Using HDFS as a secondary FS
Date Tue, 15 Dec 2015 09:10:41 GMT
Denis, good question.
Yes, there are several reasons.
1) setup-hadoop is suitable for the Apache Hadoop distribution, but not for
all others (e.g. BigTop).
2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml),
which prevents further use of the cluster without Ignite.
3) setup-hadoop needs write permission to all the folders it writes files
to.
4) It is possible to provide all the required functionality without any
file modifications to the existing Hadoop cluster at all; see
https://issues.apache.org/jira/browse/IGNITE-483.

There were plans to remove "setup-hadoop", but that is not yet done.
In any case, I 100% agree that the presence of several different versions
of the documentation is quite confusing and misleading.


On Mon, Dec 14, 2015 at 10:58 PM, Denis Magda <dmagda@gridgain.com> wrote:

> Ivan,
>
> Is there any reason why we don’t recommend using
> apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop
> Accelerator articles?
>
> With setup-hadoop.sh I was able to build a valid classpath, automatically
> create symlinks to the accelerator's jars in Hadoop's libs folder, and
> start an Ignite node that uses HDFS as a secondary FS, all in less than 10
> minutes.
>
> I just followed the instructions from
> apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on
> readme.io look much more complex to me; they don't mention
> setup-hadoop.sh/bat at all, forcing the end user to perform a manual
> setup.
>
> —
> Denis
>
> > On 14 Dec 2015, at 20:24, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
> >
> > On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <dmagda@gridgain.com> wrote:
> >
> >> Yes, this will be documented tomorrow. I want to go through all the
> >> steps by myself, checking any other obstacles the user may face.
> >>
> >
> > Thanks, Denis!
> >
> >
> >>
> >> —
> >> Denis
> >>
> >>> On 14 Dec 2015, at 18:11, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
> >>>
> >>> Ivan, I think this should be documented, no?
> >>>
> >>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <iveselovskiy@gridgain.com> wrote:
> >>>
> >>>> To enable just IGFS persistence there is no need to use HDFS (which
> >>>> requires a Hadoop dependency, a configured HDFS cluster, etc.). We
> >>>> have requests https://issues.apache.org/jira/browse/IGNITE-1120 and
> >>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
> >>>> persistence on top of the local file system, and we are already close
> >>>> to a solution.
> >>>>
> >>>> Regarding the secondary FS doc page (
> >>>> http://apacheignite.gridgain.org/docs/secondary-file-system) I would
> >>>> suggest adding the following text there:
> >>>> ------------------------
> >>>> If an Ignite node with a secondary file system is configured on a
> >>>> machine with a Hadoop distribution, make sure Ignite is able to find
> >>>> the appropriate Hadoop libraries: set the HADOOP_HOME environment
> >>>> variable for the Ignite process if you're using the Apache Hadoop
> >>>> distribution, or, if you use another distribution (HDP, Cloudera,
> >>>> BigTop, etc.), make sure the file /etc/default/hadoop exists and has
> >>>> appropriate contents.
> >>>>
> >>>> If an Ignite node with a secondary file system is configured on a
> >>>> machine without a Hadoop distribution, you can manually add the
> >>>> necessary Hadoop dependencies to the Ignite node classpath: these are
> >>>> the dependencies with groupId "org.apache.hadoop" listed in the file
> >>>> modules/hadoop/pom.xml. Currently they are:
> >>>>
> >>>>  1. hadoop-annotations
> >>>>  2. hadoop-auth
> >>>>  3. hadoop-common
> >>>>  4. hadoop-hdfs
> >>>>  5. hadoop-mapreduce-client-common
> >>>>  6. hadoop-mapreduce-client-core
> >>>>
> >>>> ------------------------
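[Editorial note: the dependency list in the suggested text above could be collected into a classpath along the lines of the sketch below. This is only an illustration, not part of the original message: the HADOOP_LIBS default and the jar naming pattern are assumptions about the local environment.]

```shell
# Sketch only: collect the six org.apache.hadoop jars listed above into a
# classpath string. HADOOP_LIBS is a placeholder for the directory where
# your distribution keeps its Hadoop jars.
HADOOP_LIBS="${HADOOP_LIBS:-/usr/lib/hadoop}"
IGNITE_HADOOP_CP=""
for name in hadoop-annotations hadoop-auth hadoop-common hadoop-hdfs \
            hadoop-mapreduce-client-common hadoop-mapreduce-client-core; do
  for jar in "$HADOOP_LIBS/$name"-*.jar; do
    # The glob stays literal when nothing matches, so check the file exists.
    if [ -f "$jar" ]; then
      IGNITE_HADOOP_CP="$IGNITE_HADOOP_CP:$jar"
    fi
  done
done
IGNITE_HADOOP_CP="${IGNITE_HADOOP_CP#:}"   # drop the leading colon
echo "$IGNITE_HADOOP_CP"
```

The result can then be appended to the Ignite node's classpath before start-up.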
> >>>>
> >>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <
> >>>> valentin.kulichenko@gmail.com> wrote:
> >>>>
> >>>>> Guys,
> >>>>>
> >>>>> Why don't we include the ignite-hadoop module in Fabric? This user
> >>>>> simply wants to configure HDFS as a secondary file system to ensure
> >>>>> persistence. Not having the opportunity to do this in Fabric looks
> >>>>> weird to me. And actually I don't think this is a use case for the
> >>>>> Hadoop Accelerator.
> >>>>>
> >>>>> -Val
> >>>>>
> >>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <dmagda@gridgain.com> wrote:
> >>>>>
> >>>>>> Hi Ivan,
> >>>>>>
> >>>>>> 1) Yes, I think it makes sense to keep the old versions of the docs
> >>>>>> while an old version is still considered to be in use by someone.
> >>>>>>
> >>>>>> 2) Absolutely, the time to add a corresponding article on readme.io
> >>>>>> has come. It's not the first time I've seen a question about HDFS as
> >>>>>> a secondary FS.
> >>>>>> Both before and now, it has not been clear to me what exact steps I
> >>>>>> should follow to enable such a configuration. Our current suggestions
> >>>>>> look like a puzzle.
> >>>>>> I'll assemble the puzzle on my side and prepare the article. Ivan, if
> >>>>>> you don't mind, I will reach out to you directly for any technical
> >>>>>> assistance if needed.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Denis
> >>>>>>
> >>>>>>
> >>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
> >>>>>>
> >>>>>>> Hi, Valentin,
> >>>>>>>
> >>>>>>> 1) First of all, note that the author of the question is not using
> >>>>>>> the latest doc page, namely
> >>>>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system.
> >>>>>>> This is version 1.0, while the latest is 1.5:
> >>>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
> >>>>>>> appeared that some links from the latest doc version point to the
> >>>>>>> 1.0 doc version. I fixed that in several places where I found it. Do
> >>>>>>> we really need the old doc versions (1.0-1.4)?
> >>>>>>>
> >>>>>>> 2) Our documentation (
> >>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) does
> >>>>>>> not provide any special setup instructions for configuring HDFS as a
> >>>>>>> secondary file system in Ignite. Our docs assume that if a user
> >>>>>>> wants to integrate with Hadoop, (s)he follows the generic Hadoop
> >>>>>>> integration instructions (e.g.
> >>>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop).
> >>>>>>> It looks like the page
> >>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should
> >>>>>>> be clearer about the required configuration steps (in fact, setting
> >>>>>>> the HADOOP_HOME variable for the Ignite node process).
> >>>>>>>
> >>>>>>> 3) Hadoop jars are correctly found by Ignite if the following
> >>>>>>> conditions are met:
> >>>>>>> (a) The "Hadoop Edition" distribution is used (not a "Fabric"
> >>>>>>> edition).
> >>>>>>> (b) Either the HADOOP_HOME environment variable is set (for the
> >>>>>>> Apache Hadoop distribution), or the file "/etc/default/hadoop"
> >>>>>>> exists and matches the Hadoop distribution used (BigTop, Cloudera,
> >>>>>>> HDP, etc.).
> >>>>>>>
> >>>>>>> The exact mechanism of the Hadoop classpath composition can be
> >>>>>>> found in the files
> >>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh and
> >>>>>>> IGNITE_HOME/bin/include/setenv.sh.
> >>>>>>>
> >>>>>>> The issue is discussed in
> >>>>>>> https://issues.apache.org/jira/browse/IGNITE-372 and
> >>>>>>> https://issues.apache.org/jira/browse/IGNITE-483.
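[Editorial note: as a concrete illustration of condition (b) above, a minimal /etc/default/hadoop might look like the sketch below. The variable names and paths are assumptions that vary between distributions (BigTop, Cloudera, HDP), not values taken from this thread; consult your distribution's packaging for the authoritative contents.]

```shell
# Hedged sketch of a minimal /etc/default/hadoop environment file.
# All paths and variable names below are assumptions, not authoritative.
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_COMMON_HOME=/usr/lib/hadoop
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
export HADOOP_CONF_DIR=/etc/hadoop/conf
```

According to the thread above, Ignite's start-up scripts (via bin/include/hadoop-classpath.sh) consult such variables to compose the Hadoop classpath.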
> >>>>>>>
> >>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
> >>>>>>> valentin.kulichenko@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Igniters,
> >>>>>>>>
> >>>>>>>> I'm looking at the question on SO [1], and I'm a bit confused.
> >>>>>>>>
> >>>>>>>> We ship the ignite-hadoop module only in the Hadoop Accelerator,
> >>>>>>>> and without Hadoop JARs, assuming that the user will include them
> >>>>>>>> from the Hadoop distribution he uses. It seems OK to me when the
> >>>>>>>> accelerator is plugged in to Hadoop to run mapreduce jobs, but I
> >>>>>>>> can't figure out the steps required to configure HDFS as a
> >>>>>>>> secondary FS for IGFS. Which Hadoop JARs should be on the
> >>>>>>>> classpath? Is the user supposed to add them manually?
> >>>>>>>>
> >>>>>>>> Can someone with more expertise in our Hadoop integration clarify
> >>>>>>>> this? I believe there is not enough documentation on this topic.
> >>>>>>>>
> >>>>>>>> BTW, any idea why the user gets an exception for the JobConf
> >>>>>>>> class, which is in the 'mapred' package? Why is a map-reduce class
> >>>>>>>> being used?
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
> >>>>>>>>
> >>>>>>>> -Val
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>
