reef-dev mailing list archives

From Sergiy Matusevych <sergiy.matusev...@gmail.com>
Subject Re: reef integration with spark
Date Fri, 21 Apr 2017 05:27:41 GMT
Hi Saikat,

This is great! REEF-on-REEF is actually the major step towards
REEF-on-Spark, and it would be great to make it work ASAP. In general,
pretty much all the work for Spark integration is on the REEF side; I really hope
we won't have to change Spark for our needs. One of our current blockers is
this issue: https://issues.apache.org/jira/browse/REEF-1782 - it would be
fantastic if you could help me with it.

Another problem you can look at is in the reef-spark project: when I run
the app on YARN, the evaluator fails to find the REEFLauncher class at
startup. Most likely, we are messing up the classpath or the
resources, but I don't know where. One suspicious thing is the following.
REEF packages its jars for the evaluator into the hdfs://user/hadoop/global.jar
resource. This is fine; however, inside that global jar we use the full
path to the packaged jars, e.g.

$ jar tvf global.jar
   15082 Wed Apr 19 19:03:50 PDT 2017 mnt/data/0/local/nm/usercache/hadoop/filecache/18/reef-driver-on-spark_2.11-1.01.jar
16963704 Wed Apr 19 19:03:50 PDT 2017 mnt/data/0/local/nm/usercache/hadoop/filecache/15/reef-examples-0.16.0-SNAPSHOT-shaded.jar

-- note the mnt/data/0/... path. I am not sure if it is OK to have such a
long path (that is also specific to the driver host).
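
To check this without reading the jar tvf output by eye, here is a minimal Python sketch that lists the jars packaged under a directory prefix inside an archive such as global.jar. The function name nested_jar_entries is made up for illustration and is not part of REEF, and I have not confirmed that nested paths are what breaks the evaluator classpath; this only makes the layout easier to inspect.

```python
import zipfile
from pathlib import PurePosixPath


def nested_jar_entries(jar_path, suffix=".jar"):
    """Return archive entries ending in `suffix` that sit under a directory.

    `jar_path` may be a filesystem path or a binary file-like object.
    Hypothetical helper for inspection only; not part of REEF.
    """
    with zipfile.ZipFile(jar_path) as archive:
        return [
            name
            for name in archive.namelist()
            # More than one path component means the jar is nested,
            # e.g. mnt/data/0/.../reef-examples-...-shaded.jar
            if name.endswith(suffix) and len(PurePosixPath(name).parts) > 1
        ]
```

Calling nested_jar_entries("global.jar") on a local copy of the archive should return the entries that carry a driver-host path prefix; an empty list would mean all packaged jars sit at the top level.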


It would be great if you could investigate either of these two issues.

Thanks a lot and please don't hesitate to ping me if you have any
questions! I'd also be happy to meet you in person to buy you a coffee and
give you a guided tour of the REEF codebase.

Cheers,
Sergiy.




On Thu, Apr 20, 2017 at 1:27 PM, Saikat Kanjilal <sxk1969@gmail.com> wrote:

> Sergiy,
> I am finally able to come up for air, as I've finished quite a bit of my
> Spark transform work and can start working on this. Before I do, should I
> focus on REEF-on-REEF or REEF-on-Spark? Where do you need the most help?
> Thanks
>
> On Tue, Apr 11, 2017 at 1:49 PM, Sergiy Matusevych <
> sergiy.matusevych@gmail.com> wrote:
>
> > Hi Saikat,
> >
> > How's everything? Were you able to run Spark+REEF app on YARN? I've just
> > added a call to ls -l on the Driver to see where REEF jars are. The log
> > looks like this:
> >
> > 17/04/11 15:55:35 INFO cisl.ReefOnSpark$: Job submitted: application_1491940426172_0002
> > 17/04/11 15:55:35 INFO util.REEFVersion: REEF Version: 0.16.0-SNAPSHOT
> > 17/04/11 15:55:36 ERROR yarn.YarnClasspathProvider: YarnConfiguration.YARN_APPLICATION_CLASSPATH is empty. This indicates a broken cluster configuration.
> > 17/04/11 15:55:36 INFO cisl.ReefOnSpark$: ls:
> > .:
> > total 68
> > drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ./
> > drwx--x---. 6 hadoop hadoop 4096 Apr 11 15:55 ../
> > lrwxrwxrwx. 1 hadoop hadoop   85 Apr 11 15:55 __app__.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/13/reef-driver-on-spark_2.11-1.01.jar
> > -rw-r--r--. 1 hadoop hadoop   69 Apr 11 15:55 container_tokens
> > -rw-r--r--. 1 hadoop hadoop   12 Apr 11 15:55 .container_tokens.crc
> > -rwx------. 1 hadoop hadoop  635 Apr 11 15:55 default_container_executor_session.sh
> > -rw-r--r--. 1 hadoop hadoop   16 Apr 11 15:55 .default_container_executor_session.sh.crc
> > -rwx------. 1 hadoop hadoop  689 Apr 11 15:55 default_container_executor.sh
> > -rw-r--r--. 1 hadoop hadoop   16 Apr 11 15:55 .default_container_executor.sh.crc
> > -rwx------. 1 hadoop hadoop 3728 Apr 11 15:55 launch_container.sh
> > -rw-r--r--. 1 hadoop hadoop   40 Apr 11 15:55 .launch_container.sh.crc
> > drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 reef/
> > lrwxrwxrwx. 1 hadoop hadoop   91 Apr 11 15:55 reef-examples-0.16.0-SNAPSHOT-shaded.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/10/reef-examples-0.16.0-SNAPSHOT-shaded.jar
> > lrwxrwxrwx. 1 hadoop hadoop   73 Apr 11 15:55 scala-arm_2.11-1.4.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/14/scala-arm_2.11-1.4.jar
> > lrwxrwxrwx. 1 hadoop hadoop   69 Apr 11 15:55 __spark_conf__ -> /mnt/data/0/local/nm/usercache/hadoop/filecache/12/__spark_conf__.zip
> > lrwxrwxrwx. 1 hadoop hadoop   88 Apr 11 15:55 __spark_libs__ -> /mnt/data/0/local/nm/usercache/hadoop/filecache/11/__spark_libs__2905924713127025597.zip
> > drwx--x---. 2 hadoop hadoop 4096 Apr 11 15:55 tmp/
> >
> > ./reef:
> > total 16
> > drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ./
> > drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ../
> > drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 global/
> > drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 temp/
> >
> > ./reef/global:
> > total 12
> > drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 ./
> > drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ../
> > lrwxrwxrwx. 1 hadoop hadoop  128 Apr 11 15:55 __app__.jar -> /mnt/data/0/local/nm/usercache/hadoop/appcache/application_1491940426172_0001/container_1491940426172_0001_01_000001/__app__.jar
> >
> > ./reef/temp:
> > total 8
> > drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 ./
> > drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ../
> >
> > ./tmp:
> > total 8
> > drwx--x---. 2 hadoop hadoop 4096 Apr 11 15:55 ./
> > drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ../
> >
> >
> > Maybe that can help you to debug the issue? I am looking at it too now.
> > Please feel free to ping me if you have questions!
> >
> > Cheers,
> > Sergiy.
> >
> >
> > On Fri, Apr 7, 2017 at 2:05 PM, Sergiy Matusevych <
> > sergiy.matusevych@gmail.com> wrote:
> >
> > > Hi Saikat,
> > >
> > > I've sent you an invite to the reef-spark repo. I gave you write access to
> > > it, so you are welcome to create branches (prefixed with your id_) and
> > > issue pull requests into master - just make sure you don't clone it into
> > > a public repo.
> > >
> > > To build it, first do mvn install on the REEF branch
> > > https://github.com/motus/reef/tree/sergiym_uam_debug, and then run
> > > sbt package in reef-spark master.
> > > Note that you need YARN 2.7.3 or newer to make Unmanaged AM work.
> > >
> > > For starters, you can try to figure out where Spark copies its jars to
> > > on the AM host - that's the reason reef-spark does not work now. That
> > > is, when I run
> > >
> > > ../spark/bin/spark-submit --master yarn --deploy-mode cluster \
> > >   --class com.microsoft.cisl.ReefOnSpark \
> > >   --jars ../reef/lang/java/reef-examples/target/reef-examples-0.16.0-SNAPSHOT-shaded.jar,scala-arm_2.11-1.4.jar \
> > >   target/scala-2.11/reef-driver-on-spark_2.11-1.01.jar
> > >
> > > the REEF (unmanaged) driver fails to pick the right JAR file to copy to the
> > > evaluators. It'd be great if you could fix that problem.
> > >
> > > Meanwhile, I'll keep working on REEF-on-REEF and other things that
> > > require a deeper dive into REEF internals.
> > >
> > > Cheers,
> > > Sergiy.
> > >
> > > On Fri, Apr 7, 2017 at 12:49 PM, Saikat Kanjilal <sxk1969@gmail.com>
> > > wrote:
> > >
> > >> I would love to work on all three, but number three is most interesting
> > >> to me as I'm knee deep in Spark and writing a scala-spark command-line
> > >> app to build an ML repo in Azure. Let me know your thoughts on next
> > >> steps; my GitHub id is skanjila.
> > >>
> > >> On Fri, Apr 7, 2017 at 12:40 PM, Sergiy Matusevych <
> > >> sergiy.matusevych@gmail.com> wrote:
> > >>
> > >> > Hi Saikat,
> > >> >
> > >> > Sorry I did not see your message earlier. Of course, you are most
> > >> > welcome to participate! Any help will be deeply appreciated.
> > >> >
> > >> > There are currently three lines of work:
> > >> >
> > >> > 1) REEF as a library, https://issues.apache.org/jira/browse/REEF-1561
> > >> > When running on top of Spark, a REEF application has to clean up its
> > >> > resources on exit before handing control back to Spark. We still have
> > >> > some rogue threads not closed at the end of the REEF+YARN job, and we
> > >> > need to fix that. Here's a more detailed description of the problem:
> > >> > https://issues.apache.org/jira/browse/REEF-1729?focusedCommentId=15902195
> > >> >
> > >> > 2) REEF on REEF, https://issues.apache.org/jira/browse/REEF-1667
> > >> > We want to run a REEF job on YARN and then launch another REEF driver
> > >> > as Unmanaged AM. That is, we want to run *two* REEF drivers in one YARN
> > >> > container. That example is in master and works for the local runtime,
> > >> > but not on YARN. I am currently working on it in a separate branch, at
> > >> > https://github.com/motus/reef/tree/sergiym_uam_debug
> > >> >
> > >> > 3) REEF on Spark is almost identical to REEF on REEF, except the first
> > >> > (host) driver is Spark and the second one (unmanaged AM) is REEF. That
> > >> > code is at https://github.com/Microsoft-CISL/reef-spark
> > >> > Please give me your GitHub ID and I will talk to our admins to give
> > >> > you access to that repo.
> > >> >
> > >> > I would love to work with you in any of these areas; please feel free
> > >> > to pick any JIRA items that might be interesting to you :)
> > >> >
> > >> >
> > >> > Cheers,
> > >> > Sergiy.
> > >> >
> > >> >
> > >> > On Tue, Apr 4, 2017 at 1:38 PM, Saikat Kanjilal <sxk1969@gmail.com> wrote:
> > >> >
> > >> > > Hey Sergiy,
> > >> > > I am interested in helping with REEF integration with Spark, and it
> > >> > > makes sense for me to help here as I'm knee deep in Spark, working on
> > >> > > a big data platform at the moment. Any thoughts on where I can
> > >> > > contribute?
> > >> > >
> > >> > > Thanks
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
