reef-dev mailing list archives

From Sergiy Matusevych <sergiy.matusev...@gmail.com>
Subject Re: reef integration with spark
Date Fri, 28 Apr 2017 02:09:03 GMT
Hi Saikat,

It was great to meet you today. I've opened a JIRA issue regarding
reef-runtime-spark, and assigned it to you:
https://issues.apache.org/jira/browse/REEF-1791

You are also welcome to look at other issues and PRs. Of the two problems I
mentioned earlier in this thread, we already have a fix for the latter one;
REEF-1782 is still open, but it should not block your other work.

Thank you,
Sergiy.

On Thu, Apr 20, 2017 at 10:27 PM, Sergiy Matusevych <
sergiy.matusevych@gmail.com> wrote:

> Hi Saikat,
>
> This is great! REEF-on-REEF is actually the major step toward
> REEF-on-Spark, and it would be great to make it work ASAP. In general,
> pretty much all the work for Spark integration is on the REEF side; I
> really hope we won't have to change Spark for our needs. One of our current
> blockers is this issue: https://issues.apache.org/jira/browse/REEF-1782 -
> it would be fantastic if you could help me with it.
>
> Another problem you can look at is in the reef-spark project: when I run
> the app on YARN, the evaluator fails to find the REEFLauncher class at
> startup. Most likely, we screw up the classpath or the resources, but I
> don't know where. One suspicious thing is the following: REEF packages its
> jars for the evaluator into the hdfs://user/hadoop/global.jar resource.
> This is fine; however, inside that global jar we use the full path to the
> packaged jars, e.g.
>
> $ jar tvf global.jar
>    15082 Wed Apr 19 19:03:50 PDT 2017 mnt/data/0/local/nm/usercache/hadoop/filecache/18/reef-driver-on-spark_2.11-1.01.jar
> 16963704 Wed Apr 19 19:03:50 PDT 2017 mnt/data/0/local/nm/usercache/hadoop/filecache/15/reef-examples-0.16.0-SNAPSHOT-shaded.jar
>
> Note the mnt/data/0/... prefix. I am not sure it is OK to have such a long
> path (one that is also specific to the driver host).
>
>
> It would be great if you could investigate either of these two issues.
>
> Thanks a lot, and please don't hesitate to ping me if you have any
> questions! I'd also be happy to meet you in person, buy you a coffee, and
> give you a guided tour of the REEF codebase.
>
> Cheers,
> Sergiy.
>
>
>
>
> On Thu, Apr 20, 2017 at 1:27 PM, Saikat Kanjilal <sxk1969@gmail.com>
> wrote:
>
>> Sergiy,
>> I am finally able to come up for air, as I've finished quite a bit of my
>> Spark transform work and can start working on this. Before I do, should I
>> focus on REEF-on-REEF or REEF-on-Spark? Where do you need the most help?
>> Thanks
>>
>> On Tue, Apr 11, 2017 at 1:49 PM, Sergiy Matusevych <
>> sergiy.matusevych@gmail.com> wrote:
>>
>> > Hi Saikat,
>> >
>> > How's everything? Were you able to run the Spark+REEF app on YARN? I've
>> > just added a call to ls -l on the Driver to see where the REEF jars are.
>> > The log looks like this:
>> >
>> > 17/04/11 15:55:35 INFO cisl.ReefOnSpark$: Job submitted: application_1491940426172_0002
>> > 17/04/11 15:55:35 INFO util.REEFVersion: REEF Version: 0.16.0-SNAPSHOT
>> > 17/04/11 15:55:36 ERROR yarn.YarnClasspathProvider: YarnConfiguration.YARN_APPLICATION_CLASSPATH is empty. This indicates a broken cluster configuration.
>> > 17/04/11 15:55:36 INFO cisl.ReefOnSpark$: ls:
>> > .:
>> > total 68
>> > drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ./
>> > drwx--x---. 6 hadoop hadoop 4096 Apr 11 15:55 ../
>> > lrwxrwxrwx. 1 hadoop hadoop   85 Apr 11 15:55 __app__.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/13/reef-driver-on-spark_2.11-1.01.jar
>> > -rw-r--r--. 1 hadoop hadoop   69 Apr 11 15:55 container_tokens
>> > -rw-r--r--. 1 hadoop hadoop   12 Apr 11 15:55 .container_tokens.crc
>> > -rwx------. 1 hadoop hadoop  635 Apr 11 15:55 default_container_executor_session.sh
>> > -rw-r--r--. 1 hadoop hadoop   16 Apr 11 15:55 .default_container_executor_session.sh.crc
>> > -rwx------. 1 hadoop hadoop  689 Apr 11 15:55 default_container_executor.sh
>> > -rw-r--r--. 1 hadoop hadoop   16 Apr 11 15:55 .default_container_executor.sh.crc
>> > -rwx------. 1 hadoop hadoop 3728 Apr 11 15:55 launch_container.sh
>> > -rw-r--r--. 1 hadoop hadoop   40 Apr 11 15:55 .launch_container.sh.crc
>> > drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 reef/
>> > lrwxrwxrwx. 1 hadoop hadoop   91 Apr 11 15:55 reef-examples-0.16.0-SNAPSHOT-shaded.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/10/reef-examples-0.16.0-SNAPSHOT-shaded.jar
>> > lrwxrwxrwx. 1 hadoop hadoop   73 Apr 11 15:55 scala-arm_2.11-1.4.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/14/scala-arm_2.11-1.4.jar
>> > lrwxrwxrwx. 1 hadoop hadoop   69 Apr 11 15:55 __spark_conf__ -> /mnt/data/0/local/nm/usercache/hadoop/filecache/12/__spark_conf__.zip
>> > lrwxrwxrwx. 1 hadoop hadoop   88 Apr 11 15:55 __spark_libs__ -> /mnt/data/0/local/nm/usercache/hadoop/filecache/11/__spark_libs__2905924713127025597.zip
>> > drwx--x---. 2 hadoop hadoop 4096 Apr 11 15:55 tmp/
>> >
>> > ./reef:
>> > total 16
>> > drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ./
>> > drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ../
>> > drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 global/
>> > drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 temp/
>> >
>> > ./reef/global:
>> > total 12
>> > drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 ./
>> > drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ../
>> > lrwxrwxrwx. 1 hadoop hadoop  128 Apr 11 15:55 __app__.jar -> /mnt/data/0/local/nm/usercache/hadoop/appcache/application_1491940426172_0001/container_1491940426172_0001_01_000001/__app__.jar
>> >
>> > ./reef/temp:
>> > total 8
>> > drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 ./
>> > drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ../
>> >
>> > ./tmp:
>> > total 8
>> > drwx--x---. 2 hadoop hadoop 4096 Apr 11 15:55 ./
>> > drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ../
>> >
>> >
>> > Maybe that will help you debug the issue? I am looking at it now too.
>> > Please feel free to ping me if you have questions!
>> >
>> > Cheers,
>> > Sergiy.
>> >
>> >
>> > On Fri, Apr 7, 2017 at 2:05 PM, Sergiy Matusevych <
>> > sergiy.matusevych@gmail.com> wrote:
>> >
>> > > Hi Saikat,
>> > >
>> > > I've sent you an invite to the reef-spark repo. I gave you write access
>> > > to it, so you are welcome to create branches (prefixed with your id_)
>> > > and issue pull requests into master - just make sure you don't clone it
>> > > into a public repo.
>> > >
>> > > To build it, first run mvn install on the REEF branch
>> > > https://github.com/motus/reef/tree/sergiym_uam_debug, and then run
>> > > sbt package in reef-spark master.
>> > > Note that you need YARN 2.7.3 or newer to make Unmanaged AM work.
>> > >
>> > > For starters, you can try to figure out where Spark copies its jars to
>> > > on the AM host - that's the reason reef-spark does not work now. That
>> > > is, when I run
>> > >
>> > > ../spark/bin/spark-submit --master yarn --deploy-mode cluster \
>> > >   --class com.microsoft.cisl.ReefOnSpark \
>> > >   --jars ../reef/lang/java/reef-examples/target/reef-examples-0.16.0-SNAPSHOT-shaded.jar,scala-arm_2.11-1.4.jar \
>> > >   target/scala-2.11/reef-driver-on-spark_2.11-1.01.jar
>> > >
>> > > the REEF (unmanaged) driver fails to pick the right JAR file to copy to
>> > > the evaluators. It would be great if you could fix that problem.
>> > >
>> > > Meanwhile, I'll keep working on REEF-on-REEF and other things that
>> > > require a deeper dive into REEF internals.
>> > >
>> > > Cheers,
>> > > Sergiy.
>> > >
>> > > On Fri, Apr 7, 2017 at 12:49 PM, Saikat Kanjilal <sxk1969@gmail.com>
>> > > wrote:
>> > >
>> > >> I would love to work on all three, but number three is the most
>> > >> interesting to me, as I'm knee deep in Spark and writing a Scala/Spark
>> > >> command-line app to build an ML repo in Azure. Let me know your
>> > >> thoughts on next steps; my GitHub id is skanjila.
>> > >>
>> > >> On Fri, Apr 7, 2017 at 12:40 PM, Sergiy Matusevych <
>> > >> sergiy.matusevych@gmail.com> wrote:
>> > >>
>> > >> > Hi Saikat,
>> > >> >
>> > >> > Sorry I did not see your message earlier. Of course, you are most
>> > >> > welcome to participate! Any help will be deeply appreciated.
>> > >> >
>> > >> > There are currently three lines of work:
>> > >> >
>> > >> > 1) REEF as a library, https://issues.apache.org/jira/browse/REEF-1561
>> > >> > When running on top of Spark, a REEF application has to clean up its
>> > >> > resources on exit before handing control back to Spark. We still
>> > >> > have some rogue threads that are not closed at the end of the
>> > >> > REEF+YARN job, and we need to fix that. Here's a more detailed
>> > >> > description of the problem:
>> > >> > https://issues.apache.org/jira/browse/REEF-1729?focusedCommentId=15902195
>> > >> >
>> > >> > 2) REEF on REEF, https://issues.apache.org/jira/browse/REEF-1667
>> > >> > We want to run a REEF job on YARN and then launch another REEF
>> > >> > driver as an Unmanaged AM. That is, we want to run *two* REEF drivers
>> > >> > in one YARN container. That example is in master and works for the
>> > >> > local runtime, but not on YARN. I am currently working on it in a
>> > >> > separate branch, at https://github.com/motus/reef/tree/sergiym_uam_debug
>> > >> >
>> > >> > 3) REEF on Spark is almost identical to REEF on REEF, except that the
>> > >> > first (host) driver is Spark and the second one (unmanaged AM) is
>> > >> > REEF. That code is at https://github.com/Microsoft-CISL/reef-spark
>> > >> > Please give me your GitHub ID and I will talk to our admins to give
>> > >> > you access to that repo.
>> > >> >
>> > >> > I would love to work with you in any of these areas; please feel free
>> > >> > to pick any JIRA items that might be interesting to you :)
>> > >> >
>> > >> >
>> > >> > Cheers,
>> > >> > Sergiy.
>> > >> >
>> > >> >
>> > >> > On Tue, Apr 4, 2017 at 1:38 PM, Saikat Kanjilal <sxk1969@gmail.com>
>> > >> > wrote:
>> > >> >
>> > >> > > Hey Sergiy,
>> > >> > > I am interested in helping with REEF integration with Spark, and it
>> > >> > > makes sense for me to help here, as I'm knee deep in Spark working
>> > >> > > on a big data platform at the moment. Any thoughts on where I can
>> > >> > > contribute?
>> > >> > >
>> > >> > > Thanks
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>
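The Apr 20 message asks whether it is OK for global.jar to store the packaged jars under long, driver-host-specific entry names such as mnt/data/0/local/nm/usercache/hadoop/filecache/18/reef-driver-on-spark_2.11-1.01.jar. One possible direction, sketched here under the assumption that the evaluator only needs the jar file names and not their original directories, is to add each file to the archive under its bare file name. This is not REEF's actual packaging code: the class and method names (GlobalJarSketch, pack, nestedEntries) are hypothetical, and only java.util.jar and java.nio.file from the JDK are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper for illustration only; not part of REEF.
final class GlobalJarSketch {

  /** Packs each file into the target jar under its bare file name, dropping host-specific directories. */
  static void pack(List<Path> files, Path target) throws IOException {
    try (JarOutputStream jar = new JarOutputStream(Files.newOutputStream(target))) {
      for (Path file : files) {
        // Entry name is only the last path component, e.g. "reef-driver-on-spark_2.11-1.01.jar",
        // instead of "mnt/data/0/local/nm/usercache/hadoop/filecache/18/...".
        jar.putNextEntry(new JarEntry(file.getFileName().toString()));
        Files.copy(file, jar);
        jar.closeEntry();
      }
    }
  }

  /** Returns entry names that still contain a directory component, as in the jar tvf output. */
  static List<String> nestedEntries(Path jarPath) throws IOException {
    List<String> nested = new ArrayList<>();
    try (JarFile jar = new JarFile(jarPath.toFile())) {
      jar.stream()
          .map(JarEntry::getName)
          .filter(name -> name.contains("/"))
          .forEach(nested::add);
    }
    return nested;
  }
}
```

Because entries are keyed by file name only, two inputs with the same name from different filecache directories would collide; JarOutputStream rejects the second one with a ZipException ("duplicate entry"), so the problem fails loudly rather than silently shadowing a jar.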

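For the REEF-as-a-library item (REEF-1561), where rogue threads are still running at the end of a REEF+YARN job, one way to find them is to snapshot the live threads before the REEF job starts and list whatever new threads are still alive when it finishes, just before control goes back to Spark. The sketch assumes that approach; ThreadLeakCheck, snapshot and leakedSince are hypothetical names, not part of REEF, and only java.lang.Thread and java.util are used.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical debugging helper for illustration only; not part of REEF.
final class ThreadLeakCheck {

  /** Captures the threads alive right now, e.g. just before the REEF job starts. */
  static Set<Thread> snapshot() {
    return new HashSet<>(Thread.getAllStackTraces().keySet());
  }

  /** Names of threads alive now that were not alive when the given snapshot was taken. */
  static List<String> leakedSince(Set<Thread> before) {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> !before.contains(t)) // Thread uses identity equality
        .map(t -> t.getName() + (t.isDaemon() ? " (daemon)" : ""))
        .sorted()
        .collect(Collectors.toList());
  }
}
```

Daemon threads are reported too, tagged "(daemon)": they do not keep the JVM alive, but inside a long-running Spark process they still leak if REEF never stops them. Threads started concurrently by unrelated code (e.g. Spark itself) will also show up, so the result is a list of candidates to inspect rather than a definitive verdict.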