reef-dev mailing list archives

From Sergiy Matusevych <sergiy.matusev...@gmail.com>
Subject Re: reef integration with spark
Date Tue, 11 Apr 2017 20:49:58 GMT
Hi Saikat,

How's everything? Were you able to run the Spark+REEF app on YARN? I've just
added a call to ls -l on the Driver to see where the REEF jars are. The log
looks like this:

17/04/11 15:55:35 INFO cisl.ReefOnSpark$: Job submitted: application_1491940426172_0002
17/04/11 15:55:35 INFO util.REEFVersion: REEF Version: 0.16.0-SNAPSHOT
17/04/11 15:55:36 ERROR yarn.YarnClasspathProvider: YarnConfiguration.YARN_APPLICATION_CLASSPATH is empty. This indicates a broken cluster configuration.
17/04/11 15:55:36 INFO cisl.ReefOnSpark$: ls:
.:
total 68
drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ./
drwx--x---. 6 hadoop hadoop 4096 Apr 11 15:55 ../
lrwxrwxrwx. 1 hadoop hadoop   85 Apr 11 15:55 __app__.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/13/reef-driver-on-spark_2.11-1.01.jar
-rw-r--r--. 1 hadoop hadoop   69 Apr 11 15:55 container_tokens
-rw-r--r--. 1 hadoop hadoop   12 Apr 11 15:55 .container_tokens.crc
-rwx------. 1 hadoop hadoop  635 Apr 11 15:55 default_container_executor_session.sh
-rw-r--r--. 1 hadoop hadoop   16 Apr 11 15:55 .default_container_executor_session.sh.crc
-rwx------. 1 hadoop hadoop  689 Apr 11 15:55 default_container_executor.sh
-rw-r--r--. 1 hadoop hadoop   16 Apr 11 15:55 .default_container_executor.sh.crc
-rwx------. 1 hadoop hadoop 3728 Apr 11 15:55 launch_container.sh
-rw-r--r--. 1 hadoop hadoop   40 Apr 11 15:55 .launch_container.sh.crc
drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 reef/
lrwxrwxrwx. 1 hadoop hadoop   91 Apr 11 15:55 reef-examples-0.16.0-SNAPSHOT-shaded.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/10/reef-examples-0.16.0-SNAPSHOT-shaded.jar
lrwxrwxrwx. 1 hadoop hadoop   73 Apr 11 15:55 scala-arm_2.11-1.4.jar -> /mnt/data/0/local/nm/usercache/hadoop/filecache/14/scala-arm_2.11-1.4.jar
lrwxrwxrwx. 1 hadoop hadoop   69 Apr 11 15:55 __spark_conf__ -> /mnt/data/0/local/nm/usercache/hadoop/filecache/12/__spark_conf__.zip
lrwxrwxrwx. 1 hadoop hadoop   88 Apr 11 15:55 __spark_libs__ -> /mnt/data/0/local/nm/usercache/hadoop/filecache/11/__spark_libs__2905924713127025597.zip
drwx--x---. 2 hadoop hadoop 4096 Apr 11 15:55 tmp/

./reef:
total 16
drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ./
drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ../
drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 global/
drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 temp/

./reef/global:
total 12
drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 ./
drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ../
lrwxrwxrwx. 1 hadoop hadoop  128 Apr 11 15:55 __app__.jar -> /mnt/data/0/local/nm/usercache/hadoop/appcache/application_1491940426172_0001/container_1491940426172_0001_01_000001/__app__.jar

./reef/temp:
total 8
drwxrwxr-x. 2 hadoop hadoop 4096 Apr 11 15:55 ./
drwxrwxr-x. 4 hadoop hadoop 4096 Apr 11 15:55 ../

./tmp:
total 8
drwx--x---. 2 hadoop hadoop 4096 Apr 11 15:55 ./
drwx--x---. 4 hadoop hadoop 4096 Apr 11 15:55 ../


Maybe that can help you to debug the issue? I am looking at it too now.
Please feel free to ping me if you have questions!
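
In case it is useful, here is a minimal sketch for following the symlinks in a listing like the one above. It is not part of reef-spark; the function name show_links is made up for illustration, and it assumes a POSIX shell with GNU coreutils, since it relies on readlink -f. In the listing, reef/global/__app__.jar appears to point at the container's own __app__.jar, which is itself a link into filecache, so resolving the whole chain may show more quickly which jar REEF actually sees.

```shell
#!/bin/sh
# show_links DIR: for every symlink under DIR, print the link, its
# immediate target, and the fully resolved path (or DANGLING if the
# chain does not end at an existing file).
# NOTE: show_links is a hypothetical helper, not a REEF or Spark tool.
show_links() {
  dir=${1:-.}
  find "$dir" -type l | sort | while IFS= read -r link; do
    target=$(readlink "$link")          # one hop only
    if [ -e "$link" ]; then
      final=$(readlink -f "$link")      # GNU coreutils: follows the whole chain
    else
      final=DANGLING
    fi
    printf '%s -> %s => %s\n' "$link" "$target" "$final"
  done
}
```

Calling show_links . from the container working directory should print one line per link, which is easier to scan than the raw ls output.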

Cheers,
Sergiy.


On Fri, Apr 7, 2017 at 2:05 PM, Sergiy Matusevych <
sergiy.matusevych@gmail.com> wrote:

> Hi Saikat,
>
> I've sent you an invite to the reef-spark repo. I gave you write access to it,
> so you are welcome to create branches (prefixed with your id_) and issue
> pull requests into master - just make sure you don't copy it into a public
> repo.
>
> To build it, first do mvn install on the REEF branch
> https://github.com/motus/reef/tree/sergiym_uam_debug, and then run
> sbt package in reef-spark master.
> Note that you need YARN 2.7.3 or newer to make Unmanaged AM work.
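
A rough sketch of the build steps above, assuming git, Maven, and sbt are installed and that hadoop version prints the release on its first line as "Hadoop X.Y.Z". The helper version_at_least is hypothetical, sort -V is a GNU extension, and the clone URL is inferred from the branch link in this thread.

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeed if version HAVE is >= WANT.
# Hypothetical helper; relies on `sort -V` (GNU coreutils version sort).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumed usage (shown as comments, not run here):
#   have=$(hadoop version | head -n 1 | awk '{print $2}')
#   version_at_least "$have" 2.7.3 || echo "YARN too old for Unmanaged AM" >&2
#
#   git clone -b sergiym_uam_debug https://github.com/motus/reef.git
#   (cd reef && mvn install)
#   (cd reef-spark && sbt package)   # in a reef-spark checkout, master branch
```

Vendor builds may append a suffix to the version string, so treat the comparison as a sanity check rather than a guarantee.
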
>
> For starters, you can try to figure out where Spark copies its jars on
> the AM host - that's the reason reef-spark does not work now. That is, when
> I run
>
> ../spark/bin/spark-submit --master yarn --deploy-mode cluster \
>   --class com.microsoft.cisl.ReefOnSpark \
>   --jars ../reef/lang/java/reef-examples/target/reef-examples-0.16.0-SNAPSHOT-shaded.jar,scala-arm_2.11-1.4.jar \
>   target/scala-2.11/reef-driver-on-spark_2.11-1.01.jar
>
> the REEF (unmanaged) driver fails to pick the right JAR file to copy to the
> evaluators. It would be great if you could fix that problem.
>
> Meanwhile, I'll keep working on REEF-on-REEF and other things that require
> a deeper dive into REEF internals.
>
> Cheers,
> Sergiy.
>
> On Fri, Apr 7, 2017 at 12:49 PM, Saikat Kanjilal <sxk1969@gmail.com>
> wrote:
>
>> I would love to work on all three, but number three is most interesting to
>> me, as I'm knee-deep in Spark and writing a Scala Spark command-line app
>> to build an ML repo in Azure. Let me know your thoughts on next steps; my
>> GitHub ID is skanjila.
>>
>> On Fri, Apr 7, 2017 at 12:40 PM, Sergiy Matusevych <
>> sergiy.matusevych@gmail.com> wrote:
>>
>> > Hi Saikat,
>> >
>> > Sorry I did not see your message earlier. Of course, you are most welcome
>> > to participate! Any help will be deeply appreciated.
>> >
>> > There are currently three lines of work:
>> >
>> > 1) REEF as a library, https://issues.apache.org/jira/browse/REEF-1561
>> > When running on top of Spark, a REEF application has to clean up its
>> > resources on exit before handing control back to Spark. We still have some
>> > rogue threads not closed at the end of the REEF+YARN job, and we need to
>> > fix that. Here's a more detailed description of the problem:
>> > https://issues.apache.org/jira/browse/REEF-1729?focusedCommentId=15902195
>> >
>> > 2) REEF on REEF, https://issues.apache.org/jira/browse/REEF-1667
>> > We want to run a REEF job on YARN and then launch another REEF driver as
>> > Unmanaged AM. That is, we want to run *two* REEF drivers in one YARN
>> > container. That example is in master and works for local runtime, but not
>> > on YARN. I am currently working on it in a separate branch, at
>> > https://github.com/motus/reef/tree/sergiym_uam_debug
>> >
>> > 3) REEF on Spark is almost identical to REEF on REEF, except the first
>> > (host) driver is Spark and the second one (unmanaged AM) is REEF. That code
>> > is at https://github.com/Microsoft-CISL/reef-spark
>> > Please give me your github ID and I will talk to our admins to give you
>> > access to that repo.
>> >
>> > I would love to work with you in any of these areas; please feel free to
>> > pick any JIRA items that might be interesting to you :)
>> >
>> >
>> > Cheers,
>> > Sergiy.
>> >
>> >
>> > On Tue, Apr 4, 2017 at 1:38 PM, Saikat Kanjilal <sxk1969@gmail.com> wrote:
>> >
>> > > Hey Sergei,
>> > > I am interested in helping with REEF integration with Spark, and it
>> > > makes sense for me to help here as I'm knee-deep in Spark, working on a
>> > > big data platform at the moment. Any thoughts on where I can contribute?
>> > >
>> > > Thanks
>> > >
>> >
>>
>
>
