incubator-s4-user mailing list archives

From Matthieu Morel <matth...@yahoo-inc.com>
Subject Re: S4-0.6.0 and Hadoop Yarn
Date Mon, 08 Apr 2013 09:38:52 GMT

On Apr 8, 2013, at 11:31 , JiHyoun Park wrote:

Dear Matthieu

Yes, I am trying to port s4-yarn to 0.6.0.

In 0.6.0, org.apache.s4.tools.Deploy.java has a -testMode option:

            // Explicitly shutdown the JVM since Gradle leaves non-daemon threads running that delay the termination
            if (!deployArgs.testMode) {
                System.exit(0);
            }

It plays the same role as the -shutdown option in the same class in the S4-25 branch, with the condition inverted:

            // Explicitly shutdown the JVM since Gradle leaves non-daemon threads running that delay the termination
            if (deployArgs.shutdown) {
                System.exit(0);
            }

The difference is in how the two options are declared:

        // S4 0.6.0
        @Parameter(names = "-testMode", description = "Special mode for regression testing", hidden = true)
        // S4-25 branch
        @Parameter(names = "-shutdown", description = "Shutdown JVM after deployment. Useful to avoid waiting for remaining long running threads from Gradle", arity = 1)

I tried passing "-testMode=true" to Deploy.main():

        String[] argDeploy = {"-s4r=" + s4r_path_HDFS,
            "-cluster=" + cluster_name,
            "-appName=" + application_name,
            "-testMode=true"
        };
        Deploy.main(argDeploy);

but got this error:

Cannot parse arguments: class com.beust.jcommander.ParameterException -> Was passed main parameter 'true' but no main parameter was defined

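With JCommander, the CLI parser we use, "-testMode" is a boolean parameter with an arity of 0.
So either you specify it as "-testMode", in which case it takes the value "true", or you omit it
and it takes the default value "false".

The error you are reporting likely comes from that: "-testMode=true" is presumably split at the "="
into "-testMode" and "true", and since "-testMode" takes no value, "true" is treated as a main
(positional) parameter, which Deploy does not define.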
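To make the explanation above concrete, here is a toy Java model of the parsing behaviour. It is not JCommander's code. It assumes, as the "-p=name1=value1" syntax in the Deploy usage text suggests, that Deploy's parameters use "=" as the name/value separator, and it hard-codes the arities discussed in this thread (0 for -testMode, 1 for S4-25's -shutdown).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model (NOT JCommander itself) of '='-separated option parsing, showing how
 * an option's arity decides whether the following token is consumed as its value.
 */
public class ArityModel {

    /** Returns the tokens that would be left over as "main" (positional) parameters. */
    static List<String> leftoverMainParameters(String[] args, Map<String, Integer> arities) {
        // Split "-name=value" at the first '=' into "-name" and "value".
        List<String> tokens = new ArrayList<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-") && eq > 0) {
                tokens.add(arg.substring(0, eq));
                tokens.add(arg.substring(eq + 1));
            } else {
                tokens.add(arg);
            }
        }
        List<String> leftovers = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Integer arity = arities.get(tokens.get(i));
            if (arity == null) {
                leftovers.add(tokens.get(i)); // not a known option name: a main parameter
            } else {
                i += arity; // skip over the values this option consumes
            }
        }
        return leftovers;
    }

    public static void main(String[] args) {
        Map<String, Integer> arities = Map.of("-s4r", 1, "-testMode", 0, "-shutdown", 1);
        // Arity 0: "true" is left over, matching the "Was passed main parameter 'true'" error.
        System.out.println(leftoverMainParameters(new String[] {"-s4r=app.s4r", "-testMode=true"}, arities)); // prints [true]
        // The bare flag leaves nothing over.
        System.out.println(leftoverMainParameters(new String[] {"-s4r=app.s4r", "-testMode"}, arities)); // prints []
        // Arity 1 (S4-25's -shutdown): the value is consumed.
        System.out.println(leftoverMainParameters(new String[] {"-s4r=app.s4r", "-shutdown=false"}, arities)); // prints []
    }
}
```

This is also why "-shutdown=false" worked in S4-25 while "-testMode=true" fails in 0.6.0: only the former declares a value.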

Hope this helps,

Matthieu



Usage

Usage: <main class> [options]

  Options:
    -a, -appClass                Full class name of the application class
                                 (extending App or AdapterApp)
  * -appName                     Name of S4 application.
  * -c, -cluster                 Logical name of the S4 cluster
    -debug                       Display debug information from the build system
                                 Default: false
    -gradleOpts                  gradle system properties (as in GRADLE_OPTS
                                 environment properties) passed to gradle scripts
                                 Default: []
    -help                        usage
                                 Default: false
    -modulesClasses, -emc, -mc   Fully qualified class names of custom modules
                                 Default: []
    -modulesURIs, -mu            URIs for fetching code of custom modules
                                 Default: []
    -namedStringParameters, -p   Comma-separated list of inline configuration
                                 parameters. Syntax: '-p=name1=value1,name2=value2 '
                                 Default: []
    -s4r                         URI to existing s4r file
    -timeout                     Connection timeout to Zookeeper, in ms
                                 Default: 10000
    -zk                          ZooKeeper connection string
                                 Default: localhost:2181
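Based on the options in this usage output, a hypothetical helper could assemble the argument array for Deploy.main(). DeployArgsBuilder is not part of S4; it assumes the "=" separator used throughout this thread and passes the hidden -testMode flag bare, without a value. The URI, cluster, application and parameter names in main() are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of S4) that builds arguments for org.apache.s4.tools.Deploy. */
public class DeployArgsBuilder {

    static String[] build(String s4rUri, String cluster, String appName, String zk,
                          Map<String, String> namedParams, boolean testMode) {
        List<String> args = new ArrayList<>();
        args.add("-s4r=" + s4rUri);
        args.add("-cluster=" + cluster);
        args.add("-appName=" + appName);
        args.add("-zk=" + zk);
        if (!namedParams.isEmpty()) {
            // Inline configuration, following the usage text: -p=name1=value1,name2=value2
            String joined = namedParams.entrySet().stream()
                    .map(e -> e.getKey() + "=" + e.getValue())
                    .collect(Collectors.joining(","));
            args.add("-p=" + joined);
        }
        if (testMode) {
            // Boolean flag with arity 0: its presence means true; never pass "-testMode=true".
            args.add("-testMode");
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("name1", "value1");
        params.put("name2", "value2");
        String[] args = build("hdfs://namenode:8020/apps/myApp.s4r", "cluster1", "myApp",
                "localhost:2181", params, true);
        System.out.println(String.join(" ", args));
    }
}
```

The resulting array would then be passed as Deploy.main(args). Note that -testMode does not appear in the usage output because it is declared with hidden = true.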



Best Regards
Jihyoun


On Mon, Apr 8, 2013 at 4:23 PM, Matthieu Morel <mmorel@apache.org> wrote:

On Apr 8, 2013, at 06:17 , JiHyoun Park wrote:

Dear Matthieu

All we need for the Yarn integration is to include two HDFS deployment classes, developed in the
S4-25 branch, in the s4-core deploy package:

- org.apache.s4.deploy.HdfsFetcherModule.java
- org.apache.s4.deploy.HdfsS4RFetcher.java

We also need a simple modification to org.apache.s4.core.util.RemoteFileFetcher.java so that it
recognizes "hdfs" as one of the s4r download sources:

        if ("hdfs".equalsIgnoreCase(scheme)){
            return new HdfsArchiveFetcher().fetch(uri);
        }


Hi Jihyoun,

adding Yarn/Hadoop dependencies to s4-core is something we want to avoid, so that we don't
force a specific version of Hadoop.

Instead, for S4 0.6, we could actually inject the fetchers through a custom module. We'd ship
the custom module separately from s4-core, avoiding the dependency coupling issue.
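One way to picture that decoupling: core code resolves an s4r URI by its scheme against a registry of fetchers and carries no Hadoop code, while a separately shipped module registers the "hdfs" handler. The sketch below uses only the JDK; UriFetcher and FetcherRegistry are hypothetical names, and in S4 0.6 the registration would presumably be done by a custom module loaded through -modulesClasses / -modulesURIs rather than by hand.

```java
import java.io.File;
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: scheme-based dispatch of s4r fetchers. Core depends only on the
 * UriFetcher interface; optional modules contribute implementations for extra schemes.
 */
public class FetcherRegistry {

    /** Hypothetical fetcher abstraction: make the archive at uri available as a local file. */
    interface UriFetcher {
        File fetch(URI uri);
    }

    private final Map<String, UriFetcher> byScheme = new ConcurrentHashMap<>();

    /** Called by core for built-in schemes, or by an optional module (e.g. for "hdfs"). */
    void register(String scheme, UriFetcher fetcher) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), fetcher);
    }

    /** Case-insensitive scheme lookup, like the equalsIgnoreCase check in the snippet above. */
    File fetch(URI uri) {
        String scheme = uri.getScheme() == null ? "file" : uri.getScheme().toLowerCase(Locale.ROOT);
        UriFetcher fetcher = byScheme.get(scheme);
        if (fetcher == null) {
            throw new IllegalArgumentException("No fetcher registered for scheme: " + scheme);
        }
        return fetcher.fetch(uri);
    }

    public static void main(String[] args) {
        FetcherRegistry registry = new FetcherRegistry();
        // Built into core: needs no extra dependencies.
        registry.register("file", uri -> new File(uri.getPath()));
        // Contributed by a separately shipped module. Stubbed: it only computes a local
        // path and does not download anything (real code would use the Hadoop client).
        registry.register("hdfs", uri -> new File("/tmp", new File(uri.getPath()).getName()));
        System.out.println(registry.fetch(URI.create("hdfs://namenode:8020/apps/myApp.s4r")));
    }
}
```

With this shape, s4-core never references a Hadoop class, so the Hadoop version is chosen by whoever builds and ships the HDFS module.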

Can you add a ticket for this? Thanks!



I would also like to ask you one more favour.
Could we have the "-shutdown" option back in org.apache.s4.tools.Deploy.java, to avoid the automatic
shutdown of the S4 application after deployment?
I tried the "-testMode" option, which seemed to act just like the "-shutdown" option,
but my S4 application didn't recognize it.

If I understand correctly, you tried to port s4-yarn to 0.6.0?
Did you add the -testMode option in place of -shutdown=false here: https://github.com/apache/incubator-s4/blob/S4-25/subprojects/s4-yarn/src/main/java/org/apache/s4/tools/yarn/S4YarnClient.java#L387 ?

Note that the S4 app being shut down without this option is actually a side effect of running the
S4 deployment/configuration tool on Yarn: we need to prevent System.exit statements since
we are running in a contained environment.
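One way to get that behaviour without scattering exit calls is to make the final exit injectable: a standalone command-line run passes System::exit, while a launcher inside a container passes a policy that never exits. DeployExitPolicy below is a hypothetical illustration, not S4 code; its exitAfterDeploy flag corresponds to !deployArgs.testMode in 0.6.0 or deployArgs.shutdown in S4-25.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch (not S4 code): the "exit after deployment" step is injected,
 * so a container launcher can keep the JVM alive instead of calling System.exit.
 */
public class DeployExitPolicy {

    private final boolean exitAfterDeploy; // e.g. !deployArgs.testMode (0.6.0) or deployArgs.shutdown (S4-25)
    private final IntConsumer exitHook;    // System::exit for a standalone command-line run

    DeployExitPolicy(boolean exitAfterDeploy, IntConsumer exitHook) {
        this.exitAfterDeploy = exitAfterDeploy;
        this.exitHook = exitHook;
    }

    /** Called once deployment has finished; returns true if the exit hook was invoked. */
    boolean afterDeploy() {
        if (exitAfterDeploy) {
            exitHook.accept(0);
            return true;
        }
        return false; // contained environment: leave the JVM (and the S4 app) running
    }

    public static void main(String[] args) {
        // Inside a Yarn container: never exit after deploying.
        DeployExitPolicy contained = new DeployExitPolicy(false, System::exit);
        System.out.println(contained.afterDeploy()); // prints false
    }
}
```

Injecting the hook also makes the exit path testable, since a test can pass a recording consumer instead of System::exit.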

Also, if you have a working port of S4-25 to S4 0.6, you could submit a patch and we could
integrate it. (If you are still iterating, you can also fork the project on GitHub and share
the code of your port there, so that we can provide feedback.)

Thanks,

Matthieu



Best Regards
Jihyoun


On Thu, Apr 4, 2013 at 5:26 PM, Matthieu Morel <mmorel@apache.org> wrote:
Hi,

Note that S4 0.5 was a complete refactoring, so its main objective was to provide a
functional implementation. That left room for improvement, and the 0.6 release focused
on performance and usability.

Most performance improvements in S4 0.6 come from:
- adding metrics to identify bottlenecks
- improving serialization and deserialization
- minimizing buffer copies (and pressure on the garbage collector)
- leveraging multithreading and async processing, notably by updating Netty pipelines
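As a generic illustration of the buffer-copy point (this is not S4's code), the JDK's ByteBuffer.slice() returns a view that shares the underlying bytes, whereas Arrays.copyOfRange allocates a new array for every message, which the garbage collector must later reclaim. Netty's ByteBuf offers similar slice() views. The frame layout in main() (a 3-byte header followed by a payload) is made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Generic illustration: zero-copy view versus per-message copy of a payload. */
public class SliceVsCopy {

    /** Zero-copy view of bytes [offset, offset + length) of the frame. */
    static ByteBuffer view(byte[] frame, int offset, int length) {
        return ByteBuffer.wrap(frame, offset, length).slice();
    }

    /** Copying alternative: allocates a fresh array for each message. */
    static byte[] copy(byte[] frame, int offset, int length) {
        return Arrays.copyOfRange(frame, offset, offset + length);
    }

    public static void main(String[] args) {
        byte[] frame = "HDRpayload".getBytes(StandardCharsets.US_ASCII); // 3-byte header + payload
        ByteBuffer v = view(frame, 3, 7);
        byte[] c = copy(frame, 3, 7);
        frame[3] = 'P'; // mutate the shared backing array
        System.out.println((char) v.get(0)); // prints P: the view sees the change
        System.out.println((char) c[0]);     // prints p: the copy does not
    }
}
```

The trade-off is lifetime management: a view is only valid while the backing array is not reused, which is why zero-copy designs usually come with explicit buffer ownership rules.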

Regards,

Matthieu




On Apr 4, 2013, at 07:01 , Siddharth wrote:

Hi, could the development team highlight the specific changes that made the 0.6 release
so much faster than the earlier release?

Thanks in advance,
Siddharth

________________________________
From: Matthieu Morel [mailto:mmorel@apache.org]
Sent: Wednesday, April 03, 2013 3:02 PM
To: s4-user@incubator.apache.org
Subject: Re: S4-0.6.0 and Hadoop Yarn

On Apr 2, 2013, at 19:46 , Jeryl Cook wrote:


"handle 200K+ messages per sec": in one instance, or do you mean clustered?

This is for processing small events injected into 1 stream on 1 node. By using more streams
and more nodes, the overall throughput can be considerably higher.

Note that this is a baseline with a basic PE graph (1 injector and 1 PE prototype). Performance
in practice will be affected by the complexity of the application and the nature of the processing,
the hardware and allocated resources, the size and complexity of messages, etc.

A benchmarking framework is included in the distribution, so you can reproduce the experiments.

Regards,

Matthieu



On Mon, Apr 1, 2013 at 10:42 PM, JiHyoun Park <april3@gmail.com> wrote:
Hi

I am testing the newest release of S4.
It's fantastic that the stream throughput of S4 0.6.0 has been improved to handle 200K+ messages
per sec!
However, it seems that the S4-25 branch (deploying S4 applications with Yarn) is not yet included
in the 0.6.0 package.
I already built a system to run S4 applications on Yarn and want to migrate its S4 framework
from 0.5.0 to 0.6.0.
How can I use the 'deploying S4 applications with Yarn' feature on S4 0.6.0?

Best Regards
Jihyoun



--
Jeryl Cook
Founder & Chief Executive Officer
VanitySoft, Inc.
A Geo Business Intelligence Technology Consulting Firm
www.vanity-soft.com
www.linkedin.com/in/jerylcook
Get answers to "who knew what, when, and where"... and everything in between.







