incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: S4-0.6.0 and Hadoop Yarn
Date Mon, 08 Apr 2013 08:23:50 GMT

On Apr 8, 2013, at 06:17 , JiHyoun Park wrote:

> Dear Matthieu
> 
> All we need for the Yarn integration is to include 2 hdfs-deploy-related classes, which were developed on the S4-25 branch, in the s4 core-deploy package.
> 
> - org.apache.s4.deploy.HdfsFetcherModule.java
> - org.apache.s4.deploy.HdfsS4RFetcher.java
> 
> Plus a simple modification to org.apache.s4.core.util.RemoteFileFetcher.java so that it can identify "hdfs" as one of the s4r download sources.
> 
>         if ("hdfs".equalsIgnoreCase(scheme)){
>             return new HdfsArchiveFetcher().fetch(uri);            
>         }


Hi Jihyoun,

Adding Yarn/Hadoop dependencies to s4-core is something we want to avoid, so that we don't
force a specific version of Hadoop.

Instead, for S4 0.6, we could actually inject the fetchers through a custom module. We'd ship
the custom module separately from s4-core, avoiding the dependency coupling issue.
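
To make the idea concrete, here is a minimal sketch in plain Java of how fetchers could be registered by URI scheme from outside the core. It is an illustration only: ArchiveFetcher, FetcherRegistry and HdfsFetcherPlugin are hypothetical names, not existing S4 classes, and the "hdfs" fetcher is a stub that returns canned bytes instead of calling the Hadoop FileSystem API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract: stands in for whatever fetcher interface the core exposes.
interface ArchiveFetcher {
    InputStream fetch(URI uri) throws IOException;
}

// Hypothetical registry living in the core; it has no Hadoop dependency.
final class FetcherRegistry {
    private final Map<String, ArchiveFetcher> fetchers = new ConcurrentHashMap<>();

    // Schemes are stored lower-cased so lookups are case-insensitive.
    void register(String scheme, ArchiveFetcher fetcher) {
        fetchers.put(scheme.toLowerCase(Locale.ROOT), fetcher);
    }

    // Dispatches on the URI scheme instead of hard-coding an "hdfs" branch.
    InputStream fetch(URI uri) throws IOException {
        String scheme = uri.getScheme();
        ArchiveFetcher fetcher =
                scheme == null ? null : fetchers.get(scheme.toLowerCase(Locale.ROOT));
        if (fetcher == null) {
            throw new IOException("No fetcher registered for scheme: " + scheme);
        }
        return fetcher.fetch(uri);
    }
}

// Hypothetical module shipped separately from the core. A real one would
// depend on Hadoop and read the archive from HDFS; this stub does not.
final class HdfsFetcherPlugin {
    static void install(FetcherRegistry registry) {
        registry.register("hdfs", uri -> new ByteArrayInputStream(
                ("stub content for " + uri).getBytes(StandardCharsets.UTF_8)));
    }
}
```

With something along these lines, the core only knows about the registry, and the Hadoop version is chosen by whoever packages the separate module.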

Can you add a ticket for this? Thanks!


> 
> I also would like to ask you one more favour.
> Can we have the "-shutdown" option back in org.apache.s4.tools.Deploy.java to avoid the automatic shutdown of the S4 application after deployment?
> I tried to use the "-testMode" option, which seemed to act just like the "-shutdown" option, but my S4 application didn't recognize it.

If I understand correctly, you tried to port s4-yarn to 0.6.0?
Did you add the -testMode option as a replacement for -shutdown=false here:
https://github.com/apache/incubator-s4/blob/S4-25/subprojects/s4-yarn/src/main/java/org/apache/s4/tools/yarn/S4YarnClient.java#L387 ?

Note that the S4 app shutting down without this option is actually a side effect of how the
S4 deployment/configuration tool runs on Yarn: we need to prevent System.exit calls since
we are running in a contained environment.
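
As a rough illustration of that constraint (not the actual code of the S4 tools), a tool can route termination through an injectable action instead of calling System.exit directly, and skip it when a test-mode flag is set. DeployRunner and its parameters are hypothetical names used only for this sketch.

```java
import java.util.function.IntConsumer;

// Hypothetical wrapper around one run of a deployment tool.
final class DeployRunner {
    private final boolean testMode;
    private final IntConsumer exitAction;

    // exitAction would be System::exit in a standalone command-line run.
    DeployRunner(boolean testMode, IntConsumer exitAction) {
        this.testMode = testMode;
        this.exitAction = exitAction;
    }

    // Runs the deployment step, then terminates only when not in test mode.
    // Returns true if the exit action was invoked.
    boolean run(Runnable deployment) {
        deployment.run();
        if (testMode) {
            // Contained environment: leave the JVM and anything it hosts running.
            return false;
        }
        exitAction.accept(0);
        return true;
    }
}
```

A standalone invocation could then use `new DeployRunner(false, System::exit)`, while a caller embedding the tool inside a container would pass `true` so the hosting JVM is never terminated.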

Also, if you have a working port of S4-25 to S4 0.6, you could submit a patch and we could
integrate it. (If you are still iterating, you can also fork the project on GitHub and share
the ported code there, so we can provide feedback.)

Thanks,

Matthieu


> 
> Best Regards
> Jihyoun
> 
> 
> On Thu, Apr 4, 2013 at 5:26 PM, Matthieu Morel <mmorel@apache.org> wrote:
> Hi,
> 
> Note that S4 0.5 was a complete refactoring, so its main objective was to provide a functional implementation. That left room for improvement, and the focus of the 0.6 release was on performance and usability.
> 
> Most performance improvements in S4 0.6 come from:
> - adding metrics to identify bottlenecks
> - improving serialization and deserialization
> - minimizing buffer copies (and pressure on the garbage collector)
> - leveraging multithreading and async processing, notably by updating Netty pipelines
> 
> Regards,
> 
> Matthieu 
> 
> 
> 
> 
> On Apr 4, 2013, at 07:01 , Siddharth wrote:
> 
>> Hi - Can the development team highlight the exact solution/fix that made it possible for the 0.6 release to be so fast compared to the earlier release?
>> 
>>  
>> 
>> Thanks in advance,
>> 
>> Siddharth
>> 
>>  
>> 
>> From: Matthieu Morel [mailto:mmorel@apache.org] 
>> Sent: Wednesday, April 03, 2013 3:02 PM
>> To: s4-user@incubator.apache.org
>> Subject: Re: S4-0.6.0 and Hadoop Yarn
>> 
>>  
>> 
>> On Apr 2, 2013, at 19:46 , Jeryl Cook wrote:
>> 
>> 
>> 
>> 
>> "handle 200K+ messages per sec", in one instance? Or do you mean clustered?
>> 
>>  
>> 
>> This is for processing small events injected into 1 stream on 1 node. By using more streams and more nodes, the overall throughput can get considerably higher.
>> 
>>  
>> 
>> Note that this is a baseline with a basic PE graph (1 injector and 1 PE prototype), and performance in practice will be impacted by the complexity of the application and the nature of the processing, the hardware and allocated resources, the size and complexity of messages, etc.
>> 
>>  
>> 
>> A benchmarking framework is included in the distribution, so you can reproduce the experiments.
>> 
>>  
>> 
>> Regards,
>> 
>>  
>> 
>> Matthieu 
>> 
>>  
>> 
>>  
>> 
>>>  
>>> 
>>> On Mon, Apr 1, 2013 at 10:42 PM, JiHyoun Park <april3@gmail.com> wrote:
>>> 
>>> Hi
>>> 
>>> I am testing the newest release of S4.
>>> It's fantastic that the stream throughput of S4 0.6.0 has been improved to handle 200K+ messages per sec!
>>> However, it seems that the S4-25 branch - deploying S4 applications with Yarn - is not included in the 0.6.0 package yet.
>>> I already built a system to run S4 applications on Yarn and want to migrate its S4 framework from 0.5.0 to 0.6.0.
>>> How can I use the 'deploying S4 applications with Yarn' feature on S4 0.6.0?
>>> 
>>> Best Regards
>>> Jihyoun
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Jeryl Cook
>>> Founder & Chief Executive Officer
>>> VanitySoft, Inc.
>>> A Geo Business Intelligence Technology Consulting Firm
>>> www.vanity-soft.com
>>> www.linkedin.com/in/jerylcook
>>> Get answers to "who knew what, when, and where"... and everything in between.
>>> 
>>> 
>> 
>>  
>> 
> 
> 

