Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Reply-To: user@flink.apache.org
MIME-Version: 1.0
References: <55a356e0.67c0440a.b165b.ffffd9d5@mx.google.com>
Date: Thu, 16 Jul 2015 01:29:47 -0700
Subject: Re: Sort Benchmark infrastructure
From: Hawin Jiang
To: George Porter
Cc: Mike Conley, Stephan Ewen, user@flink.apache.org
Content-Type: multipart/alternative; boundary=089e0122a5b8ccffd3051af9dfb6

--089e0122a5b8ccffd3051af9dfb6
Content-Type: text/plain; charset=UTF-8

Hi George,

Thanks for the details. It looks like I have a long way to go.
For the big data benchmark, I would like to use those test cases, test data,
and test methodology to test different big data technologies.
BTW, I agree with you that no one system will necessarily be optimal for
all cases for all workloads.
I hope I can find a good one for our enterprise application. I will let you
know if I can move forward with this.

Good night.

Best regards
Hawin

On Wed, Jul 15, 2015 at 9:30 AM, George Porter <gmporter@cs.ucsd.edu> wrote:

> Hi Hawin,
>
> We used varying numbers of the i2.8xlarge servers, depending on the sort
> record category.
> http://sortbenchmark.org/ is really your best source
> for what we did--all the details should be in our write-ups. Note that
> we pro-rated the cost, meaning that if we ran for 15 minutes, we took the
> hourly rate and divided by 4.
>
> In terms of sponsorship, we used a combination of credits donated by
> Amazon, as well as funding from the National Science Foundation. You can
> submit a grant proposal to Amazon and ask them for credits if you're an
> academic or researcher. Not sure if being part of an open-source project
> counts, but you might as well try.
>
> In terms of the sort record, that webpage I provided above has all the
> details on the challenge. Not sure about Big Data benchmark--that term is
> pretty vague. Often when people say big data, they mean different things.
> Our system is designed for lots of bytes, but not really lots of compute
> over those bytes. Others pick different design points. I think you'll
> find that the needs of different users vary quite a bit, and no one
> system will necessarily be optimal for all cases for all workloads.
>
> Good luck on your attempts.
> -George
>
> ----
> George Porter
> Assistant Professor, Dept. of Computer Science and Engineering
> Associate Director, UCSD Center for Networked Systems
> UC San Diego, La Jolla CA
> http://www.cs.ucsd.edu/~gmporter/
>
>
>
> On Wed, Jul 15, 2015 at 1:44 AM, Hawin Jiang <hawin.jiang@gmail.com>
> wrote:
>
>> Hi George and Mike
>>
>> Thanks for your information. Did you use 186 i2.8xlarge servers for
>> testing?
>> Total one hour cost = 186 * 6.82 = $1,268.52.
>> Do you know of any person or company that could sponsor this?
>>
>> For our test approach, I have checked an industry standard from
>> BigDataBench (http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/).
>> Maybe we can test TeraSort to see whether the performance is better than
>> your record.
>>
>> Please let me know if you have any comments.
>> Thanks for the support.
>>
>> Best regards
>> Hawin
>>
>> On Tue, Jul 14, 2015 at 9:42 AM, Mike Conley <mconley@cs.ucsd.edu> wrote:
>>
>>> George is correct. We used i2.8xlarge with placement groups on Amazon
>>> EC2. We ran Amazon Linux, which if I recall correctly is based on Red Hat,
>>> but optimized for EC2. OS was essentially unmodified with some packages
>>> installed for our dependencies.
>>>
>>> Thanks,
>>> Mike
>>>
>>> On Tue, Jul 14, 2015 at 9:15 AM, George Porter <gmporter@cs.ucsd.edu>
>>> wrote:
>>>
>>>> Hello Hawin,
>>>>
>>>> Thanks for reaching out. We wrote a paper on our efforts, which we'll
>>>> be posting to our website in a couple of weeks.
>>>>
>>>> However in summary, we used a cluster of i2.8xlarge instance types from
>>>> Amazon, and we made use of the placement group feature to ensure that we'd
>>>> get good bandwidth between them. Mike can correct me if I'm wrong, but I
>>>> believe we used the stock AWS version of Linux (Ubuntu maybe?)
>>>>
>>>> So our environment was pretty stock--we didn't get any special support
>>>> or features from AWS.
>>>>
>>>> Best of luck with your profiling and benchmarking. Do let us know how
>>>> you perform. Flink looks like a pretty interesting project, and so let us
>>>> know if we can help y'all out in some way.
>>>>
>>>> Thanks, George
>>>>
>>>>
>>>> On Sun, Jul 12, 2015 at 11:12 PM, Hawin Jiang <hawin.jiang@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Michael and George
>>>>>
>>>>> First of all, congratulations, you guys have won the sort game again.
>>>>> We are from the Flink community.
>>>>>
>>>>> I am not sure if it is possible to get your test environment to test
>>>>> our Flink for free. We saw that Apache Spark did a good job as well.
>>>>>
>>>>> We want to challenge your records. But we don’t have that many servers
>>>>> for testing.
>>>>>
>>>>> Please let me know if you can help us or not.
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>> Best regards
>>>>>
>>>>> Hawin
>>>>>
>>>>
>>>
>>
>

--089e0122a5b8ccffd3051af9dfb6
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--089e0122a5b8ccffd3051af9dfb6--
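
The cost arithmetic discussed in this thread can be sketched in a few lines of Python. This is only an illustration: the 186 instances and the $6.82 hourly rate are the figures from Hawin's question (George only says the instance count varied by sort category), the pro-rating rule is the one George describes (15 minutes means the hourly rate divided by 4), and the function names `hourly_cluster_cost` and `prorated_cost` are made up for this example.

```python
from decimal import Decimal


def hourly_cluster_cost(instances: int, hourly_rate: Decimal) -> Decimal:
    """Cost of running `instances` servers for one full hour."""
    return instances * hourly_rate


def prorated_cost(hourly_cost: Decimal, minutes: int) -> Decimal:
    """Pro-rate an hourly cost by run time, rounded to cents.

    A 15-minute run is charged hourly_cost / 4, as described in the thread.
    """
    return (hourly_cost * minutes / Decimal(60)).quantize(Decimal("0.01"))


# Figures from Hawin's message (assumed, not confirmed by the record holders).
hourly = hourly_cluster_cost(186, Decimal("6.82"))
print(hourly)                     # 1268.52
print(prorated_cost(hourly, 15))  # 317.13
```

Decimal is used instead of float so the dollar amounts stay exact to the cent.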