From: Ali Nazemian
Date: Thu, 13 Oct 2016 23:38:36 +1100
Subject: Re: Nifi hardware recommendation
To: users@nifi.apache.org

Thank you very much. I would be more than happy to provide some benchmark
results after the implementation.

Sincerely yours,
Ali

On Thu, Oct 13, 2016 at 11:32 PM, Joe Witt wrote:
> Ali,
>
> I agree with your assumption. It would be great to test that out and
> provide some numbers, but intuitively I agree.
>
> I could envision certain scatter/gather data flows that could challenge
> that sequential-access assumption, but honestly, with how awesome disk
> caching is in Linux these days, I think that, practically speaking, this
> is the right way to think about it.
>
> Thanks
> Joe
>
> On Thu, Oct 13, 2016 at 8:29 AM, Ali Nazemian wrote:
>
>> Dear Joe,
>>
>> Thank you very much. That was a really great explanation.
>> I investigated the NiFi architecture, and it seems that most of the
>> read/write operations for the flowfile repo and the provenance repo are
>> random, whereas for the content repo most of the read/write operations
>> are sequential. Let's say cost does not matter. In that case, even
>> choosing SSDs for the content repo cannot provide a huge performance
>> gain over HDDs. Am I right? Hence, it would be better to spend the
>> content-repo SSD money on network infrastructure.
>>
>> Best regards,
>> Ali
>>
>> On Thu, Oct 13, 2016 at 10:22 PM, Joe Witt wrote:
>>
>>> Ali,
>>>
>>> You have a lot of nice resources to work with there. I'd personally
>>> recommend the series-of-RAID-1 configuration, provided you keep in mind
>>> that this means you can only lose a single disk in any one partition.
>>> As long as the disks are being monitored and would be quickly replaced,
>>> this works well in practice. If there could be lapses in monitoring or
>>> in time to replace, then it is perhaps safer to go with more redundancy
>>> or an alternative RAID type.
>>>
>>> I'd say put the OS, the app install with its user and audit databases,
>>> and the application logs on one physical RAID volume. Have a dedicated
>>> physical volume for the flowfile repository. It will not be able to use
>>> all the space, but it certainly could benefit from having no other
>>> contention; this could be a great thing to have SSDs for, actually. And
>>> for the remaining volumes, split them up between content and provenance
>>> as you have. You get to make the overall performance-versus-retention
>>> decision. Frankly, you have a great system to work with, and I suspect
>>> you're going to see excellent results anyway.
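For concreteness, here is a minimal sketch of how that one-volume-per-repository
layout might be expressed in conf/nifi.properties. The /data mount points are
illustrative assumptions; NiFi accepts additional content and provenance
directories via uniquely suffixed property names.

    # OS volume also holds the user/audit databases and the logs
    nifi.database.directory=/data/os/database_repository
    # Dedicated volume for the flowfile repository
    nifi.flowfile.repository.directory=/data/flowfile/flowfile_repository
    # One provenance repository directory per dedicated volume
    nifi.provenance.repository.directory.default=/data/prov1/provenance_repository
    nifi.provenance.repository.directory.prov2=/data/prov2/provenance_repository
    # One content repository directory per dedicated volume (add more as needed)
    nifi.content.repository.directory.default=/data/content1/content_repository
    nifi.content.repository.directory.content2=/data/content2/content_repository
    nifi.content.repository.directory.content3=/data/content3/content_repository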
>>> Conservatively speaking, expect say 50 MB/s of throughput per volume in
>>> the content repository, so if you end up with 8 of them you could
>>> achieve upwards of 400 MB/s sustained. You'll also then want to make
>>> sure you have a good 10G-based network setup as well. Or you could dial
>>> back on the speed tradeoff and simply increase retention or disk-loss
>>> tolerance. Lots of ways to play the game.
>>>
>>> There are no published SSD vs HDD performance benchmarks that I am
>>> aware of, though this is a good idea. Having a hybrid of SSDs and HDDs
>>> could offer a really solid performance/retention/cost tradeoff. For
>>> example, having SSDs for the OS/logs/provenance/flowfile with HDDs for
>>> the content - that would be quite nice. At that rate, to take full
>>> advantage of the system, you'd need very strong network infrastructure
>>> between NiFi and any systems it is interfacing with, and your flows
>>> would need to be well tuned for GC/memory efficiency.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Thu, Oct 13, 2016 at 2:50 AM, Ali Nazemian wrote:
>>>
>>>> Dear NiFi users/developers,
>>>>
>>>> I was wondering whether there is any benchmark addressing whether it
>>>> is better to dedicate individual disks to NiFi or to use RAID for this
>>>> purpose. For example, which of these scenarios is recommended from a
>>>> performance point of view?
>>>>
>>>> Scenario 1:
>>>> 24 disks in total
>>>> 2 disks - RAID 1 for OS and flowfile repo
>>>> 2 disks - RAID 1 for provenance repo1
>>>> 2 disks - RAID 1 for provenance repo2
>>>> 2 disks - RAID 1 for content repo1
>>>> 2 disks - RAID 1 for content repo2
>>>> 2 disks - RAID 1 for content repo3
>>>> 2 disks - RAID 1 for content repo4
>>>> 2 disks - RAID 1 for content repo5
>>>> 2 disks - RAID 1 for content repo6
>>>> 2 disks - RAID 1 for content repo7
>>>> 2 disks - RAID 1 for content repo8
>>>> 2 disks - RAID 1 for content repo9
>>>>
>>>> Scenario 2:
>>>> 24 disks in total
>>>> 2 disks - RAID 1 for OS and flowfile repo
>>>> 4 disks - RAID 10 for provenance repo1
>>>> 18 disks - RAID 10 for content repo1
>>>>
>>>> Moreover, is there any benchmark for SSD vs HDD performance for NiFi?
>>>> Thank you very much.
>>>>
>>>> Best regards,
>>>> Ali

--
A.Nazemian
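As a starting point for the benchmark results offered above, a minimal sketch
of a sequential-versus-random write comparison follows. The target path, block
size, and file size are assumptions, and a purpose-built tool such as fio
(with O_DIRECT) would give more rigorous numbers than this page-cache-flushing
approximation.

    import os
    import random
    import time

    PATH = "/data/content1/bench.tmp"  # assumed mount point under test
    BLOCK = 1024 * 1024                # 1 MiB per write
    COUNT = 512                        # 512 MiB written in total

    def timed_write_mb_s(offsets):
        """Write one block at each offset, fsync, and return MB/s."""
        buf = os.urandom(BLOCK)
        fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o644)
        start = time.time()
        for off in offsets:
            os.pwrite(fd, buf, off)
        os.fsync(fd)  # flush the page cache so the disk itself is measured
        elapsed = time.time() - start
        os.close(fd)
        return (BLOCK * len(offsets) / (1024 * 1024)) / elapsed

    sequential = list(range(0, BLOCK * COUNT, BLOCK))
    shuffled = random.sample(sequential, len(sequential))  # same offsets, random order

    print(f"sequential: {timed_write_mb_s(sequential):.1f} MB/s")
    print(f"random:     {timed_write_mb_s(shuffled):.1f} MB/s")
    os.remove(PATH)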