From: Dejan Menges
Date: Mon, 01 Aug 2016 09:47:18 +0000
Subject: Re: Surprisingly, RAID0 provides best IO performance whereas no RAID the worst
To: Shady Xu
Cc: Allen Wittenauer, user@hadoop.apache.org

Hi Shady,

We did extensive tests on this and received a fix from Hortonworks, which we are probably the first (and only) ones to test, most likely tomorrow evening. If the Hortonworks folks are reading this, maybe they know the official HDFS ticket ID for it, if there is one, as I cannot find it in our correspondence.

Long story short: a single server had RAID controllers with 1 GB and 2 GB of cache (both scenarios were tested). It started as a simple benchmark using TestDFSIO while trying to narrow down the best server-side configuration (discussions like this one: JBOD, RAID0, benchmarking, etc.). However, with 10-12 disks in a single server and the controllers mentioned above, we got 6-10 times higher write speed when not using replication (meaning replication factor one). It really took months to narrow it down to a single hardcoded value, HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (just looking at the patch). In the end, tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE) basically limited the write speed when using replication to whatever this constant allows, which is super annoying (especially now that more or less everyone runs networks faster than 100 Mbps). This can be found in b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java.
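To illustrate the effect (a minimal sketch, not the actual Hadoop source; the 128 KB value and the "non-positive means leave it to the OS" convention in the fix are my assumptions from reading the patch): a TCP connection can have at most one receive-buffer's worth of data in flight per round trip, so a small fixed buffer caps throughput at roughly buffer size / RTT, no matter how fast the disks or the network are.

```java
// Rough illustration only -- not the actual Hadoop source.
import java.net.ServerSocket;
import java.net.Socket;

public class SocketBufferCapSketch {

    // In the Hadoop source this constant is HdfsConstants.DEFAULT_DATA_SOCKET_SIZE
    // (assumed here to be 128 KB).
    static final int DEFAULT_DATA_SOCKET_SIZE = 128 * 1024;

    public static void main(String[] args) throws Exception {
        // Bandwidth-delay product: a 128 KB window with a 1 ms RTT allows ~125 MB/s,
        // with a 10 ms RTT only ~12.5 MB/s -- regardless of disk or NIC speed.
        for (double rttMs : new double[] {1.0, 10.0}) {
            double maxMBps = (DEFAULT_DATA_SOCKET_SIZE / (rttMs / 1000.0)) / (1024.0 * 1024.0);
            System.out.printf("RTT %.0f ms -> max ~%.1f MB/s%n", rttMs, maxMBps);
        }

        // The problematic pattern: forcing a small fixed buffer on the data-transfer
        // sockets instead of letting the OS auto-tune the TCP window.
        try (ServerSocket server = new ServerSocket()) {            // unbound, just for illustration
            server.setReceiveBufferSize(DEFAULT_DATA_SOCKET_SIZE);  // hardcoded cap
        }

        // The shape of the fix as we understand it: only override the buffer size
        // when one is explicitly configured; a non-positive value means "let the OS decide".
        int configuredSendBufferSize = 0; // hypothetical config value
        try (Socket sock = new Socket()) {
            if (configuredSendBufferSize > 0) {
                sock.setSendBufferSize(configuredSendBufferSize);
            }
        }
    }
}
```

Since every replicated write goes over this pipeline socket, the cap shows up exactly when the replication factor is greater than one, which matches what we saw in the benchmarks.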
On Mon, Aug 1, 2016 at 11:39 AM Shady Xu wrote:

> Thanks Allen. I am aware of the fact you mentioned and am wondering what the
> await and svctm values are on your cluster nodes. If there is no significant
> difference, maybe I should try other ways to tune my HBase.
>
> And Dejan, I've never heard of or noticed what you said. If that's true it's
> really disappointing; please notify us if there is any progress.
>
> 2016-08-01 15:33 GMT+08:00 Dejan Menges:
>
>> Sorry for jumping in, but speaking of performance... it took us a while to
>> figure out why, whatever disk/RAID0 performance you have, when it comes to
>> HDFS and a replication factor greater than one, disk write speed drops to
>> 100 Mbps... After long tests with Hortonworks they found that the issue is
>> that someone, at some point in history, hardcoded a value somewhere, and
>> whatever setup you have, you are limited by it. Luckily we have a quite
>> powerful testing environment, and the plan is to test this patch later this
>> week. I'm not sure whether there is an official HDFS bug for this; I checked
>> our internal history but didn't see anything like that.
>>
>> This was quite disappointing, as whatever tuning, controllers, and setups
>> you have, it all goes down the drain with this.
>>
>> On Mon, Aug 1, 2016 at 8:30 AM Allen Wittenauer wrote:
>>
>>> On 2016-07-30 20:12 (-0700), Shady Xu wrote:
>>> > Thanks Andrew, I know about the disk failure risk and that it's one of the
>>> > reasons why we should use JBOD. But JBOD provides worse performance than
>>> > RAID 0.
>>>
>>> It's not about failure: it's about speed. RAID0 performance will drop like a
>>> rock if any one disk in the set is slow. When all the drives are performing
>>> at peak, yes, it's definitely faster. But over time, drive speed will decline
>>> (sometimes to half speed or less!), usually prior to a failure. That failure
>>> may take a while, so in the meantime your cluster is getting slower ... and
>>> slower ... and slower ...
>>>
>>> As a result, JBOD will be significantly faster over the _lifetime_ of the
>>> disks vs. a comparison made _today_.
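To put rough numbers on Allen's point (an illustrative back-of-the-envelope model, not measurements from this thread; the per-disk throughput figures are made up): a RAID0 stripe touches every member disk on each request, so the whole set is gated by its slowest drive, while JBOD lets each disk contribute at its own speed.

```java
// Illustrative model only: aggregate throughput of RAID0 vs JBOD
// once a single drive in the set has degraded (hypothetical numbers).
import java.util.Arrays;

public class Raid0VsJbodSketch {
    public static void main(String[] args) {
        // Per-disk sequential throughput in MB/s; one drive has slowed to 60 MB/s.
        double[] diskMBps = {150, 150, 150, 150, 150, 60};

        // RAID0: every stripe spans every disk, so the set runs at the pace
        // of the slowest member.
        double slowest = Arrays.stream(diskMBps).min().orElse(0);
        double raid0 = slowest * diskMBps.length;

        // JBOD: HDFS spreads blocks across independent disks, so each disk
        // serves I/O at its own speed.
        double jbod = Arrays.stream(diskMBps).sum();

        System.out.printf("RAID0 aggregate: ~%.0f MB/s%n", raid0); // ~360 MB/s
        System.out.printf("JBOD aggregate:  ~%.0f MB/s%n", jbod);  // ~810 MB/s
    }
}
```

The exact numbers depend on the workload, but the direction of the comparison is what matters for the "lifetime of the disks" argument.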