hadoop-common-user mailing list archives

From Dejan Menges <dejan.men...@gmail.com>
Subject Re: Surprisingly, RAID0 provides best io performance whereas no RAID the worst
Date Tue, 02 Aug 2016 14:31:54 GMT
Hi Shady,

Great point, I didn't know that. Thanks a lot, I will definitely check
whether this was only related to the HWX distribution.

Thanks a lot, and sorry if I spammed this topic; that wasn't my intention
at all.

Dejan

On Tue, Aug 2, 2016 at 9:37 AM Shady Xu <shadyxu@gmail.com> wrote:

> Hi Dejan,
>
> I checked on GitHub and found that DEFAULT_DATA_SOCKET_SIZE is located in
> the hadoop-hdfs-project/hadoop-hdfs-client/ package in the Apache version
> of Hadoop, but in hadoop-hdfs-project/hadoop-hdfs/ in the Hortonworks
> version. I am not sure whether that means the parameter affects the
> performance of the Hadoop client in Apache HDFS but the performance of the
> DataNode in Hortonworks HDFS. If that's the case, maybe it's a bug
> introduced by Hortonworks?
>
> 2016-08-01 17:47 GMT+08:00 Dejan Menges <dejan.menges@gmail.com>:
>
>> Hi Shady,
>>
>> We did extensive tests on this and received a fix from Hortonworks, which
>> we are probably the first and only ones to test, most likely tomorrow
>> evening. If the Hortonworks guys are reading this, maybe they know the
>> official HDFS ticket ID for this, if there is one, as I cannot find it in
>> our correspondence.
>> Long story short: a single server had RAID controllers with 1G and 2G
>> cache (both scenarios were tested). It started as a simple benchmark with
>> TestDFSIO while trying to narrow down the best server-side configuration
>> (discussions like this one: JBOD, RAID0, benchmarking, etc.). However,
>> with 10-12 disks in a single server and the mentioned controllers, we got
>> 6-10 times higher write speed when not using replication (meaning
>> replication factor one). It took months to narrow it down to a single
>> hardcoded value, HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (judging from the
>> patch). In the end,
>> tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE)
>> basically capped the write speed at this constant when replication was
>> used, which is super annoying (especially now that more or less everyone
>> runs networks faster than 100 Mbps). This can be found in
>> b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
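>>
>> For anyone who wants to sanity-check the arithmetic, here is a rough
>> back-of-the-envelope sketch (my own illustration, not the actual patch;
>> the 10 ms round-trip time and the SocketBufferCap class are just assumed
>> for the example). TCP throughput is bounded by the receive buffer size
>> divided by the round-trip time, and with a 128 KiB buffer that lands
>> right around the ~100 Mbps ceiling described above:
>>
>>     // Rough illustration only (not HDFS source): the bandwidth-delay
>>     // product caps TCP throughput at bufferBytes / roundTripTime.
>>     public class SocketBufferCap {
>>         public static void main(String[] args) {
>>             // 128 * 1024 is the DEFAULT_DATA_SOCKET_SIZE value we saw
>>             final int bufferBytes = 128 * 1024;
>>             // assumed 10 ms round trip between pipeline nodes
>>             final double rttSeconds = 0.010;
>>             double maxBitsPerSec = bufferBytes / rttSeconds * 8;
>>             // prints roughly 105 Mbit/s with these numbers
>>             System.out.printf("Cap: %.0f Mbit/s%n", maxBitsPerSec / 1e6);
>>         }
>>     }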
>>
>> On Mon, Aug 1, 2016 at 11:39 AM Shady Xu <shadyxu@gmail.com> wrote:
>>
>>> Thanks Allen. I am aware of the fact you mentioned and am wondering what
>>> the await and svctm values are on your cluster nodes. If there is no
>>> significant difference, maybe I should try other ways to tune my HBase.
>>>
>>> And Dejan, I've never heard of or noticed what you describe. If that's
>>> true it's really disappointing; please notify us if there's any progress.
>>>
>>> 2016-08-01 15:33 GMT+08:00 Dejan Menges <dejan.menges@gmail.com>:
>>>
>>>> Sorry for jumping in, but since we're on performance... it took us a
>>>> while to figure out why, whatever disk/RAID0 performance you have, disk
>>>> write speed drops to about 100 Mbps as soon as HDFS replication kicks in
>>>> (replication factor greater than one)... After long tests with
>>>> Hortonworks, they found that the issue is that someone, at some point in
>>>> history, hardcoded a value somewhere, and whatever setup you have, you
>>>> are limited to it. Luckily we have a quite powerful testing environment,
>>>> and the plan is to test their patch later this week. I'm not sure whether
>>>> there's an official HDFS bug for this; I checked our internal history but
>>>> didn't see anything like it.
>>>>
>>>> This was quite disappointing, as whatever tuning, controllers, and setups
>>>> you use, it all goes down the drain with this.
>>>>
>>>> On Mon, Aug 1, 2016 at 8:30 AM Allen Wittenauer <aw@apache.org> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2016-07-30 20:12 (-0700), Shady Xu <shadyxu@gmail.com> wrote:
>>>>> > Thanks Andrew, I know about the disk failure risk and that it's one
>>>>> > of the reasons why we should use JBOD. But JBOD provides worse
>>>>> > performance than RAID 0.
>>>>>
>>>>> It's not about failure: it's about speed.  RAID0 performance will drop
>>>>> like a rock if any one disk in the set is slow. When all the drives are
>>>>> performing at peak, yes, it's definitely faster.  But over time, drive
>>>>> speed will decline (sometimes to half speed or less!) usually prior to a
>>>>> failure. This failure may take a while, so in the mean time your cluster
>>>>> is getting slower ... and slower ... and slower ...
>>>>>
>>>>> As a result, JBOD will be significantly faster over the _lifetime_ of
>>>>> the disks vs. a comparison made _today_.
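>>>>>
>>>>> To put toy numbers on it (purely illustrative, my own sketch with
>>>>> assumed drive speeds, not a measurement):
>>>>>
>>>>>     // Toy model with assumed numbers: 12 drives at 150 MB/s each,
>>>>>     // one of them degraded to 50 MB/s.
>>>>>     public class Raid0VsJbod {
>>>>>         public static void main(String[] args) {
>>>>>             int drives = 12, healthyMBps = 150, slowMBps = 50;
>>>>>             // RAID0 stripes every request across all drives, so the
>>>>>             // whole set runs at the pace of the slowest member.
>>>>>             int raid0 = drives * slowMBps;                    // 600 MB/s
>>>>>             // JBOD drives work independently; only one is slow.
>>>>>             int jbod = (drives - 1) * healthyMBps + slowMBps; // 1700 MB/s
>>>>>             System.out.println("RAID0 ~" + raid0 + ", JBOD ~" + jbod + " MB/s");
>>>>>         }
>>>>>     }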
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>>>>> For additional commands, e-mail: user-help@hadoop.apache.org
>>>>>
>>>>>
>>>
>
