hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1649) Performance regression with Block CRCs
Date Tue, 31 Jul 2007 23:01:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516819 ]

Raghu Angadi commented on HADOOP-1649:
--------------------------------------


TestDFSIO is a simpler test. After analyzing the files written during the DFSIO-write test, it looks
like just a handful of slow nodes (disk or network) slow down the overall job. From the namenode
logs, the time taken to write a 320 MB file on 500 nodes varies from 26 sec to 380 sec (on one
of the runs, with an average of 75 sec). I will look at the time taken to write these files during sort.

For writes, Hadoop can work around the slow-node problem by having chooseTarget avoid nodes that
have many pending writes. Since we don't keep track of reads, adaptively avoiding slow nodes
for reads is harder. But this problem is more severe for writes. Also, once we write less to a node,
we will end up reading less from it as well.
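
To make the write-side idea concrete, here is a rough sketch of how a target chooser might skip nodes with many pending writes. It is not the actual chooseTarget code: the class SlowNodeAwareChooser, the Node type, the pendingWrites counter, and the loadFactor threshold are all hypothetical names made up for illustration, and real placement would also have to honor rack and space constraints.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only; this is not the HDFS chooseTarget implementation.
class SlowNodeAwareChooser {

    /** Minimal stand-in for a datanode: a name plus its count of pending writes. */
    static final class Node {
        final String name;
        final int pendingWrites;

        Node(String name, int pendingWrites) {
            this.name = name;
            this.pendingWrites = pendingWrites;
        }
    }

    /**
     * Picks up to `replicas` targets from `candidates`, in the given order,
     * skipping any node whose pending writes exceed `loadFactor` times the
     * average over all candidates. If too few nodes qualify, the remaining
     * slots are filled with the least-loaded of the skipped nodes.
     */
    static List<Node> chooseTargets(List<Node> candidates, int replicas, double loadFactor) {
        List<Node> chosen = new ArrayList<>();
        if (candidates.isEmpty() || replicas <= 0) {
            return chosen;
        }

        double average = candidates.stream()
                .mapToInt(n -> n.pendingWrites)
                .average()
                .orElse(0.0);
        double limit = average * loadFactor;

        // First pass: accept only nodes at or below the load limit.
        List<Node> skipped = new ArrayList<>();
        for (Node n : candidates) {
            if (n.pendingWrites <= limit) {
                if (chosen.size() < replicas) {
                    chosen.add(n);
                }
            } else {
                skipped.add(n);
            }
        }

        // Fallback: if still short of replicas, use the least-loaded skipped nodes.
        skipped.sort(Comparator.comparingInt((Node n) -> n.pendingWrites));
        for (Node n : skipped) {
            if (chosen.size() == replicas) {
                break;
            }
            chosen.add(n);
        }
        return chosen;
    }
}
```

With four candidates carrying 2, 50, 3, and 1 pending writes and a loadFactor of 2.0, the average is 14 and the limit is 28, so the node with 50 pending writes is picked only when all four replicas are requested. A relative threshold like this is just one possible choice; an absolute cap on pending writes would be another.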



> Performance regression with Block CRCs
> --------------------------------------
>
>                 Key: HADOOP-1649
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1649
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1649.patch
>
>
> Performance is noticeably affected by the Block Level CRCs patch (HADOOP-1134). This is more
> noticeable on writes (randomwriter test, etc.).
> With random writer, it takes 20-25% longer on a small cluster (20 nodes) and maybe 10% longer
> on a larger cluster.
> There are a few differences in how data is written with 1134. As soon as I can reproduce
> this, I think it will be easier to fix.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

