hadoop-user mailing list archives

From Andy Isaacson <...@cloudera.com>
Subject Re: DFS respond very slow
Date Tue, 16 Oct 2012 01:56:15 GMT
Also, note that JVM startup overhead, etc, means your -ls time is not
completely unreasonable. Using OpenJDK on a cluster of VMs, my "hdfs
dfs -ls" takes 1.88 seconds according to time (and 1.59 seconds of
user CPU time).

I'd be much more concerned about your slow transfer times.  On the
same cluster, I can easily push 4 MB/sec even with only a 100MB file
using "hdfs dfs -put - foo.txt". And of course using distcp or
multiple -put workloads HDFS can saturate multiple GigE links.
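Andy's quick write-throughput check can be reproduced with something like the following sketch (the `/tmp/foo.txt` scratch path is illustrative, not from the thread):

```shell
# Generate 100 MB of zeroes and stream it straight into HDFS via stdin
# ("-" tells -put to read standard input), timing the whole pipeline.
time ( dd if=/dev/zero bs=1M count=100 2>/dev/null \
         | hdfs dfs -put - /tmp/foo.txt )

# Remove the scratch file afterwards.
hdfs dfs -rm /tmp/foo.txt
```

Dividing 100 MB by the reported wall-clock time gives the effective write rate; on a healthy 100 Mbit network anything in the single-digit MB/s range is plausible.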

-andy

On Mon, Oct 15, 2012 at 5:22 PM, Vinod Kumar Vavilapalli
<vinodkv@hortonworks.com> wrote:
> Try picking up a single operation say "hadoop dfs -ls" and start profiling.
>  - Time the client JVM is taking to start. Enable debug logging on the
> client side by exporting HADOOP_ROOT_LOGGER=DEBUG,CONSOLE
>  - Time between the client starting and the namenode audit logs showing the
> read request. Also enable debug logging on the daemons too.
>  - Also, you can wget the namenode web pages and see how fast they return.
>
> To repeat what is already obvious, it is most likely related to your network
> setup and/or configuration.
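A minimal sketch of the profiling steps Vinod suggests (the namenode address below is the one from fs.default.name shown later in this thread and will differ on other clusters):

```shell
# 1. Enable client-side debug logging so JVM startup and RPC phases are visible.
export HADOOP_ROOT_LOGGER=DEBUG,console

# 2. Time a single cheap metadata operation end to end.
time hadoop dfs -ls /

# 3. Fetch the namenode web page to time the HTTP side, independent of the client JVM.
time wget -qO /dev/null http://5.6.7.11:50070/
```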
>
> Thanks,
> +Vinod
>
> On Oct 10, 2012, at 12:20 AM, Alexey wrote:
>
> ok, here you go:
> I have 3 servers:
> datanode on server 1, 2, 3
> namenode on server 1
> secondarynamenode on server 2
>
> all servers are at the hetzner datacenter and connected through 100Mbit
> link, pings between them about 0.1ms
>
> each server has 24Gb ram and intel core i7 3Ghz CPU
> disk is 700Gb RAID
>
> the bindings related configuration is the following:
> server 1:
> core-site.xml
> --------------------------------------
> <name>fs.default.name</name>
> <value>hdfs://5.6.7.11:8020</value>
> --------------------------------------
>
> hdfs-site.xml
> --------------------------------------
> <name>dfs.datanode.address</name>
> <value>0.0.0.0:50010</value>
>
> <name>dfs.datanode.http.address</name>
> <value>0.0.0.0:50075</value>
>
> <name>dfs.http.address</name>
> <value>5.6.7.11:50070</value>
>
> <name>dfs.secondary.https.port</name>
> <value>50490</value>
>
> <name>dfs.https.port</name>
> <value>50470</value>
>
> <name>dfs.https.address</name>
> <value>5.6.7.11:50470</value>
>
> <name>dfs.secondary.http.address</name>
> <value>5.6.7.12:50090</value>
> --------------------------------------
>
> server 2:
> core-site.xml
> --------------------------------------
> <name>fs.default.name</name>
> <value>hdfs://5.6.7.11:8020</value>
> --------------------------------------
>
> hdfs-site.xml
> --------------------------------------
> <name>dfs.datanode.address</name>
> <value>0.0.0.0:50010</value>
>
> <name>dfs.datanode.http.address</name>
> <value>0.0.0.0:50075</value>
>
> <name>dfs.http.address</name>
> <value>5.6.7.11:50070</value>
>
> <name>dfs.secondary.https.port</name>
> <value>50490</value>
>
> <name>dfs.https.port</name>
> <value>50470</value>
>
> <name>dfs.https.address</name>
> <value>5.6.7.11:50470</value>
>
> <name>dfs.secondary.http.address</name>
> <value>5.6.7.12:50090</value>
> --------------------------------------
>
> server 3:
> core-site.xml
> --------------------------------------
> <name>fs.default.name</name>
> <value>hdfs://5.6.7.11:8020</value>
> --------------------------------------
>
> hdfs-site.xml
> --------------------------------------
> <name>dfs.datanode.address</name>
> <value>0.0.0.0:50010</value>
>
> <name>dfs.datanode.http.address</name>
> <value>0.0.0.0:50075</value>
>
> <name>dfs.http.address</name>
> <value>127.0.0.1:50070</value>
>
> <name>dfs.secondary.https.port</name>
> <value>50490</value>
>
> <name>dfs.https.port</name>
> <value>50470</value>
>
> <name>dfs.https.address</name>
> <value>127.0.0.1:50470</value>
>
> <name>dfs.secondary.http.address</name>
> <value>5.6.7.12:50090</value>
> --------------------------------------
>
> netstat output:
> server 1
>
> tcp        0      0 5.6.7.11:8020           0.0.0.0:*               LISTEN
> 10870/java
>
> tcp        0      0 5.6.7.11:50070          0.0.0.0:*               LISTEN
> 10870/java
>
> tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN
> 10997/java
>
> tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN
> 10997/java
>
> tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN
> 10997/java
>
>
> server 2
>
> tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN
> 23683/java
>
> tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN
> 23683/java
>
> tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN
> 23683/java
>
> tcp        0      0 5.6.7.12:50090          0.0.0.0:*               LISTEN
> 23778/java
>
>
> server 3
>
> tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN
> 894/java
>
> tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN
> 894/java
>
> tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN
> 894/java
>
>
> If I transfer big files between the servers I get about 9Mb/s,
> and even 10Mb/s with rsync.
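As a rough sanity check (my arithmetic, not from the thread): a 100 Mbit link tops out around 12 megabytes/second before protocol overhead, so if the rsync figure above is in megabytes/second the raw network is already near line rate and is unlikely to be the bottleneck:

```shell
# Theoretical ceiling of a 100 Mbit/s link in whole megabytes per second
# (8 bits per byte, ignoring TCP/IP overhead).
echo $((100 / 8))   # prints 12
```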
>
> On 10/09/12 11:56 PM, Harsh J wrote:
>
> Hi,
>
> OK, can you detail your network infrastructure used here, and also
> make sure your daemons are binding to the right interfaces as well
> (use netstat to check perhaps)? What rate of transfer do you get for
> simple file transfers (ftp, scp, etc.)?
>
> On Wed, Oct 10, 2012 at 12:24 PM, Alexey <alexxoid@gmail.com> wrote:
>
> Hello Harsh,
>
> I noticed such issues from the start.
> Yes, I mean the dfs.balance.bandwidthPerSec property; I set this
> property to 5000000.
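For reference, a hdfs-site.xml fragment for that setting might look like the sketch below. The value is in bytes per second, so 5000000 caps balancer traffic at roughly 5 MB/s (property name as used in the Hadoop 1.x era this thread dates from):

```xml
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <!-- bytes per second: 5000000 is about a 5 MB/s cap on balancing -->
  <value>5000000</value>
</property>
```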
>
> On 10/09/12 11:50 PM, Harsh J wrote:
>
> Hey Alexey,
>
> Have you noticed this right from the start itself? Also, what exactly
> do you mean by "Limited replication bandwidth between datanodes -
> 5Mb." - Are you talking of the dfs.balance.bandwidthPerSec property?
>
> On Wed, Oct 10, 2012 at 10:53 AM, Alexey <alexxoid@gmail.com> wrote:
>
> Additional info: I also tried to use openjdk instead of sun's - the
> issue still persists.
>
> On 10/09/12 03:12 AM, Alexey wrote:
>
> Hi,
>
> I have an issue with hadoop dfs. I have 3 servers (24Gb RAM on each).
> The servers are not overloaded, they just have hadoop installed. One
> has a datanode and the namenode, the second - a datanode only, the
> third - a datanode and the secondarynamenode.
>
> The hadoop datanodes have a max memory limit of 8Gb. Default
> replication factor - 2. Limited replication bandwidth between
> datanodes - 5Mb.
>
> I've set up hadoop to communicate between nodes by IP address.
> Everything works - I can read/write files on each datanode, etc. But
> the issue is that hadoop dfs commands execute very slowly; even
> "hadoop dfs -ls /" takes about 3 seconds, though it has only the one
> folder /user in it.
> Files also upload to the hdfs very slowly - hundreds of
> kilobytes/second.
>
> I'm using the Debian stable x86-64 distribution and hadoop running
> through sun-java6-jdk 6.26-0squeeze1.
>
> Please give me any suggestions on what I need to adjust/check to
> resolve this issue.
>
> As I said before - the overall hdfs configuration is correct, because
> everything works except performance.
>
> --
> Best regards
> Alexey
>
