knox-user mailing list archives

From Sean Roberts <srobe...@hortonworks.com>
Subject Re: WebHDFS performance issue in Knox
Date Tue, 04 Sep 2018 08:53:30 GMT
Guang, this is somewhat to be expected.

When you talk to WebHDFS directly, the namenode redirects each read to a datanode that holds
the data, so requests are spread across many datanodes and you get the data directly from
the source.
With Knox, all traffic goes through the single Knox host. Knox is responsible for fetching
from the datanodes and relaying the data to you. This adds overhead, since Knox is acting
as a middleman, and it lowers network capacity, since only one host is serving data to you.

Also, if running on a cloud provider, the Knox host may be a smaller instance size with lower
network capacity.
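
If you want to put numbers on the difference, a rough sketch like the one below could time the
same file over both paths. The host names, the topology name "default", and the ports (50070
for the namenode, 8443 for Knox) are assumptions for illustration only, and authentication is
left out entirely, so adjust for your cluster.

```python
import time
import urllib.request


def direct_url(namenode, path, port=50070):
    # WebHDFS OPEN against the namenode, which answers with a redirect to a datanode.
    # Port 50070 is an assumption; use whatever your namenode's HTTP port is.
    return f"http://{namenode}:{port}/webhdfs/v1{path}?op=OPEN"


def knox_url(gateway, topology, path, port=8443):
    # The same OPEN call routed through the Knox gateway.
    # Port 8443 and the "gateway" context path are assumed defaults.
    return f"https://{gateway}:{port}/gateway/{topology}/webhdfs/v1{path}?op=OPEN"


def throughput_mb_s(stream, chunk_size=1 << 20, clock=time.perf_counter):
    # Drain a file-like object and return (bytes_read, megabytes_per_second).
    start = clock()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = clock() - start
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, rate


def measure(url):
    # urlopen follows the namenode's redirect to the datanode for GET requests.
    with urllib.request.urlopen(url) as resp:
        return throughput_mb_s(resp)


# Hypothetical usage (hosts and file path are placeholders, no auth configured):
#   print(measure(direct_url("namenode.example.com", "/tmp/big.bin")))
#   print(measure(knox_url("knox.example.com", "default", "/tmp/big.bin")))
```

Running both from the same client machine, against the same file, keeps the comparison fair.
If the Knox figure sits near the Knox host's own network limit, the host is the bottleneck.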
--
Sean Roberts

From: Guang Yang <kobe@uber.com>
Reply-To: "user@knox.apache.org" <user@knox.apache.org>
Date: Tuesday, 4 September 2018 at 07:46
To: "user@knox.apache.org" <user@knox.apache.org>
Subject: WebHDFS performance issue in Knox

Hi,

We're using Knox 1.1.0 to proxy WebHDFS requests. If we download a file through WebHDFS in
Knox, the download speed is only about 11M/s. However, if we download directly from a datanode,
the speed is at least 40M/s.

Are you guys aware of this problem? Any suggestions?

Thanks,
Guang