From: Todd Lipcon
To: hdfs-user@hadoop.apache.org
Date: Thu, 6 Jan 2011 08:01:35 -0800
Subject: Re: response time increase??
In-Reply-To: <4D2592EF.1060003@above.net.tw>

Well, technically, 1st_arrival_time is O(number of network hops between racks), which is O(#racks), and obviously #racks is a function of the number of datanodes. But you're talking an extra 0.1ms or so there per network hop, and there isn't any system which could avoid it.
 
But HDFS-wise, the data structures are mostly hashtable based.
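
To put rough numbers on the point above, here is a minimal sketch of that cost model in Python: a fixed cost plus a small cost per network hop. Only the 0.1 ms per-hop figure comes from this message; the function name, the 10 ms base cost, and the hop counts are hypothetical values chosen for illustration, not measurements of any cluster.

```python
# Rough model of time-to-first-block; not a measurement of any real cluster.
PER_HOP_MS = 0.1  # ballpark per-hop cost quoted in this thread


def first_arrival_ms(base_ms: float, hops: int, per_hop_ms: float = PER_HOP_MS) -> float:
    """Estimate first-block latency as a fixed cost plus a per-hop cost."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return base_ms + hops * per_hop_ms


if __name__ == "__main__":
    BASE_MS = 10.0  # hypothetical fixed cost (lookup, disk seek, connection setup)
    for hops in (1, 2, 4, 8):
        print(f"{hops} hops: {first_arrival_ms(BASE_MS, hops):.1f} ms")
```

Under these assumed numbers the per-hop term stays a small fraction of the fixed cost, which is the sense in which the growth with cluster size is negligible.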

-Todd

On Thu, Jan 6, 2011 at 2:01 AM, KevinKuei <kkuei@above.net.tw> wrote:
Dear Todd,

Thanks for your reply.
Let me clarify my "response time" definition.

* start_time: the time the request is sent
* 1st_arrival_time: the time the first block of data is received
* completed_time: the time the final block of data is received

response time = 1st_arrival_time - start_time
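
One way to measure response time under this definition, reading it as first arrival minus start, is sketched below in Python. This is only an illustration: `time_read` and `fake_stream` are hypothetical names, and the stream is a stand-in that sleeps between chunks rather than a real HDFS read.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_read(chunks: Iterable[bytes],
              clock: Callable[[], float] = time.perf_counter) -> Tuple[float, float]:
    """Return (response_time, total_time) in seconds for one read.

    response_time = 1st_arrival_time - start_time
    total_time    = completed_time   - start_time
    """
    start_time = clock()
    first_arrival_time = None
    for _ in chunks:
        if first_arrival_time is None:
            # Record the clock only when the first chunk arrives.
            first_arrival_time = clock()
    completed_time = clock()
    if first_arrival_time is None:
        raise ValueError("no data received")
    return first_arrival_time - start_time, completed_time - start_time


def fake_stream(n_chunks: int, delay_s: float) -> Iterator[bytes]:
    """Hypothetical stand-in for a file read: sleeps, then yields each chunk."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"x" * 1024


if __name__ == "__main__":
    response, total = time_read(fake_stream(5, 0.01))
    print(f"response time: {response * 1000:.1f} ms, total: {total * 1000:.1f} ms")
```

The `clock` parameter is there so the timing logic can be checked with a fake clock; against a real file system the chunks would come from the client's read calls instead of `fake_stream`.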

Can you still confirm that there is NO "response_time = O(number of datanodes)" factor?
If so, HDFS will be great for our application.  Thanks!!


--
Kevin Kuei


On 2011/1/6 at 04:23 PM, Todd Lipcon wrote:
Hi Kevin,

No, there is no O(number of datanodes) factor in performance.

-Todd

On Wed, Jan 5, 2011 at 8:04 PM, KevinKuei <kkuei@above.net.tw> wrote:
Hi,

I'm planning a YouTube-like online video site and am looking for a suitable file system.
The high performance and reliability of HDFS make it seem like a great candidate.

But somebody told me that the response time increases linearly as more datanodes are added.
I don't have enough hardware to test this for now.

I would very much appreciate it if anyone could provide such information.

Thanks!!

--
Kevin Kuei

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.







--
Todd Lipcon
Software Engineer, Cloudera