From: Himanshu Jindal
To: user@hadoop.apache.org
Date: Thu, 2 Jul 2015 14:48:18 +0530
Subject: Scalability of Name Node for external clients

I have a question regarding the scalability of the NameNode. Typically, the NameNode handles two types of clients:
1. Internal clients (DataNodes that are part of the Hadoop cluster)
2. External clients (client nodes requesting block locations in order to perform reads/writes on DataNodes)

I am not much concerned about the throughput of internal clients; I am more worried about the throughput of external clients. What is the expected throughput of NameNode operations for external clients, and how scalable is it? To be more precise, please consider the following example:

A typical NameNode server runs a cluster of 100 DataNodes. Assuming the internal clients use the default block report and heartbeat settings, I have the following questions regarding the scalability of the NameNode:
1. How many simultaneous external client connections can the NameNode support? (A hundred thousand?)
2. How many operations (get block locations) can it serve per second?
3. What are the different ways to increase throughput for these external clients?
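
For context, here is a rough sketch of the internal baseline I am holding fixed in the example above. It assumes the Hadoop 2.x defaults of `dfs.heartbeat.interval` = 3 seconds and `dfs.blockreport.intervalMsec` = 21600000 (6 hours); please verify these against your version's hdfs-default.xml. The function and variable names are only illustrative, and the sketch counts calls only: it ignores the per-call cost (a full block report is far heavier than a heartbeat) and incremental block reports.

```python
# Back-of-the-envelope estimate of DataNode-to-NameNode RPC load.
# ASSUMPTION: Hadoop 2.x defaults; check hdfs-default.xml for your release.
#   dfs.heartbeat.interval       = 3 (seconds)
#   dfs.blockreport.intervalMsec = 21600000 (milliseconds, i.e. 6 hours)

DATANODES = 100                              # cluster size from the example
HEARTBEAT_INTERVAL_S = 3                     # assumed default
BLOCKREPORT_INTERVAL_S = 21_600_000 / 1000   # assumed default, in seconds


def internal_rpc_rate(datanodes, heartbeat_s, blockreport_s):
    """Return (heartbeats per second, full block reports per second)
    summed over all DataNodes, assuming each one reports on schedule."""
    return datanodes / heartbeat_s, datanodes / blockreport_s


heartbeats_per_s, blockreports_per_s = internal_rpc_rate(
    DATANODES, HEARTBEAT_INTERVAL_S, BLOCKREPORT_INTERVAL_S
)

print(f"heartbeats per second:  {heartbeats_per_s:.1f}")
print(f"block reports per hour: {blockreports_per_s * 3600:.1f}")
```

Under these assumptions the internal traffic is a few tens of calls per second, which is why my questions focus on how much capacity remains for external clients.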

Thanks
Himanshu
