From: "Ning Li"
To: hbase-user@hadoop.apache.org
Date: Tue, 26 Aug 2008 18:55:08 -0400
Subject: Re: Multi get/put

Some follow-up on the performance issues:

> > PERFORMANCE ISSUES
> >
> > Our preliminary performance experiments show that the performance
> > of building an index is quite reasonable. However, the performance of
> > random reads in HDFS is so poor that the search performance is
> > dramatically worse than that on local file systems.
> >
> What do you mean by 'dramatic' in the above? This is a sweet feature. That
> it's slow on first implementation is OK. Are you thinking it's so slow, it's
> not functional?

On a local FS, real disk I/O is expensive, so Lucene relies on the file-system cache to deliver high search performance. For that reason, the comparisons below are based on warm-cache results, and they compare a local FS against a one-node HDFS.

When the data is warm, HDFS provides high sequential read performance but poor random read performance, mainly because of socket overhead.
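To make the socket-overhead argument concrete, here is a minimal Java cost model built on loudly hypothetical assumptions: every read call pays a fixed overhead (standing in for socket cost) plus a transfer time proportional to its size. The class name ReadCostModel and every number in main are illustrative assumptions, not HDFS measurements. The only point is that many small random reads pay the fixed cost far more often than a few large sequential reads over the same bytes.

```java
/**
 * Hypothetical cost model (an illustration, not measured HDFS behavior):
 * each read call pays a fixed per-call overhead plus a transfer time
 * proportional to the number of bytes read.
 */
public final class ReadCostModel {
    private ReadCostModel() {}

    /** Total time in microseconds for `reads` calls of `bytesPerRead` bytes each. */
    public static double totalMicros(long reads, long bytesPerRead,
                                     double overheadMicrosPerRead, double bytesPerMicro) {
        return reads * (overheadMicrosPerRead + bytesPerRead / bytesPerMicro);
    }

    public static void main(String[] args) {
        long totalBytes = 64L * 1024 * 1024; // read 64 MB in both scenarios
        double overhead = 100.0;             // hypothetical fixed cost per read call, microseconds
        double bandwidth = 100.0;            // hypothetical transfer rate, bytes per microsecond

        // Sequential: a few large reads, so the fixed overhead is amortized.
        long bigRead = 4L * 1024 * 1024;
        double seq = totalMicros(totalBytes / bigRead, bigRead, overhead, bandwidth);

        // Random: many small reads, so every read pays the overhead again.
        long smallRead = 1024;
        double rnd = totalMicros(totalBytes / smallRead, smallRead, overhead, bandwidth);

        System.out.printf("sequential: %.0f us, random: %.0f us%n", seq, rnd);
    }
}
```

Under this model, reusing connections would correspond to lowering the per-call overhead, not removing it, which is consistent with the observation that search gets faster with reuse but remains much slower than on a local FS.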
On HDFS 0.17.1, search is more than an order of magnitude slower than on a local FS. Even when socket connections are reused, it is still about an order of magnitude slower.

Since the cause is socket overhead in HDFS, random reads on a map file show similar results. Using HBase's MapFilePerformanceEvaluation, random reads on HDFS are a bit less than 7 times slower than on a local FS. That is somewhat better than the search results, probably because a random read on a map file amounts to several almost-sequential reads on the data file in HDFS.

Given the above, would the search performance be acceptable?

PS: I saw on http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation that random read performance on a map file improved quite a bit from 0.17.1 to 0.18.0. Any insight?
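For readers wondering why a map-file random read behaves like "several almost-sequential reads", here is a simplified, in-memory Java sketch of a MapFile-style lookup. It is an illustration under stated assumptions, not Hadoop's MapFile code: the class name MapFileLookupSketch, the list-based "data file", and the indexInterval parameter are hypothetical. The index keeps every Nth key with its position; a lookup binary-searches the index, jumps to the nearest preceding indexed position, and then scans forward sequentially.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified MapFile-style lookup: a sorted "data file" held in
 * memory plus a sparse index of every indexInterval-th key and its position.
 */
public final class MapFileLookupSketch {
    private final List<String> keys;   // sorted keys of the "data file"
    private final List<String> values; // values aligned with keys
    private final List<String> indexKeys = new ArrayList<>();
    private final List<Integer> indexPositions = new ArrayList<>();
    private int lastScanned; // entries read sequentially by the last get()

    public MapFileLookupSketch(List<String> sortedKeys, List<String> values, int indexInterval) {
        this.keys = sortedKeys;
        this.values = values;
        for (int i = 0; i < sortedKeys.size(); i += indexInterval) {
            indexKeys.add(sortedKeys.get(i));
            indexPositions.add(i);
        }
    }

    public String get(String key) {
        lastScanned = 0;
        // 1) Binary-search the in-memory index for the last indexed key <= key.
        int slot = Collections.binarySearch(indexKeys, key);
        if (slot < 0) {
            slot = -slot - 2;          // (insertion point) - 1
            if (slot < 0) return null; // key sorts before the first entry
        }
        // 2) "Seek" to that position, then 3) scan forward sequentially.
        for (int pos = indexPositions.get(slot); pos < keys.size(); pos++) {
            lastScanned++;
            int cmp = keys.get(pos).compareTo(key);
            if (cmp == 0) return values.get(pos);
            if (cmp > 0) return null; // passed where the key would be
        }
        return null;
    }

    public int lastScanned() {
        return lastScanned;
    }

    public static void main(String[] args) {
        List<String> ks = new ArrayList<>();
        List<String> vs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ks.add(String.format("row%04d", i));
            vs.add("value" + i);
        }
        MapFileLookupSketch map = new MapFileLookupSketch(ks, vs, 128);
        String v = map.get("row0300");
        // prints "value300 after scanning 45 entries"
        System.out.println(v + " after scanning " + map.lastScanned() + " entries");
    }
}
```

In this sketch, one lookup costs a single jump plus a short forward scan. On HDFS, that pattern lets each read fetch a run of nearby bytes rather than paying the per-read cost for a tiny, scattered access.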