Date: Sat, 23 Feb 2008 18:19:19 -0800 (PST)
From: "Jim Kellerman (JIRA)"
To: hbase-dev@hadoop.apache.org
Reply-To: hbase-dev@hadoop.apache.org
Subject: [jira] Updated: (HBASE-76) [hbase] performance: Try to purge servers of Text
Message-ID: <1237458700.1203819559527.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HBASE-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-76:
-------------------------------

    Attachment: TextVsString.java

Here is a little test program I wrote to test the speed of serialization
of Text vs String. While String is a little slower than Text, it isn't by much. String also has the advantage of being immutable once created.

Test results:

Serialized 1000000 Strings in 547 milliseconds
Deserialized 1000000 Strings in 860 milliseconds
Serialized 1000000 Text objects in 531 milliseconds
Deserialized 1000000 Text objects in 500 milliseconds

> [hbase] performance: Try to purge servers of Text
> -------------------------------------------------
>
>                 Key: HBASE-76
>                 URL: https://issues.apache.org/jira/browse/HBASE-76
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: stack
>            Priority: Minor
>         Attachments: TextVsString.java
>
>
> Chatting with Jim while looking at profiler outputs, we should make an effort at purging the servers of the Text type so HRegionServer doesn't ever have to deal in Characters and the concomitant encode/decode to UTF-8. Toward this end, we'd make changes like moving HStoreKey to have four rather than three data members: column family, column family qualifier, and row, each done as a basic Writable -- ImmutableBytesWritable? -- plus a timestamp long, rather than a Text column, a Text row and a timestamp long. This would save on our having to do the relatively expensive 'find' of the column family separator inside extractFamily (>10% of CPU scanning). Chatting about it, we could effect the change without changing the public client API; clients could continue to take the Text type for row and column, and then, client-side, the conversion to HStoreKey could be done before crossing the wire to the server.
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
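
To make the four-member key idea from the issue description concrete, here is a rough sketch. It is not the actual HStoreKey: the class name ByteStoreKey, its fields, and the fromClient and inFamily helpers are all hypothetical, and it assumes the column is written as family:qualifier with ':' as the separator. The point it illustrates is that the separator search and the UTF-8 encode happen once on the client, so the server only compares bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical byte-oriented store key with four data members.
 * Illustration only; this is not the real HStoreKey.
 */
final class ByteStoreKey {
    final byte[] row;
    final byte[] family;
    final byte[] qualifier;
    final long timestamp;

    ByteStoreKey(byte[] row, byte[] family, byte[] qualifier, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }

    /**
     * Client-side conversion: find the separator once and encode to UTF-8 once,
     * before the key crosses the wire. Assumes ':' separates family and qualifier.
     */
    static ByteStoreKey fromClient(String row, String column, long timestamp) {
        int sep = column.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException(
                "column must look like family:qualifier, got: " + column);
        }
        return new ByteStoreKey(
            row.getBytes(StandardCharsets.UTF_8),
            column.substring(0, sep).getBytes(StandardCharsets.UTF_8),
            column.substring(sep + 1).getBytes(StandardCharsets.UTF_8),
            timestamp);
    }

    /** Server-side check: the family is already split out, so no separator scan. */
    boolean inFamily(byte[] otherFamily) {
        return Arrays.equals(family, otherFamily);
    }

    public static void main(String[] args) {
        ByteStoreKey key = fromClient("row1", "info:name", 1203819559527L);
        System.out.println(new String(key.family, StandardCharsets.UTF_8));
        System.out.println(key.inFamily("info".getBytes(StandardCharsets.UTF_8)));
    }
}
```

With a shape like this, the public client API could keep accepting a row and a family:qualifier column as text, while the server never has to decode characters or search for the separator.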