Date: Thu, 5 May 2011 15:26:43 -0600
Subject: Considerations for using HBase in User Facing applications
From: Matt Davies
To: user@hbase.apache.org

Afternoon everyone,

I am researching what the best practice is for using HBase in user-facing applications.
I do not know all of the applications that will be ported to use HBase, but they share common characteristics:

- simple key/value data. We are not serving large files at the moment; perhaps a couple of columns in a single column family
- very tall tables, hundreds of millions of rows
- need millisecond access times for a single row
- random access
- must maintain very, very good query times while loading in new data

The quick choice would be to use something like memcached or Redis, but the data is growing faster than the memory of a single box or even a few boxes. We also have a significant investment in Hadoop technologies, so keeping HBase prime seems to make a lot of sense.

So, some questions:

1. Do you find that a single HBase cluster serving all applications works better than smaller clusters that each serve application-specific data?
2. In the real world, do people hook APIs directly to HBase, or is there some caching layer in between?
3. I remember hearing that people like StumbleUpon use different clusters for analytics vs. customer apps. Is this still best practice?
4. Is anyone using MSLABs to reduce GC pauses in production? Experiences / landmines?
5. What other considerations have you found when hooking HBase up for user-facing applications?

Thanks in advance, and I'd love to hear some bragging!

-Matt