Date: Mon, 3 Sep 2012 15:20:28 -0400
Subject: Re: Key formats and very low cardinality leading fields
From: Jean-Marc Spaggiari
To: user@hbase.apache.org

Yes, you're right, but again, it will depend on the number of region
servers and the distribution of your data.

If you have 3 region servers and your data is evenly distributed, that
means all the data starting with a 1 will be on server 1, and so on. So
if you write a million rows starting with a 1, they will all land on
the same server.

Of course, you can pre-split your table, for example from 1a to 1z, and
assign each region to one of your 3 servers. That way you will avoid
hotspotting even if you write millions of rows starting with a 1.

If you have one hundred regions, you will face the same issue at the
beginning, but the more data you add, the more your table will be
split across all the servers and the less hotspotting you will have.

Can't you just reverse your fields and put the 1 to 30 at the end of
the key?

2012/9/3, Eric Czech:
> Thanks for the response Jean-Marc!
>
> I understand what you're saying but in a more extreme case, let's say
> I'm choosing the leading number on the range 1 - 3 instead of 1 - 30.
> In that case, it seems like all of the data for any one prefix would
> already be split well across the cluster and as long as the second
> value isn't written sequentially, there wouldn't be an issue.
>
> Is my reasoning there flawed at all?
>
> On Mon, Sep 3, 2012 at 2:31 PM, Jean-Marc Spaggiari wrote:
>> Hi Eric,
>>
>> In HBase, data is stored sequentially, sorted by key in alphabetical
>> order.
>>
>> It will depend on the number of regions and region servers you have,
>> but if you write data from 23AAAAAA to 23ZZZZZZ it will most probably
>> go to the same region even if the cardinality of the 2nd part of the
>> key is high.
>>
>> If the first number is always changing between 1 and 30 for each
>> write, then you will reach multiple regions/servers if you have them;
>> otherwise, you might have some hotspotting.
>>
>> JM
>>
>> 2012/9/3, Eric Czech:
>>> Hi everyone,
>>>
>>> I was curious whether or not I should expect any write hot spots if I
>>> structured my composite keys in a way such that the first field is a
>>> low cardinality (maybe 30 distinct values) value and the next field
>>> contains a very high cardinality value that would not be written
>>> sequentially.
>>>
>>> More concisely, I want to do this:
>>>
>>> Given one number between 1 and 30, write many millions of rows with
>>> keys like <number>:<high cardinality value>
>>>
>>> Would there be any problem with the millions of writes happening with
>>> the same first field key prefix even if the second field is largely
>>> unique?
>>>
>>> Thank you!
>>>
>
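
To make the thread's reasoning concrete, here is a rough sketch in plain Python (not the HBase API) of how sorted row keys map to regions. Everything in it is an assumption for illustration: the function names, the split points, the 6-letter random suffix, and the simplified model in which a region is just a contiguous range of sorted keys. It compares three cases: splitting only on the leading field, pre-splitting inside one prefix, and reversing the key so the low-cardinality field comes last.

```python
import bisect
import random
from collections import Counter


def region_for(row_key, split_points):
    """Return the index of the region whose key range contains row_key.

    split_points is a sorted list of region boundary keys. Region 0 holds
    keys below split_points[0]; region i holds keys from split_points[i-1]
    (inclusive) up to split_points[i] (exclusive).
    """
    return bisect.bisect_right(split_points, row_key)


def write_distribution(keys, split_points):
    """Count how many writes land in each region."""
    return Counter(region_for(k, split_points) for k in keys)


rng = random.Random(42)  # fixed seed so repeated runs behave the same


def random_suffix(length=6):
    """Hypothetical high-cardinality second field: random lowercase letters."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))


N = 10_000

# Case 1: leading field in 1..3, one region per prefix (assumed layout).
prefix_splits = ["2", "3"]
keys_with_prefix_1 = ["1" + random_suffix() for _ in range(N)]
hot = write_distribution(keys_with_prefix_1, prefix_splits)

# Case 2: same keys, but each prefix is pre-split into three regions
# (the boundaries "1i" and "1r" are arbitrary illustrative choices).
presplit_points = ["1i", "1r", "2", "2i", "2r", "3", "3i", "3r"]
spread = write_distribution(keys_with_prefix_1, presplit_points)

# Case 3: reversed key, high-cardinality field first and the "1" at the end.
reversed_keys = [random_suffix() + "1" for _ in range(N)]
reversed_splits = ["i", "r"]
spread_reversed = write_distribution(reversed_keys, reversed_splits)

print("one region per prefix:", dict(hot))
print("pre-split within prefix:", dict(sorted(spread.items())))
print("reversed key:", dict(sorted(spread_reversed.items())))
```

In case 1 every write for prefix 1 lands in a single region, which is the hotspot described above. In cases 2 and 3 the same number of writes is shared by three regions. Which server hosts each region is a separate question that this sketch does not model.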