Subject: Re: Cassandra to store logs as a list
From: Mark Robson
To: cassandra-user@incubator.apache.org
Date: Wed, 20 Jan 2010 21:20:28 +0000

I think you really want to be using the OrderPreservingPartitioner and using time-based keys.

It depends exactly how you're querying it. All querying use-cases need to be taken into account when deciding how to structure your data.

If you use a time-based key with OPP, typically data become very unbalanced, because the balancing algorithm (such as exists) depends on the keys continuing to have a similar distribution as when the nodes were kickstarted.

One solution would be to put some other field on the beginning of the key that you might wish to use such as account id, customer id, site id, etc, if you have sufficient of these to spread the data out evenly (do it in hex and zero pad it, of course).

Mark
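The key layout suggested above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name make_log_key, the choice of site id as the prefix, the field widths, and the ":" separator are all assumptions. The point is that fixed-width, zero-padded hex makes plain string ordering (which is what an order-preserving partitioner sorts by) agree with numeric ordering, first by prefix and then by time.

```python
def make_log_key(site_id: int, timestamp_ms: int) -> str:
    """Build a row key: zero-padded hex prefix, then zero-padded hex time.

    Hypothetical layout: 8 hex digits of site id, ':', 16 hex digits of
    a millisecond timestamp. Fixed widths keep string order == numeric order.
    """
    if not 0 <= site_id < 2**32:
        raise ValueError("site_id must fit in 8 hex digits")
    if not 0 <= timestamp_ms < 2**64:
        raise ValueError("timestamp_ms must fit in 16 hex digits")
    return f"{site_id:08x}:{timestamp_ms:016x}"


if __name__ == "__main__":
    # Keys for the same site sort together, and by time within a site.
    pairs = [(2, 5), (1, 300), (1, 20), (16, 1)]
    for key in sorted(make_log_key(s, t) for s, t in pairs):
        print(key)
```

Without the zero padding this would not hold: site 16 is "10" in hex and site 10 is "a", and "10" sorts before "a" as a string even though 16 is the larger number.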
