From: Utku Can Topçu <utku@topcu.gen.tr>
Date: Fri, 30 Apr 2010 16:09:18 +0200
Subject: Re: ColumnFamilyInputFormat KeyRange scans on a CF
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0016e6d644c9303fa3048574cbf5

--0016e6d644c9303fa3048574cbf5
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I meant in the first sentence "running the get_range_slices from a single
point".

On Fri, Apr 30, 2010 at 4:08 PM, Utku Can Topçu <utku@topcu.gen.tr> wrote:
> Do you mean, running the get_range_slices from a single? Yes, that would be
> reasonable for a relatively small key range. But when it comes to analyzing
> a really big range in a really big data collection (such as the one we
> currently populate), having a way of distributing the reads among the
> cluster seems the only reasonable solution.
>
> In the current situation, the best option might be to distribute the range
> among ColumnFamilies (say, one CF for each day) and to empty each CF after
> analyzing its data, so that it can be reused for another day range.
>
> Can you suggest a workaround for this?

--0016e6d644c9303fa3048574cbf5--