From: aaron morton
To: user@cassandra.apache.org
Subject: Re: creating and dropping columnfamilies as a usecase
Date: Fri, 22 Oct 2010 06:59:02 +1300
Message-Id: <1C02E04A-963B-471D-9830-97215B240FAE@thelastpickle.com>

AFAIK it's not really the purpose of the dynamic schema functions.

You may run into problems: the caches are per CF, and each CF has a high memory overhead (3 * memtable MB), so your memory usage will jump around.
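
To put rough numbers on that rule of thumb, here is a minimal sketch. The 64 MB memtable size and the idea that two CFs are live while the previous hour is still being analysed are illustrative assumptions, not figures from this thread, and the function name is hypothetical.

```python
def cf_memory_overhead_mb(memtable_mb: int, live_cfs: int = 1) -> int:
    """Rough estimate from the rule of thumb above: 3 * memtable MB per CF."""
    return 3 * memtable_mb * live_cfs


# Assumed example value; substitute the memtable size you actually configure.
MEMTABLE_MB = 64

# One hourly CF live at a time.
steady = cf_memory_overhead_mb(MEMTABLE_MB, live_cfs=1)

# Two CFs live: the new hour is being written while the old hour is analysed.
overlap = cf_memory_overhead_mb(MEMTABLE_MB, live_cfs=2)

print(f"steady: {steady} MB, during overlap: {overlap} MB")
```

Under these assumptions the estimate doubles every time the hourly CFs overlap and halves again after the drop, which is the kind of jumping around described above.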

Cloud Kick gather a lot of metrics; this may help: http://wiki.apache.org/cassandra/ArchitectureCommitLog

If you want to use Hadoop for the analysis, and the data really can be thrown away, then I would consider using Hadoop by itself. Take a look at Flume from Cloudera to stream data into HDFS: http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/

Hope that helps.
Aaron

On 22 Oct 2010, at 04:12, Utku Can Topçu wrote:

> Hi All,
>
> In the project I'm currently working on, I have a use case for analyzing the rows hourly.
>
> Since the 0.7.x branch supports creating and dropping columnfamilies on the fly,
> my use case proposal is:
>
> * Create a CF at the very beginning of every hour
> * At the end of the 1-hour period, analyze the data stored in the CF with Hadoop
> * Drop the CF afterwards.
>
> Can you foresee any problems in continuously creating and dropping columnfamilies?
>
> Regards,
> Utku


