From: Vanger
Subject: Re: 1000's of CF's.
Date: Mon, 8 Oct 2012 14:02:51 +0400
To: "user@cassandra.apache.org"

So what is the recommended Cassandra architecture when we need to run Hadoop M/R jobs without being restricted by the number of CFs?
What we have now is a fair number of CFs (> 2K), and this number is slowly growing, so we are already planning to merge partitioned CFs. But our next goal is to run Hadoop tasks on those CFs. All we have is plain Hector and a custom ORM on top of it. As far as I understand, VirtualKeyspace doesn't help in our case.
Also, I don't understand why support for many CFs (or built-in partitioning) isn't implemented on the Cassandra side. Can anybody explain why this can or cannot be done in Cassandra?

Just in case:
We're using Cassandra 1.0.11 on 30 nodes (planning to upgrade to 1.1.* soon).

--
W/ best regards,
Sergey.


On 04.10.2012 0:10, Hiller, Dean wrote:
Okay, so it only took me two solid days not a week.  PlayOrm in master branch now supports virtual CF's or virtual tables in ONE CF, so you can have 1000's or millions of virtual CF's in one CF now.  It works with all the Scalable-SQL, works with the joins, and works with the PlayOrm command line tool.

There are two ways to do it. If you are using the ORM half, you just annotate

@NoSqlEntity("MyVirtualCfName")
@NoSqlVirtualCf(storedInCf="sharedCf")

So it's stored in sharedCf with the table name MyVirtualCfName (in the command line tool, use MyVirtualCfName to query the table).
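
One way to picture many virtual CFs living in one physical CF is row-key prefixing. The sketch below is a minimal in-memory illustration in Java, assuming each row key is prefixed with the virtual table name. The thread does not say that PlayOrm uses exactly this layout, and SharedCfSketch, rowKey, and scan are hypothetical names, not PlayOrm or Hector API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Illustration only: many "virtual" tables kept in one physical column family
 * by prefixing each row key with the virtual table name.
 * Hypothetical class; not part of PlayOrm, Hector, or Cassandra.
 */
final class SharedCfSketch {
    // Separator between the virtual table name and the real row key.
    // Assumption: virtual table names never contain this character.
    private static final char SEP = ':';

    // Stand-in for one physical CF: row key -> (column name -> value).
    private final SortedMap<String, Map<String, String>> sharedCf = new TreeMap<>();

    /** Builds the physical row key, e.g. ("Users", "42") -> "Users:42". */
    static String rowKey(String virtualCf, String key) {
        if (virtualCf.indexOf(SEP) >= 0) {
            throw new IllegalArgumentException("virtual CF name must not contain " + SEP);
        }
        return virtualCf + SEP + key;
    }

    void put(String virtualCf, String key, String column, String value) {
        sharedCf.computeIfAbsent(rowKey(virtualCf, key), k -> new LinkedHashMap<>())
                .put(column, value);
    }

    Map<String, String> get(String virtualCf, String key) {
        Map<String, String> row = sharedCf.get(rowKey(virtualCf, key));
        return row == null ? Collections.<String, String>emptyMap() : row;
    }

    /** All rows of one virtual table, found by a range scan over the key prefix. */
    SortedMap<String, Map<String, String>> scan(String virtualCf) {
        String from = virtualCf + SEP;              // inclusive lower bound
        String to = virtualCf + (char) (SEP + 1);   // exclusive upper bound
        return sharedCf.subMap(from, to);
    }

    int physicalRowCount() {
        return sharedCf.size();
    }

    public static void main(String[] args) {
        SharedCfSketch cf = new SharedCfSketch();
        cf.put("Users", "42", "name", "alice");
        cf.put("Orders", "42", "total", "10");
        System.out.println(cf.get("Users", "42"));
        System.out.println(cf.scan("Orders").keySet());
    }
}
```

The prefix scan works here only because the in-memory map is sorted by key. On a cluster using RandomPartitioner, rows are not stored in key order, so listing every row of one virtual table would need some other mechanism, such as an index.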

Then, if you don't know your metadata ahead of time, you need to create DboTableMeta and DboColumnMeta objects and save them for every table you create; you can then use TypedRow to read and persist (which is what we have a project doing).

If you try it out, let me know.  We usually get bug fixes in pretty fast if you run into anything.  (More and more questions are forming on Stack Overflow as well ;) ).

Later,
Dean



