From: Anthony Ikeda
Date: Sat, 29 Oct 2011 18:00:10 -0700
To: user@cassandra.apache.org
Subject: Re: Programmatically allow only one out of two types of rows in a CF to enter the CACHE

By higher level data I meant the common data.

For example, we plan on creating an index using Solr for search, but as it's Lucene-based you can also store the common data as part of a Document. It won't be indexed, but it is still accessible because the document shares the same "id" as the row key.
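
To make the stored-but-not-indexed idea concrete, here is a toy sketch in Python. It is not Solr's or Lucene's API; the ToyIndex class, the field names and the "user:42" style row keys are all made up for illustration. It only shows the shape of the approach: some text is searchable, the common data rides along as stored fields, and the document id doubles as the row key.

```python
from collections import defaultdict


class ToyIndex:
    """Toy stand-in for a search index with indexed text and stored-only fields."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids
        self._stored = {}                  # document id -> stored (unindexed) fields

    def add(self, doc_id, indexed_text, stored_fields):
        # Only indexed_text is searchable; stored_fields are just carried along.
        for term in indexed_text.lower().split():
            self._postings[term].add(doc_id)
        self._stored[doc_id] = dict(stored_fields)

    def search(self, term):
        # Return (doc_id, stored_fields) for every document matching the term.
        ids = sorted(self._postings.get(term.lower(), set()))
        return [(doc_id, self._stored[doc_id]) for doc_id in ids]


index = ToyIndex()
# The document id is the same as the (hypothetical) Cassandra row key.
index.add("user:42", "Jane Doe London",
          {"display_name": "Jane Doe", "summary": "Premium account"})
index.add("user:43", "John Roe Paris",
          {"display_name": "John Roe", "summary": "Trial account"})

hits = index.search("london")       # match on indexed text
misses = index.search("premium")    # stored-only data is not searchable
```

A hit hands back the common data directly, so the overview can be shown without reading the row at all; the id in the hit is what you would use to fetch the detailed row when it is actually needed.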

Sent from my iPhone

On 29/10/2011, at 11:23, Aditya Narayan <adynnn@gmail.com> wrote:

@Mohit:
I have stated the example scenarios in my first post under this heading.
Also, I have stated above why I want to split that data into two rows. Like Ikeda stated below, I too am trying to prevent the frequently accessed rows from being bloated with large data, and I want to prevent that data from entering the cache as well.

> Okay, so as most know, this practice is called a wide row; we use them quite a lot. However, as your schema shows, it will cache the whole row in memory while it is active. One way we got around this issue was to create materialized views of the more common data, so we can easily get to the minimum amount of information required without using too much memory on the larger representations.
Yes, this is exactly the problem I am facing, but I want to keep both types of data (common + large/detailed) in a single CF so that it can serve as 'two materialized views'.
 

> My perspective is that indexing some of the higher levels of data would be the way to go: Solr or Elasticsearch for distributed, or, if you know you only need it locally, just use a caching solution like Ehcache.
What exactly do you mean by "indexing some of the higher levels of data"?

Thanks, guys!


 
Anthony


On 28/10/2011, at 21:42 PM, Aditya Narayan wrote:

> I need to keep the data of some entities in a single CF, but split into two rows for each entity. One row contains overview information for the entity and another row contains detailed information about the entity. I want to keep both rows in a single CF so they can be retrieved in a single query when they are required together.
>
> Now the problem I am facing is that I want to cache only the first type of rows (i.e., the rows containing the overview) and keep the second type of rows (which contain large data) from getting into the cache.
>
> Is there a way I can filter which rows from a single CF enter the cache?
>
>
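
For the question quoted above, one possible approach is to do the caching in the application (the "local caching solution" route) and decide there which rows are admitted. Below is a minimal Python sketch, not a Cassandra feature. It assumes the CF's own row cache is not used for these rows, that overview rows can be recognised by a row-key suffix such as ":overview" (a made-up convention), and that fetch_row is whatever function actually reads a row; here it is an in-memory stand-in.

```python
from collections import OrderedDict

OVERVIEW_SUFFIX = ":overview"   # hypothetical row-key convention for overview rows


class SelectiveRowCache:
    """Application-side LRU cache that only admits overview rows."""

    def __init__(self, fetch_row, capacity=1000):
        self._fetch_row = fetch_row      # callable: row key -> row (e.g. a Cassandra read)
        self._capacity = capacity
        self._rows = OrderedDict()       # row key -> row, least recently used first

    def get(self, row_key):
        if row_key in self._rows:
            self._rows.move_to_end(row_key)      # mark as most recently used
            return self._rows[row_key]
        row = self._fetch_row(row_key)
        if row_key.endswith(OVERVIEW_SUFFIX):    # admission rule: overview rows only
            self._rows[row_key] = row
            if len(self._rows) > self._capacity:
                self._rows.popitem(last=False)   # evict the least recently used row
        return row

    def cached_keys(self):
        return list(self._rows)


# In-memory stand-in for the column family; a real fetch would query Cassandra.
backing_store = {
    "entity1:overview": {"name": "Entity 1"},
    "entity1:detail": {"blob": "x" * 10000},
}
reads = []


def fetch_row(row_key):
    reads.append(row_key)
    return backing_store[row_key]


cache = SelectiveRowCache(fetch_row, capacity=2)
cache.get("entity1:overview")
cache.get("entity1:overview")   # served from the cache, no second read
cache.get("entity1:detail")
cache.get("entity1:detail")     # detail rows are never cached, so this reads again
```

Both row types stay in the same CF and can still be fetched together when needed; only the admission rule in get() treats them differently.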

