Date: Tue, 20 Apr 2010 18:02:43 -0700
Message-ID: <v2pa7fcf8301004201802oc6ca38d4xeb01489d14d911d4@mail.gmail.com>
Subject: Re: How to increase cassandra's performance in read?
From: Benjamin Black <b@b3k.us>
To: user@cassandra.apache.org

On Tue, Apr 20, 2010 at 11:54 AM, Mark Jones <MJones@imagehawk.com> wrote:
> When I look at this arrangement, I see one lookup by key for the user,
> followed by a large read for all the "email indexes" (these are all
> columns in the same row, right?)
>
> Then one lookup by key for each email... Seems very seek-intensive.
>

Do you need to grab every single email every single time?  Seems to me
you only need the recent ones, or a page's worth.  A single multiget
for those keys would do it, and the load is spread across the cluster.
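
A minimal sketch of that read pattern in Python. It uses a hypothetical in-memory model, not any real Cassandra client API: one index row per user whose column names are zero-padded (so time-ordered) email IDs, plus one row per email. A page costs one slice of the index row and one multiget, with the multiget's keys grouped by the node that owns them. `node_for`, `recent_email_ids` and `multiget` are illustrative names, and the hash-mod placement is a simplification of Cassandra's token ring.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size

# Hypothetical stand-ins for two column families (not a real client API).
# user_index: one row per user; column names are zero-padded email IDs,
# so sorted order is arrival order. emails: one row per email, keyed by ID.
user_index = defaultdict(dict)
emails = {}

def node_for(key: str) -> int:
    """Place a row on a node by hashing its key (simplified hash-mod placement)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_NODES

def recent_email_ids(user: str, page_size: int) -> list:
    """One read of the user's index row: the newest `page_size` column names."""
    names = sorted(user_index[user])  # Cassandra keeps columns sorted by name
    return names[-page_size:][::-1]   # newest first

def multiget(keys):
    """Fetch many rows at once, grouped by the node that owns each key."""
    by_node = defaultdict(list)
    for key in keys:
        by_node[node_for(key)].append(key)
    results = {}
    # A real client would send one batch per node concurrently; here we just loop.
    for node_keys in by_node.values():
        for key in node_keys:
            results[key] = emails[key]
    per_node = {node: len(node_keys) for node, node_keys in by_node.items()}
    return results, per_node

# Usage: 50 emails for one user, then fetch only the newest page of 10.
for i in range(1, 51):
    email_id = f"{i:04d}"
    user_index["mark"][email_id] = ""
    emails[email_id] = f"body of email {email_id}"

page_ids = recent_email_ids("mark", 10)
page, per_node = multiget(page_ids)
print(page_ids[0], len(page))  # 0050 10
```

The point of the layout is that the per-page cost is fixed (one slice plus one multiget of ten keys) no matter how many emails the user has accumulated.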

>...
>
>
> OK, so if I do it this way, the number of keys rapidly goes into the
> billions, does that not cause other problems?

Not generally.  Cassandra is built to handle enormous numbers of rows
efficiently.
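
A rough sketch of why a huge row count is not a problem in itself: with RandomPartitioner, a row's token is derived from an MD5 hash of its key, and each node owns the range of tokens up to and including its own token, so rows spread evenly over the ring. The four-node cluster, the evenly spaced node tokens and the key names below are assumptions for illustration; replication and per-node index structures are not modeled.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 127
NUM_NODES = 4  # hypothetical cluster size
# Assumed: node tokens evenly spaced around the ring.
NODE_TOKENS = [i * RING_SIZE // NUM_NODES for i in range(NUM_NODES)]

def token(key: str) -> int:
    """MD5 of the key read as a signed 128-bit integer, then made non-negative."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

def owner(key: str) -> int:
    """First node whose token is >= the key's token, wrapping past the top."""
    idx = bisect.bisect_left(NODE_TOKENS, token(key))
    return idx % NUM_NODES

counts = Counter(owner(f"email:{i}") for i in range(40_000))
for node in range(NUM_NODES):
    print(node, counts[node])
```

Each node should end up with roughly a quarter of the keys; adding more rows adds data to every node evenly rather than piling it onto one.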

> Seems like many more data/index files...
>

Only if you aren't compacting for some reason.
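
On the compaction point: the number of data/index files (SSTables) grows with memtable flushes, not with the number of rows, and compaction merges sorted files back into one. A simplified sketch of that merge, assuming each file is a list of (row key, column name, timestamp, value) cells sorted by (row key, column name); tombstones, indexes and the policy for choosing which files to compact are left out, and `compact` is an illustrative name.

```python
import heapq
from itertools import groupby

# A cell is (row_key, column_name, timestamp, value); each "SSTable" here is a
# plain list of cells already sorted by (row_key, column_name). Hypothetical model.

def cell_key(cell):
    return (cell[0], cell[1])

def compact(sstables):
    """Merge several sorted SSTables into one, keeping the newest version of each cell."""
    merged = heapq.merge(*sstables, key=cell_key)  # streaming k-way merge, stays sorted
    result = []
    for _, versions in groupby(merged, key=cell_key):
        result.append(max(versions, key=lambda cell: cell[2]))  # highest timestamp wins
    return result

older = [("a", "x", 1, "v1"), ("b", "x", 1, "v1")]
newer = [("a", "x", 2, "v2"), ("c", "y", 2, "v2")]
print(compact([older, newer]))
# [('a', 'x', 2, 'v2'), ('b', 'x', 1, 'v1'), ('c', 'y', 2, 'v2')]
```

Two files go in and one comes out, however many rows they hold, which is why file count tracks compaction rather than key count.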


b