From: aaron morton
To: user@cassandra.apache.org
Subject: Re: multiget_slice SlicePredicate
Date: Wed, 12 Dec 2012 09:17:49 +1300

I tend to caution against making very large batch mutations or multi gets, by which I mean 100s of rows at a time.

Each row request becomes a task, and together they can temporarily fill the mutation or read thread pool, meaning overall *client* request throughput drops while a big request is chewed through.

This is more of an issue with smaller clusters. As Dean says, the client request is performed in parallel on multiple machines.
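One way to act on this advice is to split a large key list into smaller batches on the client side. The following is a minimal sketch, not taken from this thread: `fetch` is a hypothetical stand-in for whatever client call issues the multiget (for example a wrapper around `multiget_slice`), and the batch size of 50 is an arbitrary assumption rather than a recommended value.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(keys: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield successive lists of at most `size` keys."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def multiget_in_batches(
    fetch: Callable[[List[bytes]], Dict[bytes, list]],
    keys: Iterable[bytes],
    batch_size: int = 50,  # assumption: tune for your own cluster
) -> Dict[bytes, list]:
    """Call `fetch` once per batch and merge the per-row results."""
    merged: Dict[bytes, list] = {}
    for batch in chunked(keys, batch_size):
        merged.update(fetch(batch))
    return merged


# Hypothetical stand-in for a real client call; it just echoes one column per key.
def fake_fetch(batch: List[bytes]) -> Dict[bytes, list]:
    return {key: [b"col-" + key] for key in batch}


keys = [str(i).encode() for i in range(120)]
rows = multiget_in_batches(fake_fetch, keys, batch_size=50)
batch_sizes = [len(batch) for batch in chunked(keys, 50)]
```

Each batch occupies fewer read-pool slots at once, at the cost of more round trips; the right trade-off depends on cluster size and row width.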

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/12/2012, at 3:03 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

Each node is doing its thing in parallel… they on purpose do NOT co-ordinate, as they do not need to, so each one is doing its scan on the rows it has individually.

If all rows "happen" to be on the same server, sure, some may be done sequentially depending on number of rows vs. thread pool size.

As far as a single row is concerned, I know mutations to a single row are serialised, as Aaron has said as much, but you are talking about multiple rows here.
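The behaviour described above (nodes working independently, with rows on one node queuing behind a bounded thread pool) can be pictured with a toy model. This is only an illustrative sketch and not Cassandra's implementation: `owner`, `read_row`, the node names, and `threads_per_node` are all hypothetical, and the placement function is not a real partitioner.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def owner(key, nodes):
    """Hypothetical placement rule: pick a node from the key's byte sum."""
    return nodes[sum(key) % len(nodes)]


def read_row(node, key):
    """Stand-in for a local row read; returns which node served the key."""
    return (node, key)


def multiget(keys, nodes, threads_per_node=2):
    """Group keys by owning node, then read each group on that node's own pool."""
    by_node = defaultdict(list)
    for key in keys:
        by_node[owner(key, nodes)].append(key)

    # One bounded pool per node: nodes do not coordinate with each other,
    # and rows beyond a node's pool size wait their turn on that node.
    pools = {node: ThreadPoolExecutor(max_workers=threads_per_node) for node in by_node}
    try:
        futures = {
            key: pools[node].submit(read_row, node, key)
            for node, node_keys in by_node.items()
            for key in node_keys
        }
        return {key: future.result() for key, future in futures.items()}
    finally:
        for pool in pools.values():
            pool.shutdown()


nodes = ["node-a", "node-b"]
result = multiget([b"a", b"b", b"c", b"d"], nodes)
```

If every key mapped to the same node, all reads would share that one pool, which is the "some may be done sequentially" case.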

Later,
Dean

From: Wei Zhu <wz1975@yahoo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Wei Zhu <wz1975@yahoo.com>
Date: Monday, December 10, 2012 3:15 PM
To: Cassandr usergroup <user@cassandra.apache.org>
Subject: Re: multiget_slice SlicePredicate

Well, I'm not sure how parallel multiget is. Someone said it sends requests to the different nodes in parallel, and that on each node they are executed sequentially. I haven't looked into the source code yet. Does anyone know for sure?

I am using Hector; I just copied the Thrift definition from the Cassandra site for reference.

You are right, the count is for each individual row.

Thanks.
-Wei

________________________________
From: "Hiller, Dean" <Dean.Hiller@nrel.gov>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>; Wei Zhu <wz1975@yahoo.com>
Sent: Monday, December 10, 2012 1:13 PM
Subject: Re: multiget_slice SlicePredicate

What's wrong with multiget… parallel performance is great from multiple disks, so usually that is a good thing.

Also, something looks wrong: since you have list<binary> keys, I would expect the Map to be Map<binary, list<ColumnOrSuperColumn>>.

Are you sure you have that correct? If you set range to 100, it should be 100 columns for each row, but it never hurts to run the code and verify.
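The per-row reading of the limit can be illustrated with a small in-memory model. This is a sketch of the semantics discussed in this thread, not Cassandra or Hector code: `slice_row` and `multiget_slice_model` are hypothetical helpers, rows are plain dictionaries, and the limit (called "range" in the question) is named `count` here.

```python
def slice_row(columns, start=b"", finish=b"", count=100):
    """Return up to `count` (name, value) pairs with start <= name <= finish.

    An empty `start` or `finish` means that side is unbounded.
    """
    if count <= 0:
        return []
    selected = []
    for name in sorted(columns):
        if start and name < start:
            continue
        if finish and name > finish:
            break
        selected.append((name, columns[name]))
        if len(selected) == count:
            break
    return selected


def multiget_slice_model(rows, keys, start=b"", finish=b"", count=100):
    """Apply the same slice to every requested row; the limit is per row."""
    return {key: slice_row(rows.get(key, {}), start, finish, count) for key in keys}


# Two hypothetical rows: one with 150 columns, one with 30.
rows = {
    b"r1": {b"c%03d" % i: b"v" for i in range(150)},
    b"r2": {b"c%03d" % i: b"v" for i in range(30)},
}
result = multiget_slice_model(rows, [b"r1", b"r2"], count=100)
total_columns = sum(len(cols) for cols in result.values())
```

Here the first row is capped at 100 columns and the second returns all 30, so the combined total exceeds 100, which is what a per-row limit implies. As Dean suggests, running the equivalent query against a real cluster is the way to confirm it.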

Later,
Dean
PlayOrm Developer


From: Wei Zhu <wz1975@yahoo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Wei Zhu <wz1975@yahoo.com>
Date: Monday, December 10, 2012 2:07 PM
To: Cassandr usergroup <user@cassandra.apache.org>
Subject: multiget_slice SlicePredicate

I know it's probably not a good idea to use multiget, but for my use case it's the only choice.

I have a question regarding the SlicePredicate argument of multiget_slice.


The SlicePredicate takes a slice_range, which takes start, end and range. I suppose start and end apply to each individual row. How about range: is it a cumulative column count across all the rows, or does it apply to each individual row?
If I set range to 100, is it 100 columns per row, or total?

Thanks for your reply,
-Wei

multiget_slice

map<string,list<ColumnOrSuperColumn>> multiget_slice(list<binary> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)




