incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Query over secondary indexes
Date Tue, 09 Oct 2012 13:56:24 GMT
Oh, and from what I understand, the main differences between CQL and
Scalable SQL (PlayOrm) are:

1. Scalable SQL can do left outer joins and inner joins currently
2. Scalable SQL does not need a compound primary key
3. Scalable SQL can partition by two or even three different columns, so
it is sort of like having different dimensions of partitioning

PlayOrm does plan on supporting CQL as well but it is not in yet.
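
As a rough illustration of point 3, here is a toy Python sketch. It is not
PlayOrm's actual API: `PartitionedIndex`, its methods, and the sample trades
are hypothetical names made up for the example. It assumes each partition is
backed by its own sorted index of (indexed value, row key) pairs, similar in
spirit to compound column names in a wide row, and shows the same rows being
partitioned two different ways.

```python
from bisect import bisect_right, insort
from collections import defaultdict


class PartitionedIndex:
    """Toy model: one sorted index per partition (hypothetical, not PlayOrm)."""

    def __init__(self, partition_by, index_on):
        self.partition_by = partition_by  # column that picks the partition
        self.index_on = index_on          # column whose values are indexed
        # partition value -> sorted list of (indexed value, row key)
        self.partitions = defaultdict(list)

    def add(self, row_key, row):
        entry = (row[self.index_on], row_key)
        insort(self.partitions[row[self.partition_by]], entry)

    def greater_than(self, partition, value):
        """Range scan inside a single partition: indexed value > value."""
        entries = self.partitions.get(partition, [])
        # Copying the values out is O(n); acceptable for a toy example.
        start = bisect_right([v for v, _ in entries], value)
        return [row_key for _, row_key in entries[start:]]


trades = {
    "t1": {"account": "acc1", "security": "IBM", "numShares": 100},
    "t2": {"account": "acc1", "security": "AAPL", "numShares": 20},
    "t3": {"account": "acc2", "security": "IBM", "numShares": 75},
}

# The same trades, partitioned along two independent "dimensions".
by_account = PartitionedIndex("account", "numShares")
by_security = PartitionedIndex("security", "numShares")
for key, row in trades.items():
    by_account.add(key, row)
    by_security.add(key, row)

big_in_acc1 = by_account.greater_than("acc1", 50)
big_in_ibm = by_security.greater_than("IBM", 50)
```

Each query touches only one partition's index, so the cost depends on the
size of that partition rather than on the total number of rows.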

Later,
Dean

On 10/9/12 7:51 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

>If I understand CQL correctly, behind the scenes in wide rows, there is a
>B-tree.  Even when doing the indexing in CQL, there is a B-tree, so CQL
>and PlayOrm are both really just using a wide-row approach.  I
>don't think you can avoid that.  Behind the scenes, they are using the
>"compound" column name approach.  CQL partitioning requires the compound
>primary key approach as well, whereas PlayOrm does not (which
>is why you can partition it two different ways... say you partition trades
>by the accounts and also by the securities they are for).
>
>The key is really the partitions, as each partition will be backed by that
>wide row or B-tree (whichever way you prefer to think about it).
>Obviously, you don't want a partition with billions of rows, as the
>B-tree starts to get a bit large.  In both, you can have as many
>partitions as you like... billions, trillions.
>
>PlayOrm is just doing a range scan on your behalf.  If you do a complex
>query like left join trade.account where account.isActive=true and
>trade.numShares>50, it is doing a range scan on a few indices, but it does
>so in batches and eventually will do lookahead as well (i.e., it will make
>requests for batches before it is done looping over the current batch),
>which can increase performance further in certain scenarios.  It is
>actually quite interesting, as it dynamically flips to a hash join when it
>can as well.
>
>Later,
>Dean
>
>
>
>From: Vivek Mishra <mishra.vivs@gmail.com>
>Date: Tuesday, October 9, 2012 7:39 AM
>To: Nrel <Dean.Hiller@nrel.gov>, "user@cassandra.apache.org"
><user@cassandra.apache.org>
>Subject: Re: Query over secondary indexes
>
>Thanks. This is what I have tried with the cqlsh client.
>
>Is there any comparison matrix available between PlayOrm and the cqlsh
>command line client? It would be interesting to see whether it is faster
>than the cql client.
>
>I guess the problem is with secondary indexing, not the volume, because I
>don't want to go for the wide row indexing/compound primary key approach.
>
>-Vivek
>
>On Tue, Oct 9, 2012 at 6:20 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
>Another option for you may be PlayOrm and its Scalable SQL.  We queried
>one million rows for 100 results in just 60ms (and it does joins).
>Query CL = QUORUM.
>
>Dean
>
>From: Vivek Mishra <mishra.vivs@gmail.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Monday, October 8, 2012 7:37 PM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Re: Query over secondary indexes
>
>I did wait for at least 5 minutes before terminating it. Also, sometimes it
>results in a server crash as well, though the data volume is not very large.
>
>-Vivek
>
>On Tue, Oct 9, 2012 at 7:05 AM, Vivek Mishra <mishra.vivs@gmail.com> wrote:
>It was on 1 node and there is no error in server logs.
>
>-Vivek
>
>
>On Tue, Oct 9, 2012 at 1:21 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> get User where user_name = 'Vivek', it is taking ages to retrieve that
>> data. Is there anything i am doing wrong?
>
>How long is ages and how many nodes do you have?
>Are there any errors in server logs?
>
>When you do a get by secondary index at a CL higher than ONE, every RFth
>node is involved.
>
>Cheers
>
>
>-----------------
>Aaron Morton
>Freelance Developer
>@aaronmorton
>http://www.thelastpickle.com
>
>On 5/10/2012, at 10:20 PM, Vivek Mishra <mishra.vivs@gmail.com> wrote:
>
>Thanks Rishabh. But i want to search over duplicate columns only.
>
>-Vivek
>
>On Fri, Oct 5, 2012 at 2:45 PM, Rishabh Agrawal
><rishabh.agrawal@impetus.co.in> wrote:
>Try making user_name a primary key in combination with some other unique
>column and see if results are improving.
>-Rishabh
>From: Vivek Mishra [mailto:mishra.vivs@gmail.com]
>Sent: Friday, October 05, 2012 2:35 PM
>To: user@cassandra.apache.org
>Subject: Query over secondary indexes
>
>I have a column family "User" which has an indexed column
>"user_name". My schema has only around 0.1 million records, and
>user_name is duplicated across all rows.
>
>Now when i am trying to retrieve it as:
>
>get User where user_name = 'Vivek', it is taking ages to retrieve that
>data. Is there anything i am doing wrong?
>
>Also, I tried get_indexed_slices via the Thrift API, setting
>IndexClause.setCount(1); still no luck. It hung without returning
>even a single result. I believe 0.1 million is not a huge amount of
>data.
>
>
>Cassandra version : 1.1.2
>
>Any idea?
>
>
>-Vivek
>

