incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Query over secondary indexes
Date Tue, 09 Oct 2012 13:51:06 GMT
If I understand CQL correctly, wide rows are backed by a B-tree behind the scenes.  Indexing
in CQL uses a B-tree as well, so CQL and PlayOrm are both really just using a wide-row
approach.  I don't think you can avoid that.  Behind the scenes, they both use the "compound"
column name approach.  CQL partitioning also requires the compound primary key approach,
whereas PlayOrm does not (which is why you can partition the same data two different ways,
say, partition trades by the accounts they belong to and also by the securities they are for).

The key is really the partitions, as each partition will be backed by its own wide row or
B-tree (whichever way you prefer to think about it).  Obviously, you don't want a partition
with billions of rows, as the B-tree starts to get a bit large.  In both, you can have as many
partitions as you like…billions, trillions.
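
To make the wide-row idea concrete, here is a minimal Python sketch. It is a toy in-memory
model, not Cassandra's or PlayOrm's actual storage code. The class name PartitionedIndex and
its methods are made up for illustration. Each partition gets its own sorted "row" whose
column names are compound (indexed value, row key) tuples, so a range scan only touches one
partition:

```python
import bisect
from collections import defaultdict


class PartitionedIndex:
    """Toy model of a wide-row index: one sorted 'row' per partition.

    Every column name is a compound (indexed_value, row_key) tuple. Keeping
    the columns sorted lets a range scan start with a binary search.
    """

    def __init__(self):
        # partition id -> sorted list of (indexed_value, row_key) tuples
        self._rows = defaultdict(list)

    def add(self, partition, value, row_key):
        # Insert while keeping the compound column names in sorted order.
        bisect.insort(self._rows[partition], (value, row_key))

    def range_scan(self, partition, low, high):
        """Return row keys in one partition with low <= indexed value <= high."""
        columns = self._rows.get(partition, [])
        # (low,) sorts before every (low, row_key), so this finds the first
        # column whose indexed value is >= low.
        start = bisect.bisect_left(columns, (low,))
        end = start
        while end < len(columns) and columns[end][0] <= high:
            end += 1
        return [row_key for _, row_key in columns[start:end]]


index = PartitionedIndex()
index.add("account-1", 100, "trade-a")
index.add("account-1", 20, "trade-b")
index.add("account-1", 75, "trade-c")
index.add("account-2", 500, "trade-d")
print(index.range_scan("account-1", 51, 1000))  # ['trade-c', 'trade-a']
```

The cost of a scan depends on the size of the one partition it reads, which is why many
small partitions are fine but a single huge one is not.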

PlayOrm is just doing a range scan on your behalf.  If you do a complex query like left join
trade.account where account.isActive=true and trade.numShares>50, it does a range scan
on a few indices, but it does so in batches, and eventually it will do lookahead as well (i.e.,
it will request the next batches before it has finished looping over the current batch), which
can increase performance further in certain scenarios.  It is actually quite interesting, as it
also dynamically flips to a hash join when it can.
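
The batching-with-lookahead idea can be sketched in Python as follows. This is only an
illustration of the general technique, not PlayOrm's implementation. The function
scan_with_lookahead and the fetch_batch(offset, limit) callable are hypothetical names, and
the sketch assumes a short batch signals the end of the scan:

```python
from concurrent.futures import ThreadPoolExecutor


def scan_with_lookahead(fetch_batch, batch_size):
    """Yield items batch by batch, requesting the next batch early.

    fetch_batch(offset, limit) is a hypothetical callable that returns a list
    of at most `limit` items starting at `offset`. A list shorter than
    `limit` is taken to mean the scan has reached the end.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        offset = 0
        pending = pool.submit(fetch_batch, offset, batch_size)
        while pending is not None:
            batch = pending.result()
            offset += len(batch)
            if len(batch) == batch_size:
                # Request the next batch before the caller starts consuming
                # the current one, so fetching overlaps with processing.
                pending = pool.submit(fetch_batch, offset, batch_size)
            else:
                pending = None
            yield from batch


# Usage with an in-memory list standing in for an index range scan.
data = list(range(10))
for item in scan_with_lookahead(lambda off, lim: data[off:off + lim], 4):
    print(item)
```

The caller simply iterates over results, while the next request is already in flight in the
background.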

Later,
Dean



From: Vivek Mishra <mishra.vivs@gmail.com>
Date: Tuesday, October 9, 2012 7:39 AM
To: Nrel <Dean.Hiller@nrel.gov>, "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Query over secondary indexes

Thanks. This is what I have tried with the cqlsh client.

Is there any comparison matrix available between PlayOrm and the cqlsh command-line client?
It would be interesting to see whether it is faster than the CQL client.

I guess the problem is with secondary indexing, not the volume, because I don't want to go for
the wide-row indexing/compound primary key approach.

-Vivek

On Tue, Oct 9, 2012 at 6:20 PM, Hiller, Dean <Dean.Hiller@nrel.gov>
wrote:
Another option for you may be PlayOrm and its scalable SQL.  We queried one million rows
for 100 results in just 60 ms (and it does joins).  Query CL = QUORUM.

Dean

From: Vivek Mishra <mishra.vivs@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 8, 2012 7:37 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Query over secondary indexes

I did wait for at least 5 minutes before terminating it. Also, it sometimes results in a server
crash as well, though the data volume is not very large.

-Vivek

On Tue, Oct 9, 2012 at 7:05 AM, Vivek Mishra <mishra.vivs@gmail.com>
wrote:
It was on 1 node and there is no error in server logs.

-Vivek


On Tue, Oct 9, 2012 at 1:21 AM, aaron morton <aaron@thelastpickle.com>
wrote:
get User where user_name = 'Vivek', it is taking ages to retrieve that data. Is there anything
i am doing wrong?
How long is ages and how many nodes do you have?
Are there any errors in server logs ?

When you do a get by secondary index at a CL higher than ONE, every RFth node is involved.

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2012, at 10:20 PM, Vivek Mishra <mishra.vivs@gmail.com>
wrote:

Thanks, Rishabh. But I want to search over duplicate columns only.

-Vivek

On Fri, Oct 5, 2012 at 2:45 PM, Rishabh Agrawal <rishabh.agrawal@impetus.co.in>
wrote:
Try making user_name a primary key in combination with some other unique column and see
whether the results improve.
-Rishabh
From: Vivek Mishra [mailto:mishra.vivs@gmail.com]
Sent: Friday, October 05, 2012 2:35 PM
To: user@cassandra.apache.org
Subject: Query over secondary indexes

I have a column family "User" which has an indexed column "user_name". My schema has
only around 0.1 million records, and user_name is duplicated across all rows.

Now when I try to retrieve it as:

get User where user_name = 'Vivek', it is taking ages to retrieve that data. Is there anything
I am doing wrong?

Also, I tried get_indexed_slices via the Thrift API with IndexClause.setCount(1); still
no luck, it hung without returning even a single result. I believe 0.1 million is not
a huge amount of data.


Cassandra version : 1.1.2

Any idea?


-Vivek
