cassandra-user mailing list archives

From "Roger Fischer (CW)" <rfis...@Brocade.com>
Subject RE: Order by for aggregated values
Date Tue, 06 Jun 2017 16:07:06 GMT
Hi DuyHai,

thanks for your response.

I understand the reservations about implementing sorting in Cassandra. But I think it is analogous
to filtering. It may be bad in the general case, but can be useful for particular use cases.

If Cassandra does not provide “order-by”, then the ordering has to be done in the client
(or an intermediate tool like Spark). The cost of ordering will be the same, but in the Top
N use case, far more data has to be transferred to the client when the client has to do the
sorting.
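
To make the Top N case concrete, here is a minimal Python sketch of the client-side approach, assuming the per-group aggregates have already been fetched as (group, value) pairs. The function name and the sample data are hypothetical and are not part of any Cassandra driver API.

```python
import heapq


def top_n(aggregated_rows, n):
    """Return the n rows with the largest aggregated value, largest first.

    aggregated_rows is an iterable of (group_key, aggregated_value) pairs,
    i.e. what a GROUP BY query would hand back to the client.
    """
    return heapq.nlargest(n, aggregated_rows, key=lambda row: row[1])


# Hypothetical per-sensor aggregates, as returned by a GROUP BY query.
rows = [
    ("sensor-a", 17.0),
    ("sensor-b", 42.5),
    ("sensor-c", 3.2),
    ("sensor-d", 29.9),
]

top2 = top_n(rows, 2)
print(top2)
```

The client keeps only n rows here, but it still has to receive one row per group first, which is the transfer cost described above.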

So I think that, with a qualifier such as “ALLOW ORDERING”, it would be reasonable to support
“order by” on aggregated values.

Thanks…

Roger



From: DuyHai Doan [mailto:doanduyhai@gmail.com]
Sent: Tuesday, June 06, 2017 12:31 AM
To: Roger Fischer (CW) <rfische@Brocade.com>
Cc: user@cassandra.apache.org
Subject: Re: Order by for aggregated values

First, GROUP BY is only allowed on partition key and clustering columns, not on arbitrary
columns. The internal implementation of GROUP BY tries to fetch data in clustering order to
avoid having to "re-sort" it in memory, which would be very expensive.
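
The point about clustering order can be illustrated with a small Python sketch. It is an analogy only and not Cassandra's actual implementation; the function name and sample rows are made up. When rows arrive already sorted by the grouping key, each group can be aggregated and emitted in one pass, with no in-memory sort.

```python
from itertools import groupby
from operator import itemgetter


def streaming_group_sum(sorted_rows):
    """Yield (key, total) per group in a single pass.

    This relies on sorted_rows already being ordered by key, which is the
    role clustering order plays in the explanation above. On unsorted
    input, the same key would be emitted more than once.
    """
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Hypothetical (day, value) rows, already in key order.
rows = [("2017-06-05", 3), ("2017-06-05", 4), ("2017-06-06", 10)]
result = list(streaming_group_sum(rows))
print(result)
```

Ordering the output by the aggregated total instead would require holding every group before emitting any of them, which is the in-memory re-sort that the implementation tries to avoid.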

Second, GROUP BY works best when restricted to a single partition; otherwise it forces
Cassandra to do a range scan, which performs poorly.


For all of those reasons, I don't expect an "order by" on aggregated values to be available
any time soon.

Furthermore, Cassandra is optimised for real-time transactional scenarios, whereas group by/order
by/limit is a classic analytics scenario. I would recommend using an appropriate tool such as
Spark for that.


On 6 June 2017 at 04:00, "Roger Fischer (CW)" <rfische@brocade.com>
wrote:
Hello,

is there any intent to support “order by” and “limit” on aggregated values?

For time series data, top n queries are quite common. Group-by was the first step towards
supporting such queries, but ordering by value and limiting the results are also required.

Thanks…

Roger



