cassandra-user mailing list archives

From Ryan Svihla <rsvi...@datastax.com>
Subject Re: Store counter with non-counter column in the same column family?
Date Tue, 23 Dec 2014 03:31:07 GMT
An increment wouldn't be idempotent from the client unless you knew the count
at the time of the update (which you could do with LWT, but that carries a
pretty harsh performance penalty). That particular JIRA is about how counters
are laid out and about avoiding race conditions between nodes, which was
resolved in 2.1 beta 1 (now officially released in the 2.1.x branch).
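
To make the idempotency point concrete, here is a minimal sketch in plain
Python (no Cassandra driver involved). The classes `CounterColumn` and
`LikeRows` and the helper `write_with_retry` are hypothetical stand-ins I made
up for illustration: one models a counter that only understands deltas, the
other a regular table keyed by (item, user). The retry helper assumes the
write was applied on the server but the acknowledgement was lost, so the
client sends it again.

```python
class CounterColumn:
    """Toy stand-in for a counter column: it only accepts deltas."""

    def __init__(self):
        self.value = 0

    def increment(self, delta=1):
        self.value += delta


class LikeRows:
    """Toy stand-in for a regular table with primary key (item_id, user_id)."""

    def __init__(self):
        self.rows = set()

    def upsert(self, item_id, user_id):
        # Writing the same primary key twice leaves a single row.
        self.rows.add((item_id, user_id))

    def count(self, item_id):
        return sum(1 for row_item, _user in self.rows if row_item == item_id)


def write_with_retry(apply_write, attempts=2):
    """Assumption: every attempt is applied, but the ack is lost, so the client retries."""
    for _ in range(attempts):
        apply_write()


counter = CounterColumn()
likes = LikeRows()

# One logical "like", sent twice because of the simulated timeout.
write_with_retry(lambda: counter.increment(1))
write_with_retry(lambda: likes.upsert("item-1", "user-42"))

print(counter.value)          # the retried increment is counted twice
print(likes.count("item-1"))  # the retried upsert is still a single row
```

The retried increment overcounts, while the retried upsert does not, which is
why a blind client-side retry is safe for regular columns but not for
counters.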

The general improvements to counters in 2.1 are described here:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

As for best practice, the answer is multiple tables for multiple query
paths, or you can use something like Solr or Spark. Take a look at the
Spark Cassandra Connector for a good way to count over lots of data from
lots of different query paths:
https://github.com/datastax/spark-cassandra-connector
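
As a rough illustration of "multiple tables for multiple query paths", here is
a sketch in plain Python. It does not use the connector's API; it only mimics
what a periodic batch job might do. All names (`like_rows`, `items`,
`rebuild_like_counts`, `build_query_tables`) and the sample data are
hypothetical. The assumption is that you keep a regular table recording which
user likes which item, recount from it, and then write the result into one
structure per query path.

```python
from collections import Counter

# Hypothetical rows from a "which user likes which item" table.
# The last row is a duplicate, e.g. from a retried write.
like_rows = [
    ("alice", "item-1"),
    ("bob", "item-1"),
    ("alice", "item-2"),
    ("alice", "item-1"),
]

# Hypothetical item details from the main item table.
items = {
    "item-1": {"title": "First"},
    "item-2": {"title": "Second"},
    "item-3": {"title": "Third"},
}


def rebuild_like_counts(rows):
    """Recount likes per item from scratch; duplicate (user, item) pairs collapse."""
    unique_pairs = set(rows)
    return Counter(item_id for _user, item_id in unique_pairs)


def build_query_tables(counts, item_details):
    """Produce one denormalized structure per query path."""
    # Query path 1: look up an item by id, with its like count attached.
    items_by_id = {
        item_id: {**details, "like_count": counts.get(item_id, 0)}
        for item_id, details in item_details.items()
    }
    # Query path 2: item ids ordered by like count, highest first.
    ids_by_like_count = sorted(
        items_by_id, key=lambda i: (-items_by_id[i]["like_count"], i)
    )
    return items_by_id, ids_by_like_count


counts = rebuild_like_counts(like_rows)
items_by_id, ids_by_like_count = build_query_tables(counts, items)
print(items_by_id["item-1"])
print(ids_by_like_count)
```

Because the counts are rebuilt from the relation rows rather than incremented
in place, rerunning the job is harmless, at the cost of the counts being only
as fresh as the last run.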



On Mon, Dec 22, 2014 at 9:22 PM, ziju feng <pkdogcom@gmail.com> wrote:

> I just skimmed through the JIRA
> <https://issues.apache.org/jira/browse/CASSANDRA-4775> and it seems there
> has been some effort to make updates idempotent. Perhaps the problem can be
> fixed in the near future?
>
> Anyway, what is the current best practice for such a use case (counting and
> displaying counts in different queries)? I don't need a 100% accurate count
> or strong consistency. Performance and application complexity are my main
> concerns.
>
> Thanks
>
> On Mon, Dec 22, 2014 at 10:37 PM, Ryan Svihla <rsvihla@datastax.com>
> wrote:
>
>> You can cheat by making the non-counter column part of your primary
>> key (a clustering column specifically), but the cases where this can work
>> are limited, and the cases where it is a good idea are rarer still.
>>
>> As for counters in batches, they are already not a well regarded concept,
>> and counter batches have a number of troubling behaviors: as already stated,
>> increments aren't idempotent, and a batch implies retry.
>>
>> As for DSE Search, it's doing something drastically different internally,
>> and the type of counting it's doing is many orders of magnitude faster
>> (think bitmask-style matching plus proper async 2i to minimize fanout cost).
>>
>> Generally speaking, counting accurately while being highly available
>> creates an interesting set of logical tradeoffs. For example, what do you do
>> if you're not able to communicate between two data centers, but both are up
>> and serving "likes" quite happily? Is your counting down? Do you keep
>> counting but serve up different answers? More accurately, since problems are
>> rarely data center to data center but more frequently between replicas, how
>> much availability are you willing to give up in exchange for a globally
>> accurate count?
>> On Dec 22, 2014 6:00 AM, "DuyHai Doan" <doanduyhai@gmail.com> wrote:
>>
>>> It's not possible to mix counter and non-counter columns because
>>> counter semantics are currently limited to increment/decrement (thus NOT
>>> idempotent) and require special handling compared to other C* columns.
>>>
>>> On Mon, Dec 22, 2014 at 11:33 AM, ziju feng <pkdogcom@gmail.com> wrote:
>>>
>>>> I was wondering if there is a plan to allow creating a counter column
>>>> and a standard column in the same table.
>>>>
>>>> Here is my use case:
>>>> I want to use counter to count how many users like a given item in my
>>>> application. The like count needs to be returned along with details of item
>>>> in query. To support querying items in different ways, I use both
>>>> application-maintained denormalized index tables and DSE search for
>>>> indexing. (DSE search is also used for text searching)
>>>>
>>>> Since the current counter implementation doesn't allow counter columns
>>>> and non-counter columns in the same table, I have to propagate the
>>>> current count from the counter table to the main item table and the index
>>>> tables. That way, like counts can be returned by those index tables without
>>>> sending extra requests to the counter table, and DSE search can build an
>>>> index on the like count column in the main item table to support like
>>>> count related queries (such as sorting by like count).
>>>>
>>>> IMHO, the only way to sync data between counter table and normal table
>>>> within a reasonable time (sub-seconds) currently is to read the current
>>>> value from counter table right after the update. However it suffers from
>>>> several issues:
>>>> 1. Read-after-write may not return the correct count when replication
>>>> factor > 1 unless consistency level ALL is used
>>>> 2. There are two extra non-parallelizable round-trips between the
>>>> application server and cassandra, which can have great impact on
>>>> performance.
>>>>
>>>> If it were possible to store a counter in a standard column family, only
>>>> one write would be needed to update the like count in the main table. The
>>>> counter value would also be eventually synced between replicas, so there
>>>> would be no need for the application to use an extra mechanism, like a
>>>> scheduled task, to get the correct counts.
>>>>
>>>> A related issue is lifting the limitation of not allowing updating
>>>> counter columns and normal columns in one batch, since it is quite common
>>>> to not only have a counter for statistics but also store the details, such
>>>> as storing the relation of which user likes which items in my use case.
>>>>
>>>> Any idea?
>>>>
>>>>
>>>
>


-- 

Ryan Svihla

Solution Architect
