cassandra-commits mailing list archives

From "Mateusz Moneta (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-8824) cassandra python driver return None when querying static column on partition bigger than 5000 entites
Date Wed, 18 Feb 2015 13:21:11 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mateusz Moneta updated CASSANDRA-8824:
--------------------------------------
    Description: 
When we query a partition with a static column that has more than 5000 entities, some of the
rows come back with the static value unset; querying through cqlsh, however, returns everything correctly.

Here is an example; {{expire}} is a static column and {{folder_id}} is the primary key.
{noformat}
cqlsh> select id, parent_id, expire, mtime from share.entity where folder_id='68f2af3a2d1e4f95a231d5cb47e57cf2'
and mtime < '2015-02-01 06:21:25+0000';

 id                               | parent_id | expire                   | mtime
----------------------------------+-----------+--------------------------+--------------------------
 68f2af3a2d1e4f95a231d5cb47e57cf2 |      null | 2015-02-22 10:51:27+0000 | 2015-02-01 06:21:24+0000

cqlsh> select count(*) from share.entity where folder_id='68f2af3a2d1e4f95a231d5cb47e57cf2';
 count
-------
  5547

In [1]: from django.db import connection

In [2]: ses = connection.connection.session

In [3]: from cassandra.query import SimpleStatement

In [13]: query = "select * from share.entity where folder_id='68f2af3a2d1e4f95a231d5cb47e57cf2'";

In [14]: st = SimpleStatement(query)

In [15]: c, d = 0, 0

In [16]: for e in ses.execute(st):
    if e['expire'] is None:
        c += 1
    else:
        d += 1

In [17]: c
Out[17]: 547

In [18]: d
Out[18]: 5000

{noformat}
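
The counting loop above can be wrapped in a small helper so the same check is easy to repeat against other tables. This is only a sketch: {{tally_static}} is a hypothetical name, and it assumes rows are dict-like, as they are in the session above where {{e['expire']}} works. It also records the index of the first row with an unset value, which shows where in the result set the values start to go missing.

```python
def tally_static(rows, column):
    """Return (unset, populated, first_unset_index) for `column`.

    `rows` is any iterable of dict-like rows; `first_unset_index` is None
    when every row has a value.
    """
    unset = 0
    populated = 0
    first_unset = None
    for index, row in enumerate(rows):
        if row[column] is None:
            unset += 1
            if first_unset is None:
                first_unset = index
        else:
            populated += 1
    return unset, populated, first_unset


# Synthetic rows standing in for a driver result set (not real data).
sample = [{"expire": "x"}] * 3 + [{"expire": None}] * 2
print(tally_static(sample, "expire"))  # (2, 3, 3)
```

With a real session the call would look like {{tally_static(ses.execute(st), 'expire')}}.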

After further digging, it turned out that this is a problem with the fetch_size parameter, and it
can be easily reproduced:

{noformat}
In [1]: from cassandra.query import SimpleStatement

In [2]: from django.db import connection

In [3]: ses = connection.connection.session

In [4]: ses.execute(SimpleStatement("create table t (k text, s text static, i int, primary key(k, i));"))

In [5]: for i in range(1, 500):
   ....:     ses.execute(SimpleStatement("insert into share.t (k, i) values ('k', %d);" % i))

In [6]: c, d = 0, 0

In [7]: for e in ses.execute(SimpleStatement("select * from share.t", fetch_size=100)):
    if e['s'] is None:
        c += 1
    else:
        d += 1
   ....:         

In [8]: c
Out[8]: 400

In [9]: d
Out[9]: 100

{noformat}
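
Both sets of numbers fit one simple pattern: only the first page of results carries the static value. In the first example no fetch_size was given, and the Python driver's default fetch size is, as far as I know, 5000, which would explain the 5000/547 split. The sketch below is a pure-Python model of that symptom only; it does not touch Cassandra and says nothing about where the bug actually lives. {{simulate_paged_read}} and {{count_unset}} are hypothetical names, and the row total of 500 is taken from the reported counts (400 + 100).

```python
def simulate_paged_read(total_rows, fetch_size, static_value="v"):
    """Model the reported symptom: rows arrive in pages of `fetch_size`,
    and only rows in the first page carry the static value."""
    rows = []
    for start in range(0, total_rows, fetch_size):
        page_len = min(fetch_size, total_rows - start)
        # Assumption being modelled: every page after the first loses the value.
        value = static_value if start == 0 else None
        rows.extend({"s": value} for _ in range(page_len))
    return rows


def count_unset(rows):
    """Return (rows with s unset, rows with s set)."""
    unset = sum(1 for row in rows if row["s"] is None)
    return unset, len(rows) - unset


print(count_unset(simulate_paged_read(5547, 5000)))  # (547, 5000)
print(count_unset(simulate_paged_read(500, 100)))    # (400, 100)
```

If the model holds, a fetch_size at least as large as the partition should return every row with its static value, which is an easy way to check it against a real cluster.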


> cassandra python driver return None when querying static column on partition bigger than 5000 entites
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8824
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8824
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Mateusz Moneta



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
