cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9304) COPY TO improvements
Date Thu, 03 Sep 2015 02:51:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728422#comment-14728422 ]

Stefania commented on CASSANDRA-9304:
-------------------------------------

The {{RateLimiter}} still seems a bit off; as you pointed out, it looked wrong before as well.
It's not terribly important, but I think this line, {{self.current_rate = (self.current_rate
+ new_rate) / 2.0}}, was meant to average the current rate with the new one. So the first time,
when {{current_rate}} is zero, it should not divide by 2, or else we report half the rate.
Secondly, when we calculate the new rate as {{n / difference}}, we may miss records, because
{{n}} is the number of records passed to each call whilst {{difference}} is the time elapsed
since we last logged. I also wouldn't calculate the rate on every call, only when logging it.
If {{current_record}} cannot be reset to zero after logging (maybe this was the original
intention of the existing code), then we need a new counter holding the number of records
accumulated between log points.
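A minimal sketch of the counting scheme described above, in Python since cqlsh is Python. The class name {{RateMeter}}, the {{log_interval}} parameter and the injectable {{clock}} are hypothetical, not the names used in the patch; {{current_rate}} and {{current_record}} mirror the fields discussed, and {{records_since_log}} is the proposed extra counter.

```python
import time


class RateMeter:
    """Hypothetical sketch: records/second, recomputed only at log points."""

    def __init__(self, log_interval=1.0, clock=time.monotonic):
        self.log_interval = log_interval  # seconds between log points (assumed)
        self.clock = clock                # injectable so the logic can be tested
        self.last_log_time = clock()
        self.current_rate = 0.0           # smoothed records/second
        self.current_record = 0           # total records seen, never reset
        self.records_since_log = 0        # records accumulated since the last log point

    def increment(self, n=1):
        """Count n more records; return the new rate at a log point, else None."""
        self.current_record += n
        self.records_since_log += n
        now = self.clock()
        difference = now - self.last_log_time
        if difference <= 0 or difference < self.log_interval:
            return None  # not time to log yet, so don't compute a rate
        # Divide everything counted since the last log point by the full elapsed time,
        # rather than only the n records passed to this particular call.
        new_rate = self.records_since_log / difference
        if self.current_rate == 0.0:
            self.current_rate = new_rate  # first measurement: nothing to average with
        else:
            self.current_rate = (self.current_rate + new_rate) / 2.0
        self.records_since_log = 0
        self.last_log_time = now
        return self.current_rate
```

With a fake clock, 100 records in the first second report 100.0 rather than 50.0, and 200 records in the next second average to 150.0.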

It's great that we now test all partitioners, but {{test_all_datatypes_round_trip}} only exports
1 record, so {{test_round_trip}}, which exports 10K records, would have been a better candidate.
Would you mind adapting {{test_round_trip}} to also run with every partitioner?
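One way to run the same round-trip body once per partitioner is to generate a separate test method for each, so every partitioner gets its own cluster and its own pass/fail result. A minimal sketch, assuming these three partitioner classes are the ones to cover; {{CopyRoundTripTest}}, {{_test_round_trip}} and {{_make_test}} are hypothetical names, and the real dtest body (cluster setup, COPY TO/FROM) is only described in a comment.

```python
import unittest

# Partitioner classes shipped with Cassandra (assumed to be the set under test).
PARTITIONERS = [
    "org.apache.cassandra.dht.Murmur3Partitioner",
    "org.apache.cassandra.dht.RandomPartitioner",
    "org.apache.cassandra.dht.ByteOrderedPartitioner",
]


class CopyRoundTripTest(unittest.TestCase):
    def _test_round_trip(self, partitioner, num_records=10000):
        # Hypothetical body: start a cluster using `partitioner`, insert num_records
        # rows, COPY TO a CSV file, truncate, COPY FROM it and compare the results.
        self.assertIn(partitioner, PARTITIONERS)


def _make_test(partitioner):
    # A factory binds `partitioner` now, avoiding Python's late-binding closure bug.
    def test(self):
        self._test_round_trip(partitioner)
    return test


# Generates e.g. test_round_trip_murmur3partitioner.
for _p in PARTITIONERS:
    _name = "test_round_trip_" + _p.rsplit(".", 1)[-1].lower()
    setattr(CopyRoundTripTest, _name, _make_test(_p))
```

Because {{_test_round_trip}} starts with an underscore, the test loader only collects the generated methods.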

It would also be good to have a bulk round-trip test (for the default partitioner only) that
exports and imports 1M records. We would need to use cassandra-stress to write the records,
and then we just check the counts. This is just a suggestion.
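For the bulk test, the count check could stream through the exported file and count CSV records rather than lines, since a quoted value may contain a newline. A minimal sketch, assuming the export has no header row; {{count_csv_records}} and {{count_lines}} are hypothetical helpers, and writing the 1M records with cassandra-stress is not shown.

```python
import csv
import os
import tempfile


def count_csv_records(path):
    """Count CSV records by streaming the file; quoted fields may span several lines."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))


def count_lines(path):
    """Naive physical line count, shown only for comparison."""
    with open(path, newline="") as f:
        return sum(1 for _ in f)


# Demo: two records, one of which contains an embedded newline.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([1, "line one\nline two"])
    writer.writerow([2, "plain"])

records = count_csv_records(demo_path)  # 2 records
lines = count_lines(demo_path)          # 3 physical lines
os.remove(demo_path)
```

Counting physical lines here would overstate the export by one, which is exactly the kind of mismatch a count-only test should not produce.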

I had problems when running the cqlsh_tests locally:

{code}
nosetests -s cqlsh_tests
{code}

{code}
======================================================================
ERROR: test_source_glass (cqlsh_tests.cqlsh_tests.TestCqlsh)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stefania/git/cstar/cassandra-dtest/tools.py", line 252, in wrapped
    f(obj)
  File "/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_tests.py", line 341, in test_source_glass
    self.verify_glass(node1)
  File "/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_tests.py", line 102, in verify_glass
    'I can eat glass and it does not hurt me': 'Is'
  File "/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_tests.py", line 95, in verify_varcharmap
    got = {k.encode("utf-8"): v for k, v in rows[0][0].iteritems()}
IndexError: list index out of range
-------------------- >> begin captured logging << --------------------
dtest: DEBUG: cluster ccm directory: /tmp/dtest-Ldxvcq
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: test_all_datatypes_read (cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", line 690, in test_all_datatypes_read
    self.assertCsvResultEqual(self.tempfile.name, results)
  File "/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", line 153, in assertCsvResultEqual
    raise e
AssertionError: Element counts were not equal:
First has 1, Second has 0:  ['ascii', '1099511627776', '0xbeef', 'True', '3.140000000000000124344978758017532527446746826171875',
'2.444', '1.1', '127.0.0.1', '25', '\xe3\x83\xbd(\xc2\xb4\xe3\x83\xbc\xef\xbd\x80)\xe3\x83\x8e',
'2005-07-14 12:30:00', '2b4e32ce-51de-11e5-85b7-0050b67e8b2f', '830bc4cd-a790-4ac2-85f9-648b0a71306b',
'asdf', '36893488147419103232']
First has 0, Second has 1:  ['ascii', '1099511627776', '0xbeef', 'True', '3.140000000000000124344978758017532527446746826171875',
'2.444', '1.1', '127.0.0.1', '25', '\xe3\x83\xbd(\xc2\xb4\xe3\x83\xbc\xef\xbd\x80)\xe3\x83\x8e',
'2005-07-14 04:30:00', '2b4e32ce-51de-11e5-85b7-0050b67e8b2f', '830bc4cd-a790-4ac2-85f9-648b0a71306b',
'asdf', '36893488147419103232']
-------------------- >> begin captured logging << --------------------
dtest: DEBUG: cluster ccm directory: /tmp/dtest-cSohP9
dtest: DEBUG: Importing from csv file: /tmp/tmpJgdPJc
dtest: WARNING: Mismatch at index: 10
dtest: WARNING: Value in csv: 2005-07-14 12:30:00
dtest: WARNING: Value in result: 2005-07-14 04:30:00
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 69 tests in 1161.775s

FAILED (SKIP=5, errors=1, failures=1)
{code}
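The two timestamps in the {{test_all_datatypes_read}} failure differ by exactly 8 hours, which is consistent with the same instant being rendered in two different timezones. That cause is an assumption, not something the log confirms; the sketch below only shows the arithmetic, using a hypothetical UTC-8 zone.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

# The two values reported in the captured log.
csv_value = datetime.strptime("2005-07-14 12:30:00", FMT)
result_value = datetime.strptime("2005-07-14 04:30:00", FMT)
offset = csv_value - result_value
print(offset)  # 8:00:00

# The same instant rendered in UTC and in a hypothetical UTC-8 zone.
instant = datetime(2005, 7, 14, 12, 30, tzinfo=timezone.utc)
utc_minus_8 = timezone(timedelta(hours=-8))
print(instant.strftime(FMT))                          # 2005-07-14 12:30:00
print(instant.astimezone(utc_minus_8).strftime(FMT))  # 2005-07-14 04:30:00
```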

I scheduled new CI jobs on my view:

http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9304-testall/
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9304-dtest/

Let's see if they too report the problems I had locally.

> COPY TO improvements
> --------------------
>
>                 Key: CASSANDRA-9304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Kua
>            Priority: Minor
>              Labels: cqlsh
>             Fix For: 2.1.x
>
>
> COPY FROM has gotten a lot of love.  COPY TO not so much.  One obvious improvement could
> be to parallelize reading and writing (write one page of data while fetching the next).
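The pipelining idea in the description (write one page while fetching the next) can be sketched with a producer thread and a bounded queue. This is a minimal Python sketch, not the cqlsh implementation: {{pages}} stands in for the driver's paged result set, and {{copy_to}}, {{fetch_pages}} and {{max_buffered_pages}} are hypothetical names.

```python
import csv
import io
import queue
import threading

_DONE = object()  # sentinel marking the end of the result set


def fetch_pages(pages, out_queue):
    """Producer: stands in for paging through query results (hypothetical source)."""
    for page in pages:
        out_queue.put(page)  # blocks while the queue is full
    out_queue.put(_DONE)


def copy_to(pages, sink, max_buffered_pages=2):
    """Write each page as CSV while the producer thread fetches the next one."""
    q = queue.Queue(maxsize=max_buffered_pages)  # bounded: fetching can't run far ahead
    producer = threading.Thread(target=fetch_pages, args=(pages, q))
    producer.start()
    writer = csv.writer(sink)
    rows_written = 0
    while True:
        page = q.get()
        if page is _DONE:
            break
        writer.writerows(page)
        rows_written += len(page)
    producer.join()
    return rows_written


if __name__ == "__main__":
    demo_pages = [[(i, "row%d" % i) for i in range(p * 3, p * 3 + 3)] for p in range(4)]
    out = io.StringIO()
    print(copy_to(demo_pages, out))
```

The bounded queue keeps memory use to a couple of pages. In this sketch an exception in the producer is not propagated, so the consumer would wait forever; a real implementation would need to forward errors through the queue.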



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
