cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2
Date Wed, 07 Aug 2013 21:19:49 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732756#comment-13732756
] 

Brandon Williams commented on CASSANDRA-5789:
---------------------------------------------

I mocked out the test data.  It looks like the main problem here is your script doesn't know
why it can't get the data, it can be any exception.  I made it print them out and got this:

{noformat}
AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none
of the attempts succeeded. The last failure was TTransportException: Could not connect to
cassandra-3:9160
Process CassClientRunProcess-57:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "CassBugRepro.py", line 198, in run
    max_overflow = 1)
  File "/srv/pycassa/pycassa/pool.py", line 383, in __init__
    self.fill()
  File "/srv/pycassa/pycassa/pool.py", line 444, in fill
    conn = self._create_connection()
  File "/srv/pycassa/pycassa/pool.py", line 432, in _create_connection
    (exc.__class__.__name__, exc))
{noformat}

This is obviously bogus as I didn't shut anything down, but certainly doesn't indicate any
missing data.  I'd recommend that you try to reproduce this with cassandra's stress tool and
see if it finds any missing data.
                
> Data not fully replicated with 2 nodes and replication factor 2
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-5789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5789
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.2, 1.2.6
>         Environment: Official Datastax Cassandra 1.2.6, running on Linux RHEL 6.2.  I've
seen the same behavior with Cassandra 1.2.2.
> Sun Java 1.7.0_10-b18 64-bit
> Java heap settings: -Xms8192M -Xmx8192M -Xmn2048M
>            Reporter: James Lee
>            Assignee: Brandon Williams
>         Attachments: CassBugRepro.py
>
>
> I'm seeing a problem with a 2-node Cassandra test deployment, where it seems that data
isn't being replicated among the nodes as I would expect.
> The setup and test is as follows:
> - Two Cassandra nodes in the cluster (they each have themselves and the other node as
seeds in cassandra.yaml).
> - Create 40 keyspaces, each with simple replication strategy and 
> replication factor 2.
> - Populate 125,000 rows into each keyspace, using a pycassa client with a connection
pool pointed at both nodes.  These are populated with writes using consistency level of 1.
> - Wait until nodetool on each node reports that there are no hinted handoffs outstanding
(see output below).
> - Do random reads of the rows in the keyspaces, again using a pycassa client with a connection
pool pointed at both nodes.  These are read using consistency level 1.
> I'm finding that the vast majority of reads are successful, but a small 
> proportion (~0.1%) are returned as Not Found.  If I manually try to look up 
> those keys using cassandra-cli, I see that they are returned when querying one of the
nodes, but not when querying the other.  So it seems like some of the rows have simply not
been replicated, even though the write for these rows was reported to the client as successful.
> If I reduce the rate at which the test tool initially writes data into the database then
I don't see any failed reads, so this seems like a load-related issue.  My understanding is
that if all writes were successful and there are no pending hinted handoffs, then the data
should be fully-replicated and reads should return it (even with read and write consistency
of 1).
> Here's the output from notetool on the two nodes:
> comet-mvs01:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0              2         0              
  0
> RequestResponseStage              0         0         878494         0              
  0
> MutationStage                     0         0        2869107         0              
  0
> ReadRepairStage                   0         0              0         0              
  0
> ReplicateOnWriteStage             0         0              0         0              
  0
> GossipStage                       0         0           2208         0              
  0
> AntiEntropyStage                  0         0              0         0              
  0
> MigrationStage                    0         0            994         0              
  0
> MemtablePostFlusher               0         0           4399         0              
  0
> FlushWriter                       0         0           2264         0              
556
> MiscStage                         0         0              0         0              
  0
> commitlog_archiver                0         0              0         0              
  0
> InternalResponseStage             0         0            153         0              
  0
> HintedHandoff                     0         0              2         0              
  0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                 87655
> _TRACE                       0
> REQUEST_RESPONSE             0
> comet-mvs02:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0            868         0              
  0
> RequestResponseStage              0         0        3919665         0              
  0
> MutationStage                     0         0        8177325         0              
  0
> ReadRepairStage                   0         0            113         0              
  0
> ReplicateOnWriteStage             0         0              0         0              
  0
> GossipStage                       0         0           9624         0              
  0
> AntiEntropyStage                  0         0              0         0              
  0
> MigrationStage                    0         0           2666         0              
  0
> MemtablePostFlusher               0         0           7869         0              
  0
> FlushWriter                       0         0           4273         0              1179
> MiscStage                         0         0              0         0              
  0
> commitlog_archiver                0         0              0         0              
  0
> InternalResponseStage             0         0            215         0              
  0
> HintedHandoff                     0         0              8         0              
  0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                531988
> _TRACE                       0
> REQUEST_RESPONSE             0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message