From: aaron morton
Subject: Re: new node gets no data
Date: Fri, 16 Mar 2012 22:03:42 +1300
To: user@cassandra.apache.org

ahh, I think you may have hit a corner case here.

Is the RF still 1?

> INFO [AntiEntropySessions:1] 2012-03-16 06:15:13,727 AntiEntropyService.java (line 663) [repair #%s] No neighbors to repair with on range %s: session completed

This means there are no nodes which share the range with this node, so there is nothing to repair.

To put it another way: as far as 161.101 is concerned, none of the keys it is responsible for are stored on another node. So there are no other nodes that could be involved in a repair session.

It looks like some data may have been written to 161.101, so I think the safest approach would be:
* increase the RF to 2
* repair
* decrease the RF to 1

When you added the node, was auto_bootstrap enabled? I would have thought that would stream data from the first node to the new one.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/03/2012, at 7:22 PM, Thorsten von Eicken wrote:

> Thanks for the suggestion, Aaron. Unfortunately, that seems to do
> absolutely nothing:
>
> # nodetool -h localhost repair
> INFO [RMI TCP Connection(160)-127.0.0.1] 2012-03-16 06:15:13,718 StorageService.java (line 1770) Starting repair command #1, repairing 1 ranges.
> INFO [AntiEntropySessions:1] 2012-03-16 06:15:13,727 AntiEntropyService.java (line 658) [repair #6472b290-6f2f-11e1-0000-472739b10cff] new session: will sync /10.80.161.101 on range (0,85070591730234615865843651857942052864] for rslog_production.[users, req_text, req_attr_idx, req_word_idx, req_word_freq, sessions, requests, info]
> INFO [AntiEntropySessions:1] 2012-03-16 06:15:13,727 AntiEntropyService.java (line 663) [repair #%s] No neighbors to repair with on range %s: session completed
> INFO [RMI TCP Connection(160)-127.0.0.1] 2012-03-16 06:15:13,727 StorageService.java (line 1807) Repair command #1 completed successfully
>
> Stumped...
> TvE
>
>
> On 3/15/2012 6:41 PM, aaron morton wrote:
>> Try running nodetool repair on 10.80.161.101 and then cleanup
>> on 10.102.37.168 if everything is ok.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 16/03/2012, at 6:45 AM, Thorsten von Eicken wrote:
>>
>>> I added a second node to a single-node ring. RF=1. I can't get the new
>>> node to receive any data. Logs look fine. Here's what nodetool reports:
>>>
>>> # nodetool -h localhost ring
>>> Address         DC          Rack   Status State   Load       Owns    Token
>>>                                                                    85070591730234615865843651857942052864
>>> 10.102.37.168   datacenter1 rack1  Up     Normal  807.81 GB  50.00%  0
>>> 10.80.161.101   datacenter1 rack1  Up     Normal  1.15 MB    50.00%  85070591730234615865843651857942052864
>>>
>>> Just a "little" imbalance. Yes, I use partitioner:
>>> org.apache.cassandra.dht.RandomPartitioner
>>> I tried moving the new node's token up/down by 1 and it triggers the log
>>> messages you'd expect, but no data gets transferred. How do I
>>> troubleshoot this? Below are the log messages I see when restarting the
>>> new node:
>>>
>>> INFO [main] 2012-03-15 17:31:08,616 AbstractCassandraDaemon.java (line 120) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_24
>>> INFO [main] 2012-03-15 17:31:14,812 CommitLog.java (line 178) Log replay complete, 8 replayed mutations
>>> INFO [main] 2012-03-15 17:31:14,825 StorageService.java (line 390) Cassandra version: 1.0.6
>>> INFO [main] 2012-03-15 17:31:14,825 StorageService.java (line 391) Thrift API version: 19.19.0
>>> INFO [main] 2012-03-15 17:31:14,825 StorageService.java (line 404) Loading persisted ring state
>>> INFO [main] 2012-03-15 17:31:14,834 StorageService.java (line 482) Starting up server gossip
>>> INFO [main] 2012-03-15 17:31:15,372 MessagingService.java (line 247) Starting Encrypted Messaging Service on SSL port 7000
>>> INFO [main] 2012-03-15 17:31:15,376 MessagingService.java (line 268) Starting Messaging Service on port 7001
>>> INFO [main] 2012-03-15 17:31:15,401 StorageService.java (line 579) Using saved token 85070591730234615865843651857942052864
>>> INFO [main] 2012-03-15 17:31:15,402 ColumnFamilyStore.java (line 692) Enqueuing flush of Memtable-LocationInfo@645492252(53/66 serialized/live bytes, 2 ops)
>>> INFO [FlushWriter:1] 2012-03-15 17:31:15,403 Memtable.java (line 240) Writing Memtable-LocationInfo@645492252(53/66 serialized/live bytes, 2 ops)
>>> INFO [FlushWriter:1] 2012-03-15 17:31:15,421 Memtable.java (line 277) Completed flushing /mnt/ebs/data/system/LocationInfo-hc-32-Data.db (163 bytes)
>>> INFO [main] 2012-03-15 17:31:15,424 StorageService.java (line 948) Node /10.80.161.101 state jump to normal
>>> INFO [main] 2012-03-15 17:31:15,434 StorageService.java (line 589) Bootstrap/Replace/Move completed! Now serving reads.
>>>
>>> # describe keyspace
>>> Keyspace: rslog_production:
>>> Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>> Durable Writes: true
>>> Options: [replication_factor:1]
>>> Column Families:
>>>
>>
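To illustrate the "No neighbors to repair" message, here is a minimal Python sketch of replica placement on the two-node ring from this thread. It assumes SimpleStrategy-style placement (the node owning a token, plus the next RF-1 distinct nodes clockwise). The names `replicas_for_range` and `repair_neighbors` are made up for this illustration and are not Cassandra APIs.

```python
# Ring taken from the nodetool output in this thread: token -> node address.
RING = {
    0: "10.102.37.168",
    85070591730234615865843651857942052864: "10.80.161.101",
}


def replicas_for_range(ring, end_token, rf):
    """Nodes storing the range that ends at end_token.

    Assumed SimpleStrategy-style placement: the owner of end_token,
    then the next distinct nodes clockwise until rf replicas are found.
    """
    tokens = sorted(ring)
    start = tokens.index(end_token)
    wanted = min(rf, len(set(ring.values())))
    replicas = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == wanted:
            break
    return replicas


def repair_neighbors(ring, node, rf):
    """Other nodes that share at least one range with `node`."""
    neighbors = set()
    for end_token in ring:
        replicas = replicas_for_range(ring, end_token, rf)
        if node in replicas:
            neighbors.update(r for r in replicas if r != node)
    return sorted(neighbors)


print(repair_neighbors(RING, "10.80.161.101", rf=1))  # []
print(repair_neighbors(RING, "10.80.161.101", rf=2))  # ['10.102.37.168']
```

With RF=1 the new node has no one to compare data with, which matches the log line. With RF=2 the original node becomes a neighbor, which is why raising the RF before repairing gives the repair something to do.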

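As a side note on the ring output quoted in this thread: the new node's token is exactly half of RandomPartitioner's 0 to 2**127 token space, so the 50.00% ownership figures are expected and the imbalance is in the data actually stored, not in the token assignment. A small sketch of the usual evenly spaced token calculation follows; `balanced_tokens` is a hypothetical helper name, not part of any Cassandra tool.

```python
def balanced_tokens(node_count):
    """Evenly spaced tokens over RandomPartitioner's 0 .. 2**127 range."""
    return [i * (2**127) // node_count for i in range(node_count)]


tokens = balanced_tokens(2)
print(tokens)  # [0, 85070591730234615865843651857942052864]
```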