Hello,
 
I did find these exceptions. I issued the loadbalance command on node 192.168.2.10.
 
INFO [MESSAGING-SERVICE-POOL:3] 2010-03-01 10:34:40,764 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:55973 remote=/192.168.2.13:7000]
 WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-03-01 10:34:40,964 MessagingService.java (line 555) Running on default stage - beware
 WARN [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/192.168.2.10:40758 remote=/192.168.2.13:7000]
 WARN [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 485) Exception was generated at : 03/01/2010 10:34:40 on thread MESSAGING-SERVICE-POOL:1
Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 BufferSizeRemaining: 16
java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 BufferSizeRemaining: 16
    at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
    at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
    at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
    at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
 INFO [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:40758 remote=/192.168.2.13:7000]
 INFO [MESSAGE-STREAMING-POOL:1] 2010-03-01 10:35:23,171 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:56728 remote=/192.168.2.13:7000]
 INFO [MESSAGE-STREAMING-POOL:1] 2010-03-01 10:35:23,221 FileStreamTask.java (line 79) Exception was generated at : 03/01/2010 10:35:23 on thread MESSAGE-STREAMING-POOL:1
Value too large for defined data type
java.io.IOException: Value too large for defined data type
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
    at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
    at org.apache.cassandra.net.TcpConnection.stream(TcpConnection.java:226)
    at org.apache.cassandra.net.FileStreamTask.run(FileStreamTask.java:55)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
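
The second exception is thrown from FileChannel.transferTo, which the trace shows TcpConnection.stream using to push a data file to the other node. To make sure I understand the failing call, here is a rough sketch of that pattern. The path, address, and 64 MB chunk size are placeholders I made up, and bounding each transferTo call is only a guess at a way around whatever platform limit produces "Value too large for defined data type" (that string looks like the kernel's EOVERFLOW message); I have not tested it:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public class TransferSketch {
        public static void main(String[] args) throws IOException {
            // Placeholder path and endpoint, not my real configuration.
            FileChannel file = new FileInputStream(
                    "/var/lib/cassandra/data/Keyspace1/Standard1-Data.db").getChannel();
            SocketChannel socket = SocketChannel.open(
                    new InetSocketAddress("192.168.2.13", 7000));
            try {
                long pos = 0;
                long size = file.size();
                while (pos < size) {
                    // transferTo hands the copy to the kernel (sendfile on Linux);
                    // capping each call at 64 MB is only a guessed safeguard.
                    long sent = file.transferTo(pos, Math.min(64L << 20, size - pos), socket);
                    pos += sent;
                }
            } finally {
                file.close();
                socket.close();
            }
        }
    }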
 
I can certainly upgrade to 0.6 and try a loadbalance there; do you still think that is advisable?
 
All of my key/value entries are well under 1024 bytes, but I have millions of them.
 
Do you think I have a data corruption problem?
 
Thanks,
Jon
On Mon, Mar 1, 2010 at 2:54 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
On Mon, Mar 1, 2010 at 3:18 PM, Jon Graham <sjcloud22@gmail.com> wrote:
> Thanks Jonathan.
>
> It seems like the load balance operation isn't moving. I haven't seen any data
> file time changes in 2 hours and no location file time changes in over an hour.
>
> I can see TCP port 7000 opened on the node where I ran the loadbalance
> command. It is connected to port 39033 on the node receiving the data. The CPU
> usage on both systems is very low. There are about 10 million records on the
> node where the load balance command was issued.

Did you check logs for exceptions?

> My six-node Cassandra ring uses the following tokens for nodes 1-6: 0
> (ASCII 0x30), 6, B, H, O (the letter O), T
>
> The load balance target node initially had a token of 'H' (using ordered
> partitioning). The source node has a token of 0 (ASCII 0x30). Most of the data
> on the source node has keys starting with '/'. Slash falls between tokens T and
> 0 in my ring, so most of the data landed on the node with token 0, with
> replicas on the next two nodes. My token space is badly divided for the data I
> have already inserted.
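
That matches how the order-preserving partitioner places keys: each node owns the range (previous token, its own token], with keys and tokens compared as strings, so anything sorting below '0' wraps around to the token-0 node. A rough sketch of that lookup using the tokens you listed (the ownership rule here is my shorthand, not the actual partitioner code):

    import java.util.Arrays;
    import java.util.TreeSet;

    public class TokenSketch {
        // The ring tokens as listed above, compared as plain strings.
        static final TreeSet<String> TOKENS =
                new TreeSet<String>(Arrays.asList("0", "6", "B", "H", "O", "T"));

        // A key belongs to the first token >= the key, wrapping to the
        // smallest token when no such token exists.
        static String owner(String key) {
            String token = TOKENS.ceiling(key);
            return token != null ? token : TOKENS.first();
        }

        public static void main(String[] args) {
            System.out.println(owner("/some/key")); // '/' sorts just below '0' -> token 0
            System.out.println(owner("Apple"));     // -> token B
            System.out.println(owner("Zebra"));     // nothing >= "Zebra", wraps -> token 0
        }
    }
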
>
> Does the initial token value of the load balance target node selected by
> Cassandra need to be cleared or set to a specific value beforehand to
> accommodate the load balance data transfer?

No.

> Would I have better luck decommissioning nodes 4, 5, 6 and trying to
> bootstrap these nodes one at a time with better initial token values?

LoadBalance is basically sugar for decommission + bootstrap, so no.

> I am looking for a good way to move/split/re-balance data from nodes 1, 2, 3
> to nodes 4, 5, 6 while achieving a better token space distribution.

I would upgrade to the 0.6 beta and try loadbalance again.

-Jonathan