incubator-cassandra-user mailing list archives

From Rana Aich <aichr...@gmail.com>
Subject Re: Help! Cassandra Data Loader threads are getting stuck
Date Wed, 28 Jul 2010 00:15:58 GMT
Thanks for your offer. The problem was in how my code read the *.gz files
from System.in. I've fixed my code.
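
The actual fix is not shown in this thread. As a minimal sketch of one common way to read gzip-compressed text from a stream such as System.in, the input can be wrapped in a GZIPInputStream before any line-oriented reading. The class name GzipStdinReader, the method countLines, and the UTF-8 charset are assumptions made for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not the code discussed in this thread.
final class GzipStdinReader {

    // Counts the lines of a gzip-compressed stream, decompressing on the fly.
    // The UTF-8 charset is an assumption about the data files.
    static long countLines(InputStream compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            long count = 0;
            while (reader.readLine() != null) {
                count++; // a real loader would parse the tab-delimited line here
            }
            return count;
        }
    }
}
```

Calling countLines(System.in) in a program started with its standard input redirected from a .gz file would then see plain text lines rather than compressed bytes.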


On Tue, Jul 27, 2010 at 12:09 AM, Thorvaldsson Justus <
justus.thorvaldsson@svenskaspel.se> wrote:

> I wrote a Java program that does just this.
>
> Basically:
>
> One thread reads from the file into an array, stopping when the array
> holds 20k entries, waiting until it drops below 20k, and then continuing
> to read the data file. (This is the raw data I want to move.)
>
>
>
> I have n threads,
>
> each with its own batch and its own connection to Cassandra.
>
> They fill their batches with data taken out of the array (this access is
> synchronized); when a batch reaches 1k entries, the thread sends it to Cassandra.
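
The design described above (one reader, a buffer capped at 20k entries, n workers sending batches of 1k) could be sketched roughly as follows. This is not Justus's code: it swaps the synchronized array for an ArrayBlockingQueue, and the class BoundedLoader, the method load, and the sink callback standing in for the Cassandra write are all hypothetical names. Unlike the original, where each worker has its own connection, the sink here is shared, so it must be thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of a bounded reader/worker pipeline; not code from this thread.
final class BoundedLoader {
    // Sentinel compared by identity, so it cannot collide with real data.
    private static final String END = new String("<end-of-input>");

    // `sink` stands in for the Cassandra batch write and is an assumption.
    static void load(Iterable<String> lines, int workers, int bufferSize, int batchSize,
                     Consumer<List<String>> sink) throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(bufferSize);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                List<String> batch = new ArrayList<>(batchSize);
                try {
                    while (true) {
                        String line = buffer.take(); // blocks while the buffer is empty
                        if (line == END) break;
                        batch.add(line);
                        if (batch.size() == batchSize) {
                            sink.accept(new ArrayList<>(batch));
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) sink.accept(new ArrayList<>(batch)); // final partial batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        for (String line : lines) buffer.put(line);        // blocks while the buffer is full
        for (int i = 0; i < workers; i++) buffer.put(END);  // one stop marker per worker
        for (Thread t : threads) t.join();
    }
}
```

With bufferSize 20000 and batchSize 1000 this matches the numbers above. One caveat of this sketch: if every worker dies (for example because the sink throws), the reader blocks forever on a full buffer, so a real loader would need error handling around the sink call.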
>
> I had some problems, but none involving Cassandra; it was my own code that
> faltered.
>
> I could provide code if you want.
>
> Justus
>
>
>
> *From:* Aaron Morton [mailto:aaron@thelastpickle.com]
> *Sent:* 26 July 2010 23:32
> *To:* user@cassandra.apache.org
> *Subject:* Re: Help! Cassandra Data Loader threads are getting stuck
>
>
>
> Try running it without threading to see if it's a cassandra problem or an
> issue with your threading.
>
> Perhaps split the file and run many single-threaded processes to load the
> data.
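
As a rough illustration of the splitting step suggested above, the sketch below cuts a text file into part files of a fixed number of lines, so that each part can be handed to a separate single-threaded loader process. The class FileSplitter, the method split, the part-N.tsv naming, and the UTF-8 charset are assumptions for illustration and do not come from this thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not code from this thread.
final class FileSplitter {

    // Splits `input` into files of at most `linesPerPart` lines each, written into `outDir`.
    // Returns the part files in order.
    static List<Path> split(Path input, Path outDir, int linesPerPart) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                String line;
                int linesInPart = 0;
                while ((line = reader.readLine()) != null) {
                    // Start a new part file at the beginning and whenever the current one is full.
                    if (writer == null || linesInPart == linesPerPart) {
                        if (writer != null) writer.close();
                        Path part = outDir.resolve("part-" + parts.size() + ".tsv");
                        parts.add(part);
                        writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                        linesInPart = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInPart++;
                }
            } finally {
                if (writer != null) writer.close();
            }
        }
        return parts;
    }
}
```

Each resulting part file could then be loaded by its own process, which removes shared state between loaders and makes it easier to tell a threading bug from a Cassandra problem.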
>
> Aaron
>
> On 27 Jul 2010, at 07:14 AM, Rana Aich <aichrana@gmail.com> wrote:
>
>  Hi All,
>
>
>
> I have to load a huge quantity of data into Cassandra (~10 billion rows).
>
>
>
> I'm trying to load the data from files using multithreading.
>
>
>
> The idea is that each thread reads the tab-delimited file and processes a
> chunk of records.
>
>
>
> For example, Thread 1 reads lines 1-1000 and inserts them into Cassandra.
>
> Thread 2 reads lines 1001-2000 and inserts them into Cassandra.
>
> Thread 3 reads lines 2001-3000 and inserts them into Cassandra.
>
>
>
> Thread 10 reads lines 9001-10000 and inserts them into Cassandra.
>
> Thread 1 reads lines 10001-11000 and inserts them into Cassandra.
>
> Thread 2 reads lines 11001-12000 and inserts them into Cassandra.
>
>
>
> and so on...
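
The round-robin assignment listed above (chunks of 1000 lines dealt out to 10 threads in turn) reduces to a small piece of arithmetic. The sketch below only computes which thread owns a given line; how each thread actually obtains its lines is not shown in the thread. ChunkAssignment and threadFor are hypothetical names.

```java
// Hypothetical helper for illustration; not code from this thread.
final class ChunkAssignment {

    // Returns the 1-based thread number responsible for a 1-based line number,
    // when chunks of `chunkSize` lines are assigned to `threadCount` threads in rotation.
    static int threadFor(long lineNumber, int chunkSize, int threadCount) {
        long chunkIndex = (lineNumber - 1) / chunkSize; // 0-based index of the chunk
        return (int) (chunkIndex % threadCount) + 1;
    }
}
```

With chunkSize 1000 and threadCount 10, line 1001 maps to Thread 2, line 9001 to Thread 10, and line 10001 wraps around to Thread 1, matching the example above.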
>
>
>
> I'm testing with a small file of 200,000 records.
>
>
>
> But somehow the process gets stuck and doesn't proceed any further after
> processing about 16,000 records.
>
>
>
> I've attached my working file.
>
>
>
> Any help will be very much appreciated.
>
>
>
> Regards
>
>
>
> raich
>
>
