cassandra-user mailing list archives

From Mohammed Guller <moham...@glassbeam.com>
Subject RE: Retrieving all row keys of a CF
Date Fri, 23 Jan 2015 00:42:52 GMT
What is the average and max # of CQL rows in each partition? Is 800,000 the number of CQL rows
or Cassandra partitions (storage engine rows)?

Another option is to fetch all the partition keys with a single CQL statement. You could
try this in cqlsh first:

SELECT DISTINCT pk1, pk2, …, pkn FROM CF;

If you are using a composite partition key, you will need to list all of its component
columns.
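As a rough illustration of the statement above, the following Java sketch only builds the CQL text; it does not talk to a cluster. The class and method names (DistinctKeyQuery, allKeys, keysInTokenRange) are hypothetical and not part of any driver. The token-range variant is an assumption beyond what was suggested here: it restricts the same query to one slice of the ring so that a full scan can be broken into smaller requests.

```java
import java.util.List;

// Hypothetical helper, not part of any Cassandra driver: it only assembles CQL strings.
final class DistinctKeyQuery {
    private DistinctKeyQuery() {}

    // SELECT DISTINCT must name every partition key column of the table.
    static String allKeys(String table, List<String> partitionKeyColumns) {
        if (partitionKeyColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one partition key column is required");
        }
        return "SELECT DISTINCT " + String.join(", ", partitionKeyColumns) + " FROM " + table;
    }

    // Assumed variant: the same query limited to the token range (start, end],
    // with bind markers for the two bounds.
    static String keysInTokenRange(String table, List<String> partitionKeyColumns) {
        String token = "token(" + String.join(", ", partitionKeyColumns) + ")";
        return allKeys(table, partitionKeyColumns)
                + " WHERE " + token + " > ? AND " + token + " <= ?";
    }
}
```

For a table CF with partition key (pk1, pk2), allKeys produces the statement shown above; the token-range form would be prepared once and executed per sub-range.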

Mohammed

From: Ravi Agrawal [mailto:ragrawal@clearpoolgroup.com]
Sent: Thursday, January 22, 2015 1:57 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

Hi,
I increased the range timeout and read timeout, first to 50 secs and then to 500 secs, and
the Astyanax client timeout to 60 and 550 secs respectively. I still get a timeout exception.
I see the logic of the .withCheckpointManager() code; is that the only way it could work?


From: Eric Stevens [mailto:mightye@gmail.com]
Sent: Saturday, January 17, 2015 9:55 AM
To: user@cassandra.apache.org
Subject: Re: Retrieving all row keys of a CF

If you're getting partial data back and then eventually failing, try setting .withCheckpointManager().
This will let you keep track of the token ranges you've successfully processed, and not
attempt to reprocess them. It will also let you set up tasks on bigger data sets that take
hours or days to run, and reasonably safely interrupt them at any time without losing progress.

This is some *very* old code, but I dug this out of a git history.  We don't use Astyanax
any longer, but maybe an example implementation will help you.  This is Scala instead of Java,
but hopefully you can get the gist.

https://gist.github.com/MightyE/83a79b74f3a69cfa3c4e

If you're timing out talking to your cluster, then I don't recommend using the cluster to
track your checkpoints, but some other data store (maybe just a flatfile).  Again, this is
just to give you a sense of what's involved.
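To make the flat-file idea concrete, here is a minimal Java sketch of a checkpoint store that keeps a map from a range's start token to the last token processed in that range, and rewrites a local properties file after every update. It is a standalone, hypothetical class (FileCheckpointStore) that does not implement the Astyanax checkpoint interface and is not the code from the gist; wiring it into .withCheckpointManager() would need an adapter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical flat-file checkpoint store: start token -> last processed token.
final class FileCheckpointStore {
    private final Path file;
    private final SortedMap<String, String> checkpoints = new TreeMap<>();

    // Loads any checkpoints left behind by a previous (possibly interrupted) run.
    FileCheckpointStore(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            Properties saved = new Properties();
            try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                saved.load(in);
            }
            for (String startToken : saved.stringPropertyNames()) {
                checkpoints.put(startToken, saved.getProperty(startToken));
            }
        }
    }

    // Records progress for one token range and persists the whole map.
    synchronized void trackCheckpoint(String startToken, String checkpointToken) throws IOException {
        checkpoints.put(startToken, checkpointToken);
        flush();
    }

    // Returns the last recorded token for the range, or null if the range is untouched.
    synchronized String getCheckpoint(String startToken) {
        return checkpoints.get(startToken);
    }

    // Returns a copy, so callers cannot modify the store's state.
    synchronized SortedMap<String, String> getCheckpoints() {
        return new TreeMap<>(checkpoints);
    }

    // Writes to a temporary sibling file first, then replaces the real file.
    private void flush() throws IOException {
        Properties out = new Properties();
        out.putAll(checkpoints);
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (Writer writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            out.store(writer, "token range checkpoints");
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

On restart, a job would construct the store from the same path and skip or resume any range whose start token already has a checkpoint. Rewriting the file on every update is simple but slow; batching the writes is an obvious refinement.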

On Fri, Jan 16, 2015 at 6:31 PM, Mohammed Guller <mohammed@glassbeam.com> wrote:
Surely total system memory and heap size can’t both be 8GB?

The timeout on the Astyanax client should be greater than the timeouts on the C* nodes; otherwise
your client will time out prematurely.

Also, have you tried increasing the timeout for the range queries to a higher number? It is
not recommended to set them very high, because a lot of other problems may start happening,
but then reading 800,000 partitions is not a normal operation.

Just as an experiment, can you set the range timeout to 45 seconds on each node and the
timeout on the Astyanax client to 50 seconds? Restart the nodes after increasing the timeout
and try again.
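The rule of thumb above (the client timeout should exceed the server-side timeouts) can be written as a small check. This is only an illustrative sketch: the class and method names are hypothetical, and the check does not read cassandra.yaml (where the server settings are read_request_timeout_in_ms and range_request_timeout_in_ms) or any Astyanax configuration.

```java
import java.time.Duration;

// Hypothetical sanity check for the timeout relationship described above.
final class TimeoutCheck {
    private TimeoutCheck() {}

    // True only if the client waits strictly longer than the slower of the two server timeouts.
    static boolean clientOutlastsServer(Duration client, Duration serverRead, Duration serverRange) {
        Duration slowestServer = serverRead.compareTo(serverRange) >= 0 ? serverRead : serverRange;
        return client.compareTo(slowestServer) > 0;
    }
}
```

With the values reported in this thread (client 2 secs, server read and range 10 secs each) the check returns false; with the proposed experiment (range 45 secs, client 50 secs) it returns true.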

Mohammed

From: Ravi Agrawal [mailto:ragrawal@clearpoolgroup.com]
Sent: Friday, January 16, 2015 5:11 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF


1) What is the heap size and total memory on each node? 8GB, 8GB
2) How big is the cluster? 4
3) What are the read and range timeouts (in cassandra.yaml) on the C* nodes? 10 secs, 10 secs
4) What are the timeouts for the Astyanax client? 2 secs
5) Do you see GC pressure on the C* nodes? How long does GC for new gen and old gen take? GC occurs every 5 secs; we don't see huge GC pressure, <50ms
6) Does any node crash with OOM error when you try AllRowsReader? No

From: Mohammed Guller [mailto:mohammed@glassbeam.com]
Sent: Friday, January 16, 2015 7:30 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

A few questions:


1)      What is the heap size and total memory on each node?

2)      How big is the cluster?

3)      What are the read and range timeouts (in cassandra.yaml) on the C* nodes?

4)      What are the timeouts for the Astyanax client?

5)      Do you see GC pressure on the C* nodes? How long does GC for new gen and old gen take?

6)      Does any node crash with OOM error when you try AllRowsReader?

Mohammed

From: Ravi Agrawal [mailto:ragrawal@clearpoolgroup.com]
Sent: Friday, January 16, 2015 4:14 PM
To: user@cassandra.apache.org
Subject: Re: Retrieving all row keys of a CF

Hi,
Ruchir and I tried the query using the AllRowsReader recipe but had no luck. We are seeing a PoolTimeoutException.
SEVERE: [Thread_1] Error reading RowKeys
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException:
[host=servername, latency=2003(2003), attempts=4]Timed out waiting for connection
       at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
       at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
       at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
       at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
       at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
       at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$2.execute(ThriftColumnFamilyQueryImpl.java:397)
       at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:447)
       at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:419)
       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)

We did receive a portion of the data, which changes on every try. We used the following method.
boolean result = new AllRowsReader.Builder<String, String>(keyspace, CF_STANDARD1)
        .withColumnRange(null, null, false, 0)
        .withPartitioner(null) // this will use keyspace's partitioner
        .forEachRow(new Function<Row<String, String>, Boolean>() {
            @Override
            public Boolean apply(@Nullable Row<String, String> row) {
                // Process the row here ...
                return true;
            }
        })
        .build()
        .call();

We also tried setting the concurrency level as mentioned in this post (https://github.com/Netflix/astyanax/issues/411),
on both Astyanax 1.56.49 and 2.0.0. Still nothing.
