cassandra-user mailing list archives

From Anuj Wadehra <>
Subject Efficient Paging Option in Wide Rows
Date Fri, 22 Apr 2016 14:57:32 GMT
I have a wide row index table so that I can fetch all row keys corresponding to a column value. 
Rows of index_table will look like:
ColValue1:bucket1 >> rowkey1, rowkey2.. rowkeyn
......
ColValue1:bucketn >> rowkey1, rowkey2.. rowkeyn
We will have buckets to avoid hotspots. Row keys of the main table are random numbers, and we
will never do a column slice like:

Select * from index_table where key=xxx and Col > rowkey1 and col < rowkey10
Also, we will ALWAYS fetch all data for a given value of the index column. Thus all buckets have
to be read.
Each index column value can map to thousands to millions of row keys in the main table.
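
As a rough illustration of the bucketed layout described above, the sketch below derives an index row key of the form ColValue:bucketN. The function names, the CRC32 hash, and the bucket count are assumptions made for the example only; the message does not say how a bucket is chosen.

```python
import zlib


def index_row_key(col_value: str, row_key: str, num_buckets: int) -> str:
    """Return the index row key (hypothetical scheme) for one main-table row key.

    Assumption: the bucket is picked by hashing the main-table row key, so the
    row keys for one column value spread evenly over num_buckets index rows.
    """
    bucket = zlib.crc32(row_key.encode("utf-8")) % num_buckets + 1
    return f"{col_value}:bucket{bucket}"


def all_index_row_keys(col_value: str, num_buckets: int) -> list[str]:
    """List every index row that must be read to fetch all row keys for col_value."""
    return [f"{col_value}:bucket{b}" for b in range(1, num_buckets + 1)]
```

Because every read fetches all data for a column value, a reader would iterate over the full list returned by all_index_row_keys rather than compute a single bucket.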
Based on our use case, there are two design choices in front of me:
1. Have a large number of buckets/rows for an index column value, with less data (around a few
thousand row keys) in each row.
Thus, every time we want to fetch all row keys for an index col value, we will query more
rows, and for each row we will have to page through the data 500 records at a time.
2. Have fewer buckets/rows for an index column value.
Every time we want to fetch all row keys for an index col value, we will query a smaller number
of wider rows and then page through each wide row, reading 500 columns at a time.
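
To make the comparison concrete, here is a back-of-the-envelope sketch that counts how many page queries each design would issue. It assumes row keys are spread evenly across buckets and that every bucket costs at least one query; page_queries is a hypothetical helper, and the count says nothing about server-side cost per query.

```python
import math


def page_queries(total_keys: int, num_buckets: int, page_size: int = 500) -> int:
    """Count the page queries needed to read every row key for one column value.

    Assumptions: keys are split as evenly as possible across buckets, and each
    bucket needs at least one query even if it holds fewer than page_size keys.
    """
    per_bucket, extra = divmod(total_keys, num_buckets)
    queries = 0
    for b in range(num_buckets):
        keys_in_bucket = per_bucket + (1 if b < extra else 0)
        queries += max(1, math.ceil(keys_in_bucket / page_size))
    return queries


if __name__ == "__main__":
    total = 1_000_000
    for buckets in (10, 500, 3000):
        print(buckets, "buckets ->", page_queries(total, buckets), "queries")
```

Under these assumptions, 1,000,000 row keys need 2000 page queries with either 10 or 500 buckets, and the count only rises (to 3000 with 3000 buckets) once buckets hold fewer keys than one page.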

Which approach is more efficient?
Approach 1: a larger number of rows with less data in each row.

Approach 2: a smaller number of rows with more data in each row.

Either way, we are fetching only 500 records at a time in a query. Even in approach 2 (wider
rows), we can query only a small page of 500 at a time.
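
The read pattern shared by both approaches can be sketched as a loop over buckets that pages through each one, restarting after the last column seen. This is an in-memory simulation only: read_page stands in for a real paged query, and the dict of sorted lists stands in for index_table. Both are assumptions for illustration, not a driver API.

```python
from bisect import bisect_right


def read_page(columns, start_after, page_size):
    """Return up to page_size columns strictly greater than start_after.

    columns is a sorted list standing in for one wide row; start_after=None
    means "start from the beginning of the row".
    """
    start = 0 if start_after is None else bisect_right(columns, start_after)
    return columns[start:start + page_size]


def read_all(buckets, page_size=500):
    """Yield every row key for one column value by paging through each bucket."""
    for _bucket_key, columns in buckets.items():
        last_seen = None
        while True:
            page = read_page(columns, last_seen, page_size)
            yield from page
            if len(page) < page_size:
                break  # a short (or empty) page means the bucket is exhausted
            last_seen = page[-1]
```

The loop is identical for many narrow buckets and for a few wide ones; only the number of entries in buckets and the number of pages per bucket change.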

