I'm new to the list and also to Cassandra. I found it when I was searching for something to replace our busy mysql server.
One of the things we use the server for is filtering IPs based on a list of IP ranges. These ranges can be small and big, and there are about 50k of them in the database.
In mysql this is pretty quick: they are stored as integers, and the query basically looks like (say ip is the ip we want to find the all the ranges for):
select range from rangelist where ip_start<=ip and ip_end>=ip;
I tried to move this schema to Cassandra, but it turned out to be very slow, even with indexes on both columns. Since I also had to have an EQ expression in the query, I added an indexed text field which was the same for all rows, so the query in cassandra was something like this:
select range from rangelist where type='ip' and ip_start<=ip and ip_end>=ip;
This was very slow, and I imagine it is because it has to scan through all the rows, making the index useless.
The second thing I tried was to just expand the ranges and store individual IPs as the keys to a column family. This is very fast to query, but the problem is that I now have over 2.7 million rows, because some of the ranges are quite large.
As the number of ranges could change, this method could be a problem - imagine we add a whole A-class range, it would explode into millions of rows.
My question is, is there a more sane way to store this information, while still being able to find all the IP ranges that have the given IP in them?
I've been only dealing with Cassandra for a week or two, so I don't know about the inner details of what can be done, but I do have programming experience and am not afraid to get my hands dirty, in case it can be solved by writing some extension to Cassandra.
Looking forward to any suggestions.