> Hello,This basically boils down to a binary search problem, so you don't
> I'm new to the list and also to Cassandra. I found it when I was searching
> for something to replace our busy mysql server.
> One of the things we use the server for is filtering IPs based on a list of
> IP ranges. These ranges can be small and big, and there are about 50k of
> them in the database.
> In mysql this is pretty quick: they are stored as integers, and the query
> basically looks like (say ip is the ip we want to find the all the ranges
> select range from rangelist where ip_start<=ip and ip_end>=ip;
> I tried to move this schema to Cassandra, but it turned out to be very slow,
> even with indexes on both columns. Since I also had to have an EQ expression
> in the query, I added an indexed text field which was the same for all rows,
> so the query in cassandra was something like this:
> select range from rangelist where type='ip' and ip_start<=ip and ip_end>=ip;
> This was very slow, and I imagine it is because it has to scan through all
> the rows, making the index useless.
really need an index. Assuming IPv4, what I would do is make the
first two bytes (class A and B, respectively) the row key. This will
give you 65025 rows, each with possibly 65025 columns (each column
name will be the other two bytes.) When you need to find an ip, you
go to the row key and then slice the columns to find a match. This
works well until you need to search an entire class A, in which case
you'll need to do 255 checks, but in parallel this won't be too bad,
especially because the bloom filter will save you on non-existent
rows. Presumably there is no need to search all class As, unless for
some reason you don't know the first byte, which would be somewhat
strange. If you do need to span a few class As this will begin to
fall apart, but hopefully that's not a common use case.