incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tamas Marki <tma...@gmail.com>
Subject Re: Storing and querying IP ranges in Cassandra
Date Wed, 02 Nov 2011 08:11:01 GMT
On Tue, Nov 1, 2011 at 6:30 PM, Brandon Williams <driftx@gmail.com> wrote:

> On Tue, Nov 1, 2011 at 11:17 AM, Tamas Marki <tmarki@gmail.com> wrote:
> > Hello,
> >
> > I'm new to the list and also to Cassandra. I found it when I was
> searching
> > for something to replace our busy mysql server.
> >
> > One of the things we use the server for is filtering IPs based on a list
> of
> > IP ranges. These ranges can be small and big, and there are about 50k of
> > them in the database.
> >
> > In mysql this is pretty quick: they are stored as integers, and the query
> > basically looks like (say ip is the ip we want to find the all the ranges
> > for):
> >
> > select range from rangelist where ip_start<=ip and ip_end>=ip;
> >
> > I tried to move this schema to Cassandra, but it turned out to be very
> slow,
> > even with indexes on both columns. Since I also had to have an EQ
> expression
> > in the query, I added an indexed text field which was the same for all
> rows,
> > so the query in cassandra was something like this:
> >
> > select range from rangelist where type='ip' and ip_start<=ip and
> ip_end>=ip;
> >
> > This was very slow, and I imagine it is because it has to scan through
> all
> > the rows, making the index useless.
>
> This basically boils down to a binary search problem, so you don't
> really need an index.  Assuming IPv4, what I would do is make the
> first two bytes (class A and B, respectively) the row key. This will
> give you 65025 rows, each with possibly 65025 columns (each column
> name will be the other two bytes.)  When you need to find an ip, you
> go to the row key and then slice the columns to find a match.  This
> works well until you need to search an entire class A, in which case
> you'll need to do 255 checks, but in parallel this won't be too bad,
> especially because the bloom filter will save you on non-existent
> rows.  Presumably there is no need to search all class As, unless for
> some reason you don't know the first byte, which would be somewhat
> strange.  If you do need to span a few class As this will begin to
> fall apart, but hopefully that's not a common use case.
>
>
Thans, this sounds like a good plan, as I also need to store some info
about the ranges, and this way the values can store that.
I'll try to implement this and will get back to you on how it performs.


-- 
Tamas Marki

Mime
View raw message