Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 441177A14 for ; Tue, 1 Nov 2011 16:18:47 +0000 (UTC) Received: (qmail 43075 invoked by uid 500); 1 Nov 2011 16:18:45 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 42992 invoked by uid 500); 1 Nov 2011 16:18:45 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 42984 invoked by uid 99); 1 Nov 2011 16:18:45 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 01 Nov 2011 16:18:45 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of tmarki@gmail.com designates 209.85.161.44 as permitted sender) Received: from [209.85.161.44] (HELO mail-fx0-f44.google.com) (209.85.161.44) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 01 Nov 2011 16:18:38 +0000 Received: by faas12 with SMTP id s12so7887765faa.31 for ; Tue, 01 Nov 2011 09:18:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:from:date:message-id:subject:to:content-type; bh=A8KfF5FPsWf5KNPpUuRSDNIy1ie8/bpEA6f26Ro+bMM=; b=J+vuqhNnqPPQGj0ITWzLWtLKxrxrh+p2aL4AOAvwVHJfHTt/gyvY5OVmSiaDv5Xzry ysbpyBuHVcjLS7SvfF+uPi596U+BM3XJXnI7TbWg5K1kTcQiIrj/MtbO8BKu+4tOaDWK JKbzXZhr1F5XVPZkallza9kynOYHji+mkFt34= Received: by 10.223.85.139 with SMTP id o11mr1734800fal.0.1320164297135; Tue, 01 Nov 2011 09:18:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.122.75 with HTTP; Tue, 1 Nov 2011 09:17:36 -0700 (PDT) From: Tamas Marki Date: Tue, 1 Nov 2011 17:17:36 +0100 Message-ID: Subject: Storing and querying IP ranges in Cassandra To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=000e0ce0f7b4f9d66a04b0aeb368 --000e0ce0f7b4f9d66a04b0aeb368 Content-Type: text/plain; charset=UTF-8 Hello, I'm new to the list and also to Cassandra. I found it when I was searching for something to replace our busy mysql server. One of the things we use the server for is filtering IPs based on a list of IP ranges. These ranges can be small and big, and there are about 50k of them in the database. In mysql this is pretty quick: they are stored as integers, and the query basically looks like (say ip is the ip we want to find the all the ranges for): select range from rangelist where ip_start<=ip and ip_end>=ip; I tried to move this schema to Cassandra, but it turned out to be very slow, even with indexes on both columns. Since I also had to have an EQ expression in the query, I added an indexed text field which was the same for all rows, so the query in cassandra was something like this: select range from rangelist where type='ip' and ip_start<=ip and ip_end>=ip; This was very slow, and I imagine it is because it has to scan through all the rows, making the index useless. The second thing I tried was to just expand the ranges and store individual IPs as the keys to a column family. This is very fast to query, but the problem is that I now have over 2.7 million rows, because some of the ranges are quite large. As the number of ranges could change, this method could be a problem - imagine we add a whole A-class range, it would explode into millions of rows. My question is, is there a more sane way to store this information, while still being able to find all the IP ranges that have the given IP in them? I've been only dealing with Cassandra for a week or two, so I don't know about the inner details of what can be done, but I do have programming experience and am not afraid to get my hands dirty, in case it can be solved by writing some extension to Cassandra. Looking forward to any suggestions. Thanks, Tamas --000e0ce0f7b4f9d66a04b0aeb368 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hello,

I'm new to the list and also to Cassandra. I found it whe= n I was searching for something to replace our busy mysql server.

On= e of the things we use the server for is filtering IPs based on a list of I= P ranges. These ranges can be small and big, and there are about 50k of the= m in the database.

In mysql this is pretty quick: they are stored as integers, and the que= ry basically looks like (say ip is the ip we want to find the all the range= s for):

select range from rangelist where ip_start<=3Dip and ip_e= nd>=3Dip;

I tried to move this schema to Cassandra, but it turned out to be very = slow, even with indexes on both columns. Since I also had to have an EQ exp= ression in the query, I added an indexed text field which was the same for = all rows, so the query in cassandra was something like this:

select range from rangelist where type=3D'ip' and ip_start<= =3Dip and ip_end>=3Dip;

This was very slow, and I imagine it is b= ecause it has to scan through all the rows, making the index useless.
The second thing I tried was to just expand the ranges and store individual= IPs as the keys to a column family. This is very fast to query, but the pr= oblem is that I now have over 2.7 million rows, because some of the ranges = are quite large.

As the number of ranges could change, this method could be a problem - = imagine we add a whole A-class range, it would explode into millions of row= s.

My question is, is there a more sane way to store this informatio= n, while still being able to find all the IP ranges that have the given IP = in them?

I've been only dealing with Cassandra for a week or two, so I don&#= 39;t know about the inner details of what can be done, but I do have progra= mming experience and am not afraid to get my hands dirty, in case it can be= solved by writing some extension to Cassandra.

Looking forward to any suggestions.

Thanks,
Tamas

--000e0ce0f7b4f9d66a04b0aeb368--