Message-ID: <4F2FD3CF.6090004@innotec-data.de>
Date: Mon, 06 Feb 2012 14:21:19 +0100
From: Heiner Bunjes
To: user@cassandra.apache.org
Subject: Need database to log and retrieve sensor data

I need a database to log and retrieve sensor data.
Is Cassandra the right solution for this task, and if so, how should I set it up and which access methods should I use? If not, which other DB system might be a better fit?

The details are as follows:

########

Glossary
- Node = A computer on which an instance of the database is running
- Blip = One data record sent by a sensor
- Blip page = The sorted list of all blips for a specific sensor and a specific time range

The scale is as follows:

(01) 10^6 sensors deliver 1 blip every 100 seconds
     -> Insert rate = 10 kiloblip/s
     -> Insert rate ~ 315 gigablip/year
(02) They have to be stored for ~3 years
     -> Size of database ~ 1 terablip
(03) Each blip has about 200 bytes
     -> Size of database ~ 200 TB
(04) The system will start with just 10^4 sensors but will soon grow to the volume described above.

The main operations on the data are:

(05) Add the new blips to the database (written blips are never changed).
(06) Return all blips for sensor X with a timestamp between timestamp_a and timestamp_b. In other words: return a blip page.
(07) Return all the blips specified in (06), ordered by timestamp.
(08) Delete all blips older than Y.

Furthermore, the following is true:

(09) Each added blip is unambiguously identifiable by sensor_id+timestamp.
(10) 99.9% of the blips are inserted in chronological order; the rest are not.
(11) The database system MUST be free and open source.
(12) The DB SHOULD be easy to administer.
(13) All data MUST remain writable and readable as long as fewer than a configurable number N of nodes are (unexpectedly) down.
(14) The mechanisms to distribute the data to the available nodes SHOULD be handled by the database. This means that the database SHOULD automatically redistribute the data when nodes are added or removed.
(15) The project is mainly implemented in Erlang, so there must be a stable Erlang interface for database access.

########

Many thanks in advance
Heiner
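
For reference, the scale figures in (01) to (03) can be re-derived with a few lines of arithmetic. The following is only a minimal sketch in Python: the function and variable names are made up for illustration, and it assumes a 365-day year, decimal units (1 TB = 10^12 bytes), and raw payload only, with no replication or storage overhead.

```python
# Back-of-the-envelope check of the scale figures (01)-(03).
# Assumptions (not from any database system): 365-day year,
# decimal units, raw payload only (no replication or overhead).

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s


def sizing(sensors=10**6, blip_interval_s=100, retention_years=3, blip_bytes=200):
    """Return insert rate and total volume for the given sensor fleet."""
    blips_per_second = sensors // blip_interval_s
    blips_per_year = blips_per_second * SECONDS_PER_YEAR
    blips_total = blips_per_year * retention_years
    bytes_total = blips_total * blip_bytes
    return {
        "blips_per_second": blips_per_second,
        "blips_per_year": blips_per_year,
        "blips_total": blips_total,
        "bytes_total": bytes_total,
    }


if __name__ == "__main__":
    full = sizing()
    print("insert rate   : %d blip/s" % full["blips_per_second"])
    print("per year      : %.1f gigablip" % (full["blips_per_year"] / 1e9))
    print("after 3 years : %.2f terablip" % (full["blips_total"] / 1e12))
    print("raw size      : %.0f TB" % (full["bytes_total"] / 1e12))

    # Start-up phase from (04): 10^4 sensors.
    start = sizing(sensors=10**4)
    print("start-up rate : %d blip/s" % start["blips_per_second"])
```

Under these assumptions the raw volume comes out slightly below the rounded 1 terablip and 200 TB quoted above; any replication needed to satisfy (13) would multiply the on-disk size accordingly.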