From: Alessio Cecchi
Date: Fri, 17 Feb 2012 13:42:54 +0100
To: user@cassandra.apache.org
Subject: General questions about Cassandra

Hi,

we have developed software that stores logs from mail servers in MySQL, but for huge environments we are developing a version that stores this data in HBase.

Once a day, the raw logs are first normalized, so the output looks like this:

username, date of login, IP address, protocol
username, date of login, IP address, protocol
username, date of login, IP address, protocol
[...]

and are then inserted into the database.

As I was saying, for huge installations (from 1 to 10 million logins per day, kept for 12 months) we are working with HBase, but I would also like to consider Cassandra.

The advantage of HBase is MapReduce, which makes searching the logs very fast by splitting the "query" across multiple hosts concurrently.

Queries will be launched from a web interface (there will be only a few requests per day), and the search keys are user and time range.

But Cassandra seems less complex to manage and simpler to run, so I want to evaluate it instead of HBase.

My question is: can Cassandra also split a "query" over the cluster, like MapReduce does? From what I have read online, Cassandra seems fast at inserting data but slower than HBase at querying. Is that really so?

We do not want to install Hadoop on top of Cassandra.

Any suggestion is welcome :-)

-- 
Alessio Cecchi is:
@ ILS -> http://www.linux.it/~alessice/
on LinkedIn -> http://www.linkedin.com/in/alessice
GNU/Linux Systems Support -> http://www.cecchi.biz/
@ PLUG -> former President, now senator for life, http://www.prato.linux.it
@ LOLUG -> Member http://www.lolug.net
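
P.S. To make the access pattern concrete, here is a minimal sketch (plain Python, no database involved) of the normalized record and of the lookup by user and time range described above. The timestamp format, the sample values, and the names parse_line, build_index and logins_for are assumptions made only for illustration; the real schema may differ.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime

# Hypothetical normalized records; the timestamp format is an assumption.
SAMPLE = """\
alice, 2012-02-17 09:00:00, 10.0.0.1, imap
bob, 2012-02-17 09:05:00, 10.0.0.2, pop3
alice, 2012-02-18 10:30:00, 10.0.0.3, smtp
alice, 2012-03-01 08:00:00, 10.0.0.1, imap
"""


def parse_line(line):
    """Split one normalized line into (username, login time, IP, protocol)."""
    user, when, ip, proto = (field.strip() for field in line.split(","))
    return user, datetime.strptime(when, "%Y-%m-%d %H:%M:%S"), ip, proto


def build_index(text):
    """Group logins by user and keep each user's logins sorted by time."""
    index = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            user, when, ip, proto = parse_line(line)
            index[user].append((when, ip, proto))
    for logins in index.values():
        logins.sort()
    return index


def logins_for(index, user, start, end):
    """Return the logins of `user` with start <= login time <= end."""
    logins = index.get(user, [])
    times = [when for when, _, _ in logins]
    # Binary search on the sorted times gives the slice for the time range.
    return logins[bisect_left(times, start):bisect_right(times, end)]


if __name__ == "__main__":
    idx = build_index(SAMPLE)
    for login in logins_for(idx, "alice",
                            datetime(2012, 2, 17), datetime(2012, 2, 28)):
        print(login)
```

The only point of the sketch is that a search keyed by user and bounded by time reads one user's time-sorted logins instead of scanning every record; whether Cassandra or HBase serves that pattern better at this volume is exactly what I am asking.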