Subject: Re: Online text search with Hadoop/Brisk
From: Edward Capriolo
To: user@cassandra.apache.org
Date: Wed, 11 May 2011 14:25:34 -0400

On Wed, May 11, 2011 at 11:19 AM, Ben Scholl wrote:
> I keep reading that Hadoop/Brisk is not suitable for online querying, only
> for offline/batch processing. What exactly are the reasons it is unsuitable?
> My use case is a fairly high query load, and each query ideally would return
> within about 20 seconds. The queries will use indexes to narrow down the
> result set first, but they also need to support text search on one of the
> fields. I was thinking of simulating the SQL LIKE statement, by running each
> query as a MapReduce job so that the text search gets distributed between
> nodes.
> I know the recommended approach is to keep a separate full-text index, but
> that could be quite space-intensive, and also means you can only search on
> complete words. Any thoughts on this approach?
> Thanks,
> Ben

Brisk was made to be a tight integration of Cassandra, Hadoop, and Hive. If
you are looking to do full-text searches, you should look at Solandra,
https://github.com/tjake/Solandra, which is a Cassandra backend for
Solr/Lucene indexes.

Edward