Date: Thu, 22 Dec 2011 14:35:26 -0800
Subject: Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
From: Ted Yu
To: dev@hbase.apache.org
Cc: user@hbase.apache.org, accumulo-dev@incubator.apache.org, accumulo-user@incubator.apache.org

Thanks for the hint. That works.
I had to modify culvert-accumulo/pom.xml so that it looks for
1.5.0-incubating-SNAPSHOT, which I built from Accumulo TRUNK.

On Thu, Dec 22, 2011 at 2:22 PM, Jesse Yates wrote:

> Wow, that's embarrassing - project not building...
>
> It's because Accumulo's release is no longer deployed to the standard
> Apache Maven repository. Maybe one of the Accumulo committers can shed some
> light on where to find it?
>
> I'll make some changes and have it at least compiling from the raw tonight
> :)
>
> The alternative is to download the Accumulo source
> (https://github.com/apache/accumulo) and run "mvn clean install" to get it
> working on your local machine.
>
> Thanks Ted!
>
> -Jesse
>
> On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu wrote:
>
> > Thanks for the update, Jesse.
> > Let us know of any feature Culvert needs from HBase.
> >
> > After cloning Culvert, I got:
> >
> > [INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
> > [INFO] ------------------------------------------------------------------------
> > [INFO] BUILD FAILURE
> > [INFO] ------------------------------------------------------------------------
> > [INFO] Total time: 1:06.638s
> > [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
> > [INFO] Final Memory: 20M/81M
> > [INFO] ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal on project culvert-accumulo: Could not
> > resolve dependencies for project
> > com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
> > artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in
> > apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]
> >
> > Can someone provide a hint?
> >
> > On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates wrote:
> >
> > > Culvert was originally introduced at Hadoop Summit 2011, but recent
> > > updates have made it very applicable to current systems. Recently, we
> > > added support for Accumulo and upgraded HBase support to 0.92. Since
> > > Hadoop Summit, there has also been significant code cleanup, and we have
> > > added some small features. However, we found that most people hadn't
> > > heard of Culvert, so we wanted to re-release the framework.
> > >
> > > For an introduction to using Culvert, check out the blog post here:
> > > http://jyates.github.com/2011/11/17/intro-to-culvert.html
> > >
> > > Also, the original presentation (where we discuss the internals) is
> > > available on SlideShare:
> > > http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data
> > >
> > > There is a Culvert hackathon in the middle of January:
> > > http://culverthackathon2012.eventbrite.com/
> > >
> > > Oh, and you can find the code on github.
> > >
> > > Below is an overview of why we wrote Culvert and what it does.
> > >
> > > Secondary indexing is a common design pattern in BigTable-like databases
> > > that allows users to index one or more columns in a table. This technique
> > > enables fast search of records based on a particular column instead of
> > > the row id, thus enabling relational-style semantics in a NoSQL
> > > environment. Frequently, the index is stored either in a reserved
> > > namespace within the table or in a separate index table.
> > >
> > > Although this is a common design pattern in BigTable-based applications,
> > > most implementations of it to date have been tightly coupled with a
> > > particular application. As a result, few general-purpose frameworks for
> > > secondary indexing on BigTable-like databases exist, and those that do
> > > are tied to a particular implementation of the BigTable model.
> > >
> > > There are several existing tools (Solr, Lily), but these are focused on
> > > text-based search and are restricted to indexes created through their
> > > own frameworks. What if you want to use your existing indexes? Or
> > > leverage the indexes to run complex queries?
> > >
> > > We developed a solution to this problem called Culvert, which supports
> > > online index updates as well as a variation of the Hive query language.
> > > In designing Culvert, we sought to make the solution pluggable so that it
> > > can be used on any of the many BigTable-like databases (HBase, Cassandra,
> > > etc.). Furthermore, it is easily extended to existing, hand-rolled
> > > indexes.
> > >
> > > As well as being a secondary indexing framework, Culvert is also a query
> > > execution mechanism - think Pig/Hive minus the fancy command line.
> > > We support a subset of SQL, but are able to take full advantage of
> > > home-rolled and built-in indexes, leading to query execution times
> > > potentially orders of magnitude smaller than existing approaches, and
> > > queries that are far easier to write.
> > >
> > > -- Jesse
> > > -------------------
> > > Jesse Yates
> > > 240-888-2200
> > > @jesse_yates
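The secondary indexing pattern described in Jesse's announcement (an index keyed by column value, kept up to date as rows are written, and used to find rows without scanning the whole table) can be illustrated with a small in-memory sketch. This is a hypothetical example, not Culvert's API: the class `SecondaryIndexSketch`, its methods, and the index-key layout (column, value and row id joined by a NUL separator, assumed never to appear in the data) are assumptions for illustration, and sorted `TreeMap`s stand in for sorted HBase or Accumulo tables.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of secondary indexing on a BigTable-like store.
 * Sorted TreeMaps stand in for a data table and a separate index table.
 * This is NOT Culvert's API.
 */
class SecondaryIndexSketch {
    // Separator for composite index keys (the NUL character); assumes column
    // names and values never contain it.
    private static final char SEP = '\u0000';

    // Data table: rowId -> (column -> value), sorted by row id.
    private final NavigableMap<String, Map<String, String>> dataTable = new TreeMap<>();
    // Index table: key = column + SEP + value + SEP + rowId, mapped to rowId.
    // Sorting keeps all entries for one (column, value) pair contiguous.
    private final NavigableMap<String, String> indexTable = new TreeMap<>();

    private static String indexKey(String column, String value, String rowId) {
        return column + SEP + value + SEP + rowId;
    }

    /** Writes a cell and updates the index in the same call ("online" index maintenance). */
    public void put(String rowId, String column, String value) {
        Map<String, String> row = dataTable.computeIfAbsent(rowId, k -> new HashMap<>());
        String old = row.put(column, value);
        if (old != null) {
            // Remove the stale entry so lookups on the old value no longer return this row.
            indexTable.remove(indexKey(column, old, rowId));
        }
        indexTable.put(indexKey(column, value, rowId), rowId);
    }

    /** Equality lookup (column = value) as a range scan over the index, not a full table scan. */
    public Set<String> lookup(String column, String value) {
        String start = column + SEP + value + SEP;
        // Exclusive upper bound: the same prefix with the trailing separator bumped by one,
        // so only keys that start exactly with "column SEP value SEP" fall in the range.
        String end = column + SEP + value + (char) (SEP + 1);
        return new HashSet<>(indexTable.subMap(start, true, end, false).values());
    }

    /** Conjunctive query (c1 = v1 AND c2 = v2) answered by intersecting two index scans. */
    public Set<String> lookupAnd(String c1, String v1, String c2, String v2) {
        Set<String> hits = lookup(c1, v1);
        hits.retainAll(lookup(c2, v2));
        return hits;
    }

    /** Fetches a full row from the data table by row id. */
    public Map<String, String> getRow(String rowId) {
        return dataTable.get(rowId);
    }

    public static void main(String[] args) {
        SecondaryIndexSketch t = new SecondaryIndexSketch();
        t.put("row1", "city", "Baltimore");
        t.put("row1", "dept", "eng");
        t.put("row2", "city", "Baltimore");
        t.put("row2", "dept", "sales");
        t.put("row3", "city", "Boston");
        t.put("row3", "dept", "eng");

        System.out.println(t.lookupAnd("city", "Baltimore", "dept", "eng")); // prints [row1]
        t.put("row1", "city", "Boston"); // moves row1's index entry from Baltimore to Boston
        System.out.println(t.lookup("city", "Baltimore")); // prints [row2]
    }
}
```

Storing the value ahead of the row id in the index key is what turns an equality query into one contiguous range scan. The AND query is a simplified, single-process stand-in for combining indexes during query execution; in a real deployment the index would live in its own table (or reserved namespace) on the cluster.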