From: Jesse Yates
Date: Thu, 22 Dec 2011 14:46:54 -0800
Subject: Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
To: dev@hbase.apache.org
Cc: user@hbase.apache.org, accumulo-dev@incubator.apache.org, accumulo-user@incubator.apache.org

I just updated trunk so that we don't build the accumulo package by default.

If you want to build with accumulo, right now we are supporting the "accumulo-1.3.5-incubating" branch, which supports the current released version of accumulo (accumulo-1.3.5).

Hopefully, in the near future, we can start hosting the accumulo snapshots in a publicly accessible maven repository, and we can merge the accumulo branch back into trunk.

On Thu, Dec 22, 2011 at 2:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
Thanks for the hint. That works.

I had to modify culvert-accumulo/pom.xml so that it looks for
1.5.0-incubating-SNAPSHOT which was built by accumulo TRUNK.

On Thu, Dec 22, 2011 at 2:22 PM, Jesse Yates <jesse.k.yates@gmail.com> wrote:

> Wow, that's embarrassing - project not building...
>
> It's because accumulo's release is no longer deployed into the standard
> apache maven repository. Maybe one of the accumulo committers can shed some
> light on where to find it?
>
> I'll make some changes and have it at least compiling from the raw tonight :)
>
> The alternative is to download the accumulo source
> (https://github.com/apache/accumulo) and run "mvn clean install" to get it
> working on your local machine.
>
> Thanks Ted!
>
> -Jesse
>
> On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Thanks for the update, Jesse.
> > Let us know of any feature Culvert needs from HBase.
> >
> > After cloning Culvert, I got:
> >
> > [INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
> > [INFO] ------------------------------------------------------------------------
> > [INFO] BUILD FAILURE
> > [INFO] ------------------------------------------------------------------------
> > [INFO] Total time: 1:06.638s
> > [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
> > [INFO] Final Memory: 20M/81M
> > [INFO] ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal on project culvert-accumulo: Could not
> > resolve dependencies for project
> > com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
> > artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT
> > in apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]
> >
> > Can someone provide a hint?
> >
> > On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
> >
> > > Culvert was originally introduced at Hadoop Summit 2011, but recent
> > > updates have made it very applicable to current systems. Recently, we
> > > added support for Accumulo and upgraded HBase support to 0.92. Since
> > > Hadoop Summit, we have also done significant code cleanup and added
> > > some small features. However, we found that most people hadn't heard
> > > of Culvert, so we wanted to re-release the framework.
> > >
> > > For an introduction to using Culvert, check out the blog post here:
> > > http://jyates.github.com/2011/11/17/intro-to-culvert.html
> > >
> > > Also, the original presentation (where we discuss the internals) is
> > > available on slideshare:
> > > http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data
> > >
> > > There is a Culvert hackathon in the middle of January:
> > > http://culverthackathon2012.eventbrite.com/
> > >
> > > Oh, and you can find the code on github:
> > > https://github.com/booz-allen-hamilton/culvert
> > >
> > > Below is an overview of why we wrote Culvert and what it does.
> > >
> > > Secondary indexing is a common design pattern in BigTable-like
> > > databases that allows users to index one or more columns in a table.
> > > This technique enables fast lookup of records by the value of a
> > > particular column instead of only by row id, thus enabling
> > > relational-style semantics in a NoSQL environment. Frequently, the
> > > index is stored either in a reserved namespace in the table or in a
> > > separate index table.
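A minimal sketch of the separate-index-table variant described above, assuming Java's `TreeMap`/`TreeSet` as stand-ins for the sorted row keys of a BigTable-like store. Every name here (`SecondaryIndexSketch`, the `column`/`value`/`rowId` key layout joined by a NUL separator) is hypothetical and is not Culvert's API. Because index keys sort by column, then value, then row id, finding all rows with a given value becomes one range scan instead of a full table scan, and `put` keeps the index in step with each write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: TreeMap/TreeSet stand in for the sorted rows of a
// BigTable-like store. Not Culvert's API.
class SecondaryIndexSketch {
    // Separator assumed never to appear in column names or values.
    private static final char SEP = '\0';

    // Primary table: row id -> (column -> value).
    final TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
    // Separate index table whose keys are column SEP value SEP rowId.
    final TreeSet<String> indexTable = new TreeSet<>();

    // Writes one cell and keeps the index in step with it.
    void put(String rowId, String column, String value) {
        TreeMap<String, String> row = table.computeIfAbsent(rowId, k -> new TreeMap<>());
        String old = row.put(column, value);
        if (old != null) {
            // Remove the stale entry so lookups on the old value no longer match.
            indexTable.remove(column + SEP + old + SEP + rowId);
        }
        indexTable.add(column + SEP + value + SEP + rowId);
    }

    // Row ids whose column equals value, via one range scan of the index.
    List<String> lookup(String column, String value) {
        String prefix = column + SEP + value + SEP;
        // Every key that starts with prefix sorts strictly before this end key.
        String end = column + SEP + value + (char) (SEP + 1);
        List<String> rowIds = new ArrayList<>();
        for (String key : indexTable.subSet(prefix, true, end, false)) {
            rowIds.add(key.substring(prefix.length()));
        }
        return rowIds;
    }

    // Without an index: read every row and test the column.
    List<String> fullScan(String column, String value) {
        List<String> rowIds = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> e : table.entrySet()) {
            if (value.equals(e.getValue().get(column))) {
                rowIds.add(e.getKey());
            }
        }
        return rowIds;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch db = new SecondaryIndexSketch();
        db.put("user1", "city", "Boston");
        db.put("user2", "city", "Denver");
        db.put("user3", "city", "Boston");
        System.out.println(db.lookup("city", "Boston"));   // [user1, user3]
        System.out.println(db.fullScan("city", "Boston")); // [user1, user3]
    }
}
```

The reserved-namespace variant would apply the same range-scan idea, with the index keys living under a reserved row-key prefix in the primary table rather than in a separate table.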
> > >
> > > Despite the fact that this is a common design pattern in
> > > BigTable-based applications, most implementations of this practice to
> > > date have been tightly coupled with a particular application. As a
> > > result, few general-purpose frameworks for secondary indexing on
> > > BigTable-like databases exist, and those that do are tied to a
> > > particular implementation of the BigTable model.
> > >
> > > There are several existing tools (Solr, Lily), but these are focused
> > > on text-based search and only work with indexes created through their
> > > own frameworks. What if you want to use your existing indexes? Or
> > > leverage the indexes to do complex queries?
> > >
> > > We developed a solution to this problem called Culvert that supports
> > > online index updates as well as a variation of the Hive query
> > > language. In designing Culvert, we sought to make the solution
> > > pluggable so that it can be used on any of the many BigTable-like
> > > databases (HBase, Cassandra, etc.). It is also easily extensible to
> > > existing, hand-rolled indexes.
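One way to picture the "pluggable" goal, sketched under assumptions: the index logic codes only against a tiny sorted key-value interface, so supporting another BigTable-like store would mean writing one adapter for it. `SortedKeyValueStore`, `InMemoryStore`, and `ValueIndex` are hypothetical names for illustration, not Culvert classes, and the in-memory backend stands in for a real store's client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical storage abstraction: the index logic only needs a put and a
// key-range scan over sorted keys from the underlying store.
interface SortedKeyValueStore {
    void put(String key, String value);

    // Entries with startInclusive <= key < endExclusive, in key order.
    SortedMap<String, String> scan(String startInclusive, String endExclusive);
}

// In-memory backend; an adapter for a real store would wrap that store's client here.
class InMemoryStore implements SortedKeyValueStore {
    private final TreeMap<String, String> data = new TreeMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public SortedMap<String, String> scan(String startInclusive, String endExclusive) {
        return data.subMap(startInclusive, endExclusive);
    }
}

// Value-to-row-id index that depends only on the interface, not on any backend.
class ValueIndex {
    private static final char SEP = '\0'; // assumed absent from values
    private final SortedKeyValueStore store;

    ValueIndex(SortedKeyValueStore store) {
        this.store = store;
    }

    // Index key layout: value SEP rowId; the stored cell holds the row id.
    void add(String value, String rowId) {
        store.put(value + SEP + rowId, rowId);
    }

    // Range-scans every key that starts with value SEP.
    List<String> lookup(String value) {
        String start = value + SEP;
        String end = value + (char) (SEP + 1);
        return new ArrayList<>(store.scan(start, end).values());
    }

    public static void main(String[] args) {
        ValueIndex byCity = new ValueIndex(new InMemoryStore());
        byCity.add("Boston", "user1");
        byCity.add("Denver", "user2");
        byCity.add("Boston", "user3");
        System.out.println(byCity.lookup("Boston")); // [user1, user3]
    }
}
```

In the same spirit, an existing hand-rolled index with its own key layout could be exposed through its own `lookup`-style method; how Culvert actually does this is not shown here.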
> > >
> > > As well as being a secondary indexing framework, it is also a query
> > > execution mechanism - think Pig/Hive minus the fancy command line. We
> > > support a subset of SQL, but can take full advantage of home-rolled
> > > and built-in indexes, making query execution potentially orders of
> > > magnitude faster than existing approaches and certainly far easier.
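To illustrate how indexes can speed up query execution, here is a hypothetical sketch (not Culvert's planner) of evaluating a conjunction such as `WHERE city = 'Boston' AND role = 'admin'`. It assumes each predicate is answered by an index range scan that returns row ids in sorted order, as happens when index keys end with the row id; the two sorted lists can then be merge-intersected in linear time without reading the primary table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an index-backed AND; not Culvert's actual planner.
class IndexedAndQuery {
    // Merge-intersects two ascending, duplicate-free lists of row ids in O(n + m).
    static List<String> intersectSorted(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                out.add(left.get(i)); // row satisfies both predicates
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left is behind; advance it
            } else {
                j++; // right is behind; advance it
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Pretend these came from two index scans: city = 'Boston' and role = 'admin'.
        List<String> cityMatches = Arrays.asList("user1", "user3", "user7", "user9");
        List<String> roleMatches = Arrays.asList("user2", "user3", "user9");
        System.out.println(intersectSorted(cityMatches, roleMatches)); // [user3, user9]
    }
}
```

Only the surviving row ids then need to be fetched from the primary table, whereas a full scan would read every row to test both predicates.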
> > >
> > > -- Jesse
> > > -------------------
> > > Jesse Yates
> > > 240-888-2200
> > > @jesse_yates
> > >
> >