hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
Date Thu, 22 Dec 2011 21:54:06 GMT
Thanks for the update, Jesse.
Let us know of any feature Culvert needs from HBase.

After cloning Culvert, I got:

[INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
[INFO] Total time: 1:06.638s
[INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
[INFO] Final Memory: 20M/81M
[ERROR] Failed to execute goal on project culvert-accumulo: Could not
resolve dependencies for project
com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in
apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]

Can someone provide a hint?

On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:

> Culvert was originally introduced at Hadoop Summit 2011, but recent updates
> have made it very applicable to current systems. Recently, we added support
> for Accumulo and upgraded HBase support to 0.92. Since Hadoop Summit, we
> have also done significant code cleanup and added some small features.
> However, we found that most people hadn't heard of Culvert, so we wanted
> to re-release the framework.
> For an introduction to using Culvert, check out the blog post here:
> http://jyates.github.com/2011/11/17/intro-to-culvert.html
> Also, the original presentation (where we discuss the internals) is
> available on slideshare:
> http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data
> There is a Culvert hackathon in the middle of January:
> http://culverthackathon2012.eventbrite.com/
> Oh, and you can find the code on github:
> https://github.com/booz-allen-hamilton/culvert
> Below is an overview of why we wrote Culvert and what it does.
> Secondary indexing is a common design pattern in BigTable-like databases
> that allows users to index one or more columns in a table. This technique
> enables fast search of records in a database based on a particular column
> instead of the row id, thus enabling relational-style semantics in a NoSQL
> environment. Frequently, the index is stored either in a reserved namespace
> in the same table or in a separate index table.
> Despite the fact that this is a common design pattern in BigTable-based
> applications, most implementations of this practice to date have been
> tightly coupled with a particular application. As a result, few
> general-purpose frameworks for secondary indexing on BigTable-like
> databases exist, and those that do are tied to a particular implementation
> of the BigTable model.
> There are several existing tools (Solr, Lily), but these focus on
> text-based search and are largely restricted to indexes created through
> their own framework. What if you want to use your existing indexes? Or
> leverage the indexes to do complex queries?
> We developed a solution to this problem called Culvert that supports online
> index updates as well as a variation of the Hive query language. In
> designing Culvert, we sought to make the solution pluggable so that it can
> be used on any of the many BigTable-like databases (HBase, Cassandra,
> etc.). Furthermore, it is easily extensible to existing, hand-rolled
> indexes.
> As well as being a secondary indexing framework, it is also a query
> execution mechanism - think Pig/Hive minus the fancy command line. We
> support a subset of SQL, but are able to take full advantage of home-rolled
> and built-in indexes, leading to query execution times potentially orders
> of magnitude shorter than existing approaches, and with far less effort.
> -- Jesse
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates
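
For readers unfamiliar with the pattern Jesse describes above (an index table keyed by column value rather than row id), here is a minimal in-memory sketch. It is not Culvert's API and does not touch HBase or Accumulo; the `IndexedTable` class, its method names, and the sample rows are all hypothetical, chosen only to show how an index entry is kept in step with the primary table and then read back with a range scan over sorted keys.

```python
import bisect


class IndexedTable:
    """Toy stand-in for a BigTable-like table with one secondary index.

    Hypothetical illustration only; not Culvert's or HBase's API.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}    # primary table: row id -> {column: value}
        self.index = []   # index table: sorted list of (value, row id) keys

    def put(self, row_id, columns):
        # Drop the stale index entry, if this row was indexed before.
        old = self.rows.get(row_id, {}).get(self.indexed_column)
        if old is not None:
            pos = bisect.bisect_left(self.index, (old, row_id))
            if pos < len(self.index) and self.index[pos] == (old, row_id):
                del self.index[pos]
        # Merge the new columns into the row, then re-index it.
        merged = {**self.rows.get(row_id, {}), **columns}
        self.rows[row_id] = merged
        new = merged.get(self.indexed_column)
        if new is not None:
            bisect.insort(self.index, (new, row_id))

    def lookup(self, value):
        # Range scan: start at the first index key whose value is >= `value`
        # and stop as soon as the value changes.
        start = bisect.bisect_left(self.index, (value,))
        row_ids = []
        for indexed_value, row_id in self.index[start:]:
            if indexed_value != value:
                break
            row_ids.append(row_id)
        return row_ids


table = IndexedTable("city")
table.put("row1", {"name": "alice", "city": "Boston"})
table.put("row2", {"name": "bob", "city": "Austin"})
table.put("row3", {"name": "carol", "city": "Boston"})
print(table.lookup("Boston"))  # row ids found via the index, not a full scan
```

In a real BigTable-like store the index would live in its own table (or a reserved namespace) and the write to the index and the write to the primary table would not be atomic, which is one of the problems a framework has to handle.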
