From: Carl Austin
To: user@hbase.apache.org
Subject: Re: Accumulo iterators in HBase
Date: Wed, 2 Jul 2014 08:07:51 +0100

Thanks for taking the time to look and comment, and glad it sounds interesting.

The reason I started on this was that I'm using Accumulo and want to make an application usable on both HBase and Accumulo with the same codebase. I do a lot of aggregations on data and I feel the Accumulo iterator mechanism is superior for this use case; it's one of the main reasons I went with Accumulo and one of the only remaining major differences between the two applications now that HBase has implemented cell-level ACLs.

For example, as I am ingesting a main table of data I am creating many other question-focused tables that keep answers like how many times did I see combinations of values, when was the last time I saw combinations together, how many distinct values were in this field for each combination (using probabilistic counting of course) and many more. All of these things are well suited to Accumulo iterators for performance at scale because they run at compaction time across key/values that are already being read at that point, rather than having to update the answers to these questions on every single insert.

This use case won't be for everyone, but the iterator mechanism is pretty neat, powerful and a real differentiator in Accumulo (of course there are many differentiators in HBase too!).

Thanks

Carl

On Tue, Jul 1, 2014 at 6:57 PM, Stack wrote:

> Interesting project Carl. Use Cell interface instead of KeyValue if you can (especially given you are copying to Accumulo key/value). What are you thinking? What would be the use case?
> Thanks,
> St.Ack
>
> On Tue, Jul 1, 2014 at 2:43 AM, Carl Austin wrote:
>
> > Hi,
> >
> > I've recently been doing a little research into getting Accumulo iterators working in HBase, and in my very basic example I seem to have been able to do this for all three scopes (scan, minor compaction and major compaction, or scan, flush and compaction in HBase terminology).
> >
> > I was hoping that an HBase guru would be able to take a look at my approach (https://github.com/carlaustin/hbase-accumulo-iterators). It's very simple, just 7 small classes.
> >
> > I've done it by creating wrappers that can convert from Accumulo iterators to HBase scanners and back, allowing me to wrap a scanner as an iterator, hand it to an Accumulo iterator as the start of an iterator chain, and then wrap that back to a scanner and return it. I've then used a RegionObserver to implement this on flush, compact and scan.
> >
> > You can see from the example that I've done no iterator management or anything at this point; it simply applies an iterator that changes all values to the word "carl" for a table called "test". If it looks like this is a go-er then I would look to continue work.
> >
> > I'd really appreciate any comments on the approach or things I've missed, even if they make this a total non-starter.
> >
> > Thanks
> >
> > Carl
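For anyone skimming the thread, the wrapper round trip described in the quoted message (scanner wrapped as an iterator, passed through an iterator chain, wrapped back to a scanner) can be sketched roughly as below. This is a language-neutral toy in Python, not code from the linked repository and not the HBase or Accumulo API: `Cell`, `ListScanner`, `ScannerAsIterator`, `ConstantValueIterator`, `IteratorAsScanner`, `apply_iterator` and `drain` are all hypothetical names, and the `next_batch` and `has_top`/`top`/`advance` methods are only simplified stand-ins for the scanner and iterator styles.

```python
from typing import Iterable, List, NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for an HBase cell or Accumulo key/value pair."""
    row: str
    qualifier: str
    value: str


class ListScanner:
    """Toy scanner: next_batch() returns up to `limit` cells, [] when exhausted."""

    def __init__(self, cells: Iterable[Cell]) -> None:
        self._cells = list(cells)
        self._pos = 0

    def next_batch(self, limit: int) -> List[Cell]:
        batch = self._cells[self._pos:self._pos + limit]
        self._pos += len(batch)
        return batch


class ScannerAsIterator:
    """Exposes a scanner through a has_top()/top()/advance() iterator shape."""

    def __init__(self, scanner, batch_size: int = 2) -> None:
        self._scanner = scanner
        self._batch_size = batch_size
        self._buffer: List[Cell] = []
        self._fill()

    def _fill(self) -> None:
        # Pull the next batch only when the local buffer has run dry.
        if not self._buffer:
            self._buffer = self._scanner.next_batch(self._batch_size)

    def has_top(self) -> bool:
        return bool(self._buffer)

    def top(self) -> Cell:
        return self._buffer[0]

    def advance(self) -> None:
        self._buffer.pop(0)
        self._fill()


class ConstantValueIterator:
    """One link in the chain: rewrites every value to a fixed string."""

    def __init__(self, source, replacement: str) -> None:
        self._source = source
        self._replacement = replacement

    def has_top(self) -> bool:
        return self._source.has_top()

    def top(self) -> Cell:
        return self._source.top()._replace(value=self._replacement)

    def advance(self) -> None:
        self._source.advance()


class IteratorAsScanner:
    """Wraps the end of the iterator chain back into the scanner shape."""

    def __init__(self, iterator) -> None:
        self._iterator = iterator

    def next_batch(self, limit: int) -> List[Cell]:
        batch: List[Cell] = []
        while len(batch) < limit and self._iterator.has_top():
            batch.append(self._iterator.top())
            self._iterator.advance()
        return batch


def apply_iterator(scanner, table: str):
    """Hypothetical hook, called wherever scan, flush or compaction is intercepted."""
    if table != "test":
        return scanner  # other tables pass through unchanged
    chain = ConstantValueIterator(ScannerAsIterator(scanner), "carl")
    return IteratorAsScanner(chain)


def drain(scanner, limit: int = 10) -> List[Cell]:
    """Reads a scanner to exhaustion and returns every cell it produced."""
    out: List[Cell] = []
    while True:
        batch = scanner.next_batch(limit)
        if not batch:
            return out
        out.extend(batch)
```

Calling `drain(apply_iterator(ListScanner(cells), "test"))` yields the same rows and qualifiers with every value replaced, while any other table name returns the scanner untouched. An aggregating iterator for the compaction-time use case would slot into the chain at the same point as `ConstantValueIterator`.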