From: David Medinets <david.medinets@gmail.com>
To: accumulo-user
Date: Fri, 11 Jul 2014 08:52:31 -0400
Subject: Re: How does Accumulo compare to HBase

>On Fri, Jul 11, 2014 at 7:25 AM, <chuck.adams@oracle.com> wrote:
>
>The entirety of both data corpuses were re-loaded every night?
Yes.

>What did the users do while the data was dropped and reloaded?

The technique of 'dropping and reloading' was not used.

Users were not impacted. For the original system, we used
a combination of Sqoop and the Fair Scheduler in Hadoop to
throttle the export. For Accumulo, we created new tables using
a date-based naming convention. Accumulo queries used a lookup
process to find the current table. When the new table was
ready, it was automatically used.
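
A minimal sketch of the table-swap idea described above, not the code used on that project. It assumes a hypothetical "corpus_" prefix with a YYYYMMDD suffix for the date-based names, and an in-memory registry standing in for the lookup process. No Accumulo API calls are shown; in a real deployment the pointer to the current table would have to live somewhere every query client can read it.

```python
from datetime import date

TABLE_PREFIX = "corpus_"  # hypothetical prefix; the real naming convention is not given


def table_name_for(day: date) -> str:
    """Build a date-based table name, for example corpus_20140711."""
    return f"{TABLE_PREFIX}{day:%Y%m%d}"


class CurrentTableRegistry:
    """Stand-in for the lookup process that tells queries which table to read."""

    def __init__(self) -> None:
        self._current = None

    def publish(self, table_name: str) -> None:
        # Call this only after the nightly load into table_name has finished,
        # so readers never see a partially loaded table.
        self._current = table_name

    def current(self) -> str:
        if self._current is None:
            raise LookupError("no table has been published yet")
        return self._current
```

Queries call current() before each scan. The nightly job loads into table_name_for(today) and calls publish() only when the load succeeds, so a failed load leaves readers on the previous night's table.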

>What happened in the middle of night if the job failed?

Why has this conversation topic changed to "Is David
competent to design an ingest system"?

>Couldn’t you identify the incremental updates to the two sources
>and incrementally load the new data into the combined target?

Yes, we could. But, for reasons not germane to this conversation,
we pulled the whole corpus.

>This brute force implementation is only applicable to a few use
>cases with lax SLAs.

Ok.

>>From: David Medinets [mailto:david.medinets@gmail.com]
>>Last year, I used Accumulo's rapid ingest ability to join two data
>>silos into one dataset. Every field was fully indexed. Having all
>>of the data in one place allowed cross-referencing queries to be
>>executed. For various reasons, this kind of query was not possible
>>using the existing technology. The rapid ingest was important
>>because a new copy of the data silos was pulled every night.
