From: Aaron Cordova
To: user@accumulo.apache.org
Subject: Re: Trendulo - A Twitter Analytics Demo on Accumulo
Date: Wed, 25 Apr 2012 09:43:54 -0400

Speaking of storage - are you using EBS or local instance storage?

On Apr 25, 2012, at 8:52 AM, Eric Newton wrote:

> How many key-values does a single tweet become, on average?  What's the storage size per tweet?
>
> On Wed, Apr 25, 2012 at 12:17 AM, Jared winick <jaredwinick@gmail.com> wrote:
> Thanks for the kind words, I appreciate it. Keith, my ingest process
> was down on Mar 19-20, so that is why I am missing data for that
> period.
>
> For those who are curious, I am receiving about 1.2 million tweets a
> day and have about 3 billion entries in my main table.  I am actually
> getting by with everything running on an EC2 medium instance, which is
> obviously very far from ideal but I am trying to stay on a budget.
>
> I hope to add new features as time allows, things like near real-time
> trending and geospatial analytics.  If anyone has any ideas for
> features they think would be interesting, just let me know or add them
> as issues on the github page.
>
> On Tue, Apr 24, 2012 at 11:40 AM, Billie J Rinaldi
> <billie.j.rinaldi@ugov.gov> wrote:
> > That's so cool that I'm creating a new section for it on our page of links:
> > http://accumulo.apache.org/papers.html
> >
> > Billie
> >
> > On Tuesday, April 24, 2012 9:35:31 AM, "Jared winick" <jaredwinick@gmail.com> wrote:
> >> I gave an Introduction to Apache Accumulo presentation last month at
> >> the Boulder/Denver Meetup where I demoed an application that used
> >> Accumulo to provide real-time and historical access to words/phrases
> >> seen in Twitter messages as well as daily trend analysis. I finally
> >> got the demo polished up a bit and running on Amazon EC2 where it can
> >> be found at http://trendulo.com .
> >>
> >> Trendulo is still pretty Alpha at this point so please feel free to
> >> add to the existing documented issues at
> >> https://github.com/jaredwinick/trendulo where you can also obviously
> >> find the source.
> >>
> >>
> >> As an example, the following link will show the launch of Instagram's
> >> Android client, followed by Facebook's purchase and then a small
> >> increase in general "chatter" about the product http://goo.gl/XcCG8
> >>
> >>
> >> Let me know if anyone has any questions or comments. Feel free to
> >> tweet @trendulo any interesting searches and I can retweet them out.
> >>
> >>
> >> Jared

