References: <1350262192.51800.YahooMailNeo@web126006.mail.ne1.yahoo.com>
Message-ID: <1350357292.54742.YahooMailNeo@web126005.mail.ne1.yahoo.com>
Date: Mon, 15 Oct 2012 20:14:52 -0700 (PDT)
From: Otis Gospodnetic
Reply-To: user@flume.apache.org
Subject: Re: Flume for multi KB or MB docs?
To: "user@flume.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="1833503604-1920448510-1350357292=:54742"

--1833503604-1920448510-1350357292=:54742
Content-Type: text/plain; charset=iso-8859-1

Hi Mike,

Thanks for the info! Our docs, however, are not quite 100MB - more like 5MB max and most of the time under 10KB. Would you still say Flume is not the right tool for the job? If so, what is the main concern? Is it about the number of documents Flume will keep in memory at any one time and thus require a potentially large heap and still risk OOMing? Or is the main concern that writing such "large" documents to disk will be slow?

My documents need to end up in Solr or ElasticSearch and maybe also in HDFS, so I was hoping I could get ES and HDFS sinks from Flume for free.

Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

> ________________________________
> From: Mike Percy <mpercy@apache.org>
> To: user@flume.apache.org; Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> Sent: Monday, October 15, 2012 6:15 PM
> Subject: Re: Flume for multi KB or MB docs?
>
> Hi Otis,
> Flume was designed as a streaming event transport system, not as a general purpose file transfer system. The two have quite different characteristics, so while binary files could be transported by Flume, if you tried to transport a 100MB PDF as a single event you may have issues around memory allocation, GC, transfer speed, etc., since we hold at least one event at a time in memory. However, if you want to transfer a large log file and each line is an event, then it's a perfect use case because you care about the individual events more than the file itself.
>
> For transferring very large binary files that are not events or records, you may want to look for something that is good at being a single-hop system with resume capability, like rsync, to transfer the files. Then I suppose you could use the hadoop fs shell and a small script to store the data onto HDFS. You probably wouldn't need all the fancy tagging, routing, and serialization features that Flume has.
>
> Hope this helps.
>
> Regards
> Mike
>
> On Sun, Oct 14, 2012 at 5:49 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>
>> Hi,
>>
>> We're considering using Flume for transport of potentially large "documents" (think documents that can be as small as tweets or as large as PDF files).
>>
>> I'm wondering if Flume is suitable for transporting potentially large documents (in the most reliable mode, too) or if there is something inherent in Flume that makes it a poor choice for this use case?
>>
>> Thanks,
>> Otis
>> ----
>> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

--1833503604-1920448510-1350357292=:54742
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
--1833503604-1920448510-1350357292=:54742--
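
The rsync-then-HDFS route Mike suggests in the thread above could look roughly like the sketch below. It is only an illustration under stated assumptions, not anything taken from the thread: the function names, host, and paths are hypothetical, and the staging directory is assumed to be flat (subdirectories are skipped). It relies on rsync's --partial option, which keeps partially transferred files so a rerun can pick up after an interruption, and on the "hadoop fs -put" shell command.

```python
import os
import subprocess


def rsync_command(remote_src, staging_dir):
    """Build an rsync call; --partial keeps interrupted files so a rerun can resume."""
    return ["rsync", "-a", "--partial", remote_src, staging_dir]


def hdfs_put_command(local_file, hdfs_dir):
    """Build a 'hadoop fs -put' call that copies one local file into HDFS."""
    return ["hadoop", "fs", "-put", local_file, hdfs_dir]


def transfer(remote_src, staging_dir, hdfs_dir, run=subprocess.check_call):
    """Pull files into a local staging directory, then push each one to HDFS.

    'run' is injectable so the command sequence can be inspected without
    rsync or hadoop being installed (an assumption made for this sketch).
    """
    run(rsync_command(remote_src, staging_dir))
    for name in sorted(os.listdir(staging_dir)):
        local_file = os.path.join(staging_dir, name)
        # Only regular files are uploaded; nested directories are ignored here.
        if os.path.isfile(local_file):
            run(hdfs_put_command(local_file, hdfs_dir))
```

A call such as transfer("docs@source-host:/data/docs/", "/var/staging/docs/", "/user/otis/docs") would run rsync once and then one put per staged file; all three arguments are placeholder values. With the default check_call, a failing command raises an exception and stops the run, so a rerun after a network drop starts again with rsync.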