Date: Mon, 30 Sep 2013 14:16:36 -0700
Subject: Re: using get/setMeta() & seek
From: Doug Cutting
To: user@avro.apache.org

It sounds like you wish to read files while they're still being written. Is that correct? If so, that's not always reliable, since entries in the file may be only partially written when you attempt to read it. One can handle such errors, back up and re-attempt reading after the file grows more, but this can become complex.

Rather one might initially create files to a staging directory, then periodically close and move them to the directory where they are to be read. This works well since file renames are atomic in most filesystems. If this generates too many small files then a consolidation job can be run periodically to replace sets of small files with larger files containing their appended content.

To answer your specific question: you can store arbitrary values in a file's metadata, but those values must be set before data is written to the file, since metadata is in the file header. So it wouldn't work to store the current end of file in the metadata, if that's what you were asking.

Doug

On Fri, Sep 27, 2013 at 4:54 PM, Alan Miller wrote:
> Hi,
>
> Here's my scenario.
>
> One Hadoop job collects incoming Flume data and keeps appending
> records to Avro files. Every 30 minutes the file just grows. Another
> Hadoop job runs every hour and reads the above files. When this job
> finishes I want to keep track of where in the file (offset) it left off so that
> the next iteration can immediately seek to that position.
>
> Can I use the DataFileWriter's setMeta(String key, long value)
> method to update a meta field with the position and use the DataFileReader's
> getMeta(String key, long value) & seek(long position) methods
> to implement this?
>
> Is that reasonable? Currently I'm only using the Java API.
> Are these methods implemented in the Ruby too?
>
> Thanks,
> Alan
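
For illustration, the staging-directory approach described in the reply might look roughly like the sketch below. It is only a sketch under assumptions: it uses plain java.nio on a local filesystem rather than Avro or HDFS, the class name StagedPublish and the method publish are hypothetical, and the byte array stands in for whatever a real writer would produce. The file is fully written and closed in the staging directory, then renamed into the directory that readers scan.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Avro or Hadoop.
final class StagedPublish {

    // Writes content to stagingDir/fileName, closes it, then moves it to
    // readyDir/fileName. Returns the final path of the published file.
    static Path publish(Path stagingDir, Path readyDir, String fileName, byte[] content)
            throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(readyDir);

        Path staged = stagingDir.resolve(fileName);

        // Assumption: a real job would write Avro records here instead of raw bytes.
        // try-with-resources guarantees the file is closed before the move.
        try (OutputStream out = Files.newOutputStream(staged)) {
            out.write(content);
        }

        // ATOMIC_MOVE asks for a rename that readers see either fully or not at all.
        // It throws AtomicMoveNotSupportedException if the filesystem cannot do this,
        // for example when the two directories are on different filesystems.
        return Files.move(staged, readyDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this arrangement a reader that lists only the ready directory never opens a partially written file, so it does not need to track offsets into a growing file.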