From: Philip Zeyliger
Date: Fri, 22 Jan 2010 18:38:48 -0800
Subject: Re: lazy deserialization?
To: avro-user@hadoop.apache.org

Not with any of today's APIs.

"SELECT col1, col3 FROM t" is handled easily: you construct a schema that only has those columns, and col2 is skipped at read time. Does Hive have a use case for this that you're interested in? If you don't mind paying the buffer copy, you could probably write a "DeferredFoo" class that doesn't de-serialize certain structures... -- Philip On Fri, Jan 22, 2010 at 6:20 PM, Zheng Shao wrote: > I noticed that avro has the "skip" functions which can help skip a > field when deserializing data. > This is good for column pruning in most cases, but we might be able to > do better in the following case. > > > Let's say we have a query like this: > > CREATE TABLE t (col1 STRING, col2 STRING, col3 STRING); > SELECT col2 FROM t WHERE col3 = 'abcde'; > > We want to get field col3 first, if that matches what we want, then we > want to get to field col2. > > > Is there anyway to "remember" the current location of deserialization, > so that we can "resume" from that point? > > > -- > Yours, > Zheng >