Date: Wed, 23 May 2012 10:28:35 -0700
Subject: Re: Can serialized Avro records be efficiently compared without deserializing?
From: Doug Cutting
To: user@avro.apache.org

On Tue, May 22, 2012 at 1:22 PM, Jonathan Coveney wrote:
> Imagine I use Avro to serialize an object (without loss of generality, let's
> say an array of longs). I'm curious whether it is possible to compare those
> arrays without deserializing, i.e. look at the bytes in memory or on disk
> and do the comparison based on those bytes (i.e. the raw comparison that
> Hadoop does in the shuffle sort).
>
> I poked around the documentation but wasn't sure where to look.

Yes, this is possible. The Java method that does this is BinaryData#compare().
It takes two byte arrays, a start offset into each, and the schema of the
encoded data, and compares the two binary-encoded datums directly, returning
a negative, zero, or positive int:

http://avro.apache.org/docs/current/api/java/org/apache/avro/io/BinaryData.html#compare(byte[], int, byte[], int, org.apache.avro.Schema)

Doug
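To illustrate what such a schema-aware raw comparison does for the array-of-longs case in the question, here is a minimal, hypothetical sketch that uses only the JDK. It is not Avro's own code: the class `RawLongArrayCompare` and its `compare`/`encode` helpers are made up for illustration, and it handles only the schema `{"type": "array", "items": "long"}`. It assumes Avro's binary encoding: each long is a zig-zag encoded varint, and an array is a series of blocks, each starting with an item count (a negative count is followed by the block's size in bytes), ending with a count of 0. It walks both byte arrays in place, comparing decoded elements numerically and treating a proper prefix as smaller.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch (not Avro's own code): compares two Avro
 * binary-encoded values of schema {"type": "array", "items": "long"}
 * directly from byte arrays, without building List<Long> objects.
 */
final class RawLongArrayCompare {

  /** Read position over a byte[]; decodes values in place. */
  private static final class Cursor {
    private final byte[] buf;
    private int pos;

    Cursor(byte[] buf, int start) {
      this.buf = buf;
      this.pos = start;
    }

    /** Reads one Avro long: a varint holding a zig-zag encoded value. */
    long readLong() {
      long raw = 0;
      int shift = 0;
      int b;
      do {
        b = buf[pos++] & 0xff;
        raw |= (long) (b & 0x7f) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
    }

    /** Reads an array block header; returns its item count (0 ends the array). */
    long readBlockCount() {
      long count = readLong();
      if (count < 0) { // negative count: a block size in bytes follows
        readLong();    // the size is not needed here, so skip it
        count = -count;
      }
      return count;
    }
  }

  /** Negative, zero, or positive, like Comparator#compare. */
  static int compare(byte[] b1, int s1, byte[] b2, int s2) {
    Cursor c1 = new Cursor(b1, s1);
    Cursor c2 = new Cursor(b2, s2);
    long r1 = c1.readBlockCount(); // items left in the current block
    long r2 = c2.readBlockCount();
    while (r1 > 0 && r2 > 0) {
      // Decoded longs are compared numerically (signed).
      int c = Long.compare(c1.readLong(), c2.readLong());
      if (c != 0) {
        return c;
      }
      if (--r1 == 0) r1 = c1.readBlockCount();
      if (--r2 == 0) r2 = c2.readBlockCount();
    }
    // At least one array has ended; a proper prefix sorts first.
    return Boolean.compare(r1 > 0, r2 > 0);
  }

  /** Hypothetical test helper: encodes values as one block plus a 0 terminator. */
  static byte[] encode(long... values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (values.length > 0) {
      writeLong(out, values.length);
    }
    for (long v : values) {
      writeLong(out, v);
    }
    writeLong(out, 0); // end-of-array marker
    return out.toByteArray();
  }

  /** Writes a long as a zig-zag encoded varint. */
  private static void writeLong(ByteArrayOutputStream out, long n) {
    long z = (n << 1) ^ (n >> 63); // zig-zag
    while ((z & ~0x7fL) != 0) {
      out.write((int) ((z & 0x7f) | 0x80));
      z >>>= 7;
    }
    out.write((int) z);
  }
}
```

The decoding step is why a plain byte-by-byte comparison is not enough: with zig-zag encoding, -5 is written as the byte 0x09 and 3 as 0x06, so comparing the bytes alone would sort -5 after 3. In real code you would not hand-roll this; you would build the schema (for example `Schema.createArray(Schema.create(Schema.Type.LONG))`) and call `BinaryData.compare(b1, 0, b2, 0, schema)`, which walks arbitrary schemas the same way.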