Subject: Re: Confusion re. persisting the schema
From: Harsh J <qwertymaniac@gmail.com>
To: user@avro.apache.org
Date: Tue, 12 Oct 2010 11:34:45 +0530
In-Reply-To: <15644D02-D0F6-4DFC-ABDD-1E046B127909@internode.on.net>

You are only writing the raw encoded data with that code. To get a proper
Avro data file you need to use o.a.a.file.DataFileWriter and append each
datum to it; the data file format stores the schema in its header, among
other features.
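Roughly, it would look something like the sketch below (untested; it reuses
the names from your snippet and assumes the 1.4-style API). The container
file takes over the Encoder's job and writes the schema into the file header
before any records are appended:

  // Uses org.apache.avro.file.DataFileWriter plus the same generic classes
  // as your snippet (GenericDatumWriter, GenericData, GenericArray, Utf8).
  Schema schema = Schema.parse(this.getClass()
      .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME));

  DataFileWriter<GenericRecord> fileWriter = new DataFileWriter<GenericRecord>(
      new GenericDatumWriter<GenericRecord>(schema));
  // create() writes the file header, including the schema, up front.
  fileWriter.create(schema, new File(workFolder, FILE_DEPENDENCY_GRAPH_NAME));
  try {
    for (Map.Entry<String, Set<String>> entry : fileDependencies.entrySet()) {
      GenericRecord record = new GenericData.Record(schema);
      record.put("file", new Utf8(entry.getKey()));

      GenericArray<Utf8> imports = new GenericData.Array<Utf8>(
          entry.getValue().size(),
          Schema.createArray(Schema.create(Type.STRING)));
      for (String importFile : entry.getValue()) {
        imports.add(new Utf8(importFile));
      }
      record.put("imports", imports);

      fileWriter.append(record);  // append the datum; no Encoder to manage
    }
  } finally {
    fileWriter.close();           // flush the last block and close the file
  }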
On Oct 12, 2010 11:29 AM, "Christopher Hunt" <huntc@internode.on.net> wrote:

> Hi there,
>
> I've just noticed that when I write out my binary data I don't appear to
> have a schema saved with it. I was under the impression that Avro saves
> schemas along with the data. Thanks for any clarification.
>
> Here's my schema:
>
> {
>   "name": "FileDependency",
>   "type": "record",
>   "fields": [
>       {"name": "file", "type": "string"},
>       {"name": "imports", "type": {
>           "type": "array", "items": "string"}
>       }
>   ]
> }
>
> The code to write out my data is as follows (I'd also appreciate any
> refinement suggestions, as I'm new to Avro):
>
>   @Cleanup
>   InputStream fileDependencySchemaIs = this.getClass()
>       .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME);
>   Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);
>
>   GenericDatumWriter<GenericRecord> genericDatumWriter =
>       new GenericDatumWriter<GenericRecord>(fileDependencySchema);
>   @Cleanup
>   OutputStream os = new FileOutputStream(new File(workFolder,
>       FILE_DEPENDENCY_GRAPH_NAME));
>   Encoder encoder = new BinaryEncoder(os);
>   for (Map.Entry<String, Set<String>> entry : fileDependencies
>       .entrySet()) {
>
>     GenericRecord genericRecord = new GenericData.Record(
>         fileDependencySchema);
>
>     genericRecord.put("file", new Utf8(entry.getKey()));
>
>     Set<String> imports = entry.getValue();
>     GenericArray<Utf8> genericArray = new GenericData.Array<Utf8>(
>         imports.size(),
>         Schema.createArray(Schema.create(Type.STRING)));
>     for (String importFile : imports) {
>       genericArray.add(new Utf8(importFile));
>     }
>     genericRecord.put("imports", genericArray);
>
>     genericDatumWriter.write(genericRecord, encoder);
>   }
>   encoder.flush();
>
> Thanks again.
>
> Kind regards,
> Christopher
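And to convince yourself that the schema really does travel with the data,
you can open the resulting container file and ask it for the schema it read
back out of the header. Again only an untested sketch, assuming the file was
written with the DataFileWriter code above:

  // Uses org.apache.avro.file.DataFileReader and
  // org.apache.avro.generic.GenericDatumReader.
  DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(
      new File(workFolder, FILE_DEPENDENCY_GRAPH_NAME),
      new GenericDatumReader<GenericRecord>());
  try {
    // No schema is passed in anywhere: the reader recovers the writer's
    // schema from the file header.
    System.out.println(fileReader.getSchema());
    while (fileReader.hasNext()) {
      GenericRecord record = fileReader.next();
      System.out.println(record.get("file") + " -> " + record.get("imports"));
    }
  } finally {
    fileReader.close();
  }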
