Mailing-List: contact commits-help@avro.apache.org; run by ezmlm
Reply-To: dev@avro.apache.org
Subject: svn commit: r1797063 [1/5] - in /avro/site/publish/docs/1.8.2: ./ examples/ examples/java-example/ examples/java-example/src/ examples/java-example/src/main/ examples/java-example/src/main/java/ examples/java-example/src/main/java/example/ examples/mr-...
Date: Wed, 31 May 2017 15:48:44 -0000
To: commits@avro.apache.org
From: suraj@apache.org
X-Mailer: svnmailer-1.0.9
Message-Id: <20170531154844.5F1823A039F@svn01-us-west.apache.org>
archived-at: Wed, 31 May 2017 15:48:48 -0000

Author: suraj
Date: Wed May 31 15:48:43 2017
New Revision: 1797063

URL: http://svn.apache.org/viewvc?rev=1797063&view=rev
Log:
Update the correct avro-docs

Added:
    avro/site/publish/docs/1.8.2/
    avro/site/publish/docs/1.8.2/broken-links.xml
    avro/site/publish/docs/1.8.2/examples/
    avro/site/publish/docs/1.8.2/examples/example.py
    avro/site/publish/docs/1.8.2/examples/java-example/
    avro/site/publish/docs/1.8.2/examples/java-example/pom.xml
    avro/site/publish/docs/1.8.2/examples/java-example/src/
    avro/site/publish/docs/1.8.2/examples/java-example/src/main/
    avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/
    avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/
    avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/GenericMain.java
    avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/SpecificMain.java
    avro/site/publish/docs/1.8.2/examples/mr-example/
    avro/site/publish/docs/1.8.2/examples/mr-example/pom.xml
    avro/site/publish/docs/1.8.2/examples/mr-example/src/
    avro/site/publish/docs/1.8.2/examples/mr-example/src/main/
    avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/
    avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/
    avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/AvroWordCount.java
    avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/GenerateData.java
    avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceAvroWordCount.java
    avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceColorCount.java
    avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapredColorCount.java
    avro/site/publish/docs/1.8.2/examples/user.avsc
    avro/site/publish/docs/1.8.2/gettingstartedjava.html
    avro/site/publish/docs/1.8.2/gettingstartedjava.pdf   (with props)
    avro/site/publish/docs/1.8.2/gettingstartedpython.html
    avro/site/publish/docs/1.8.2/gettingstartedpython.pdf   (with props)
    avro/site/publish/docs/1.8.2/htmldocs/
    avro/site/publish/docs/1.8.2/htmldocs/canonical-completeness.html
    avro/site/publish/docs/1.8.2/idl.html
    avro/site/publish/docs/1.8.2/idl.pdf   (with props)
    avro/site/publish/docs/1.8.2/images/
    avro/site/publish/docs/1.8.2/images/apache_feather.gif   (with props)
    avro/site/publish/docs/1.8.2/images/avro-logo.png   (with props)
    avro/site/publish/docs/1.8.2/images/built-with-forrest-button.png   (with props)
    avro/site/publish/docs/1.8.2/images/favicon.ico   (with props)
    avro/site/publish/docs/1.8.2/images/instruction_arrow.png   (with props)
    avro/site/publish/docs/1.8.2/index.html
    avro/site/publish/docs/1.8.2/index.pdf   (with props)
    avro/site/publish/docs/1.8.2/linkmap.html
    avro/site/publish/docs/1.8.2/linkmap.pdf   (with props)
    avro/site/publish/docs/1.8.2/mr.html
    avro/site/publish/docs/1.8.2/mr.pdf   (with props)
    avro/site/publish/docs/1.8.2/sasl.html
    avro/site/publish/docs/1.8.2/sasl.pdf   (with props)
    avro/site/publish/docs/1.8.2/skin/
    avro/site/publish/docs/1.8.2/skin/CommonMessages_de.xml
    avro/site/publish/docs/1.8.2/skin/CommonMessages_en_US.xml
    avro/site/publish/docs/1.8.2/skin/CommonMessages_es.xml
    avro/site/publish/docs/1.8.2/skin/CommonMessages_fr.xml
    avro/site/publish/docs/1.8.2/skin/basic.css
    avro/site/publish/docs/1.8.2/skin/breadcrumbs-optimized.js
    avro/site/publish/docs/1.8.2/skin/breadcrumbs.js
    avro/site/publish/docs/1.8.2/skin/css/
    avro/site/publish/docs/1.8.2/skin/fontsize.js
    avro/site/publish/docs/1.8.2/skin/getBlank.js
    avro/site/publish/docs/1.8.2/skin/getMenu.js
    avro/site/publish/docs/1.8.2/skin/images/
    avro/site/publish/docs/1.8.2/skin/images/README.txt
    avro/site/publish/docs/1.8.2/skin/images/add.jpg   (with props)
    avro/site/publish/docs/1.8.2/skin/images/apache-thanks.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/built-with-cocoon.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/built-with-forrest-button.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/chapter.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/chapter_open.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/current.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/error.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/external-link.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/fix.jpg   (with props)
    avro/site/publish/docs/1.8.2/skin/images/forrest-credit-logo.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/hack.jpg   (with props)
    avro/site/publish/docs/1.8.2/skin/images/header_white_line.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/info.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/instruction_arrow.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/label.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/page.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/pdfdoc.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/poddoc.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/printer.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-b-l-15-1body-2menu-3menu.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-b-r-15-1body-2menu-3menu.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-b-r-5-1header-2tab-selected-3tab-selected.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-t-l-5-1header-2searchbox-3searchbox.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-t-l-5-1header-2tab-selected-3tab-selected.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-t-l-5-1header-2tab-unselected-3tab-unselected.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-t-r-15-1body-2menu-3menu.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-t-r-5-1header-2searchbox-3searchbox.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-t-r-5-1header-2tab-selected-3tab-selected.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rc-t-r-5-1header-2tab-unselected-3tab-unselected.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/remove.jpg   (with props)
    avro/site/publish/docs/1.8.2/skin/images/rss.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/spacer.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/images/success.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/txtdoc.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/update.jpg   (with props)
    avro/site/publish/docs/1.8.2/skin/images/valid-html401.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/vcss.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/warning.png   (with props)
    avro/site/publish/docs/1.8.2/skin/images/xmldoc.gif   (with props)
    avro/site/publish/docs/1.8.2/skin/menu.js
    avro/site/publish/docs/1.8.2/skin/note.txt
    avro/site/publish/docs/1.8.2/skin/print.css
    avro/site/publish/docs/1.8.2/skin/profile.css
    avro/site/publish/docs/1.8.2/skin/prototype.js
    avro/site/publish/docs/1.8.2/skin/screen.css
    avro/site/publish/docs/1.8.2/skin/scripts/
    avro/site/publish/docs/1.8.2/skin/translations/
    avro/site/publish/docs/1.8.2/spec.html
    avro/site/publish/docs/1.8.2/spec.pdf   (with props)

Added: avro/site/publish/docs/1.8.2/broken-links.xml
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/broken-links.xml?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/broken-links.xml (added)
+++ avro/site/publish/docs/1.8.2/broken-links.xml Wed May 31 15:48:43 2017
@@ -0,0 +1,2 @@
+
+

Added: avro/site/publish/docs/1.8.2/examples/example.py
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/example.py?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/example.py (added)
+++ avro/site/publish/docs/1.8.2/examples/example.py Wed May 31 15:48:43 2017
@@ -0,0 +1,33 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+import avro.schema
+from avro.datafile import DataFileReader, DataFileWriter
+from avro.io import DatumReader, DatumWriter
+
+schema = avro.schema.parse(open("user.avsc").read())
+
+writer = DataFileWriter(open("/tmp/users.avro", "w"), DatumWriter(), schema)
+writer.append({"name": "Alyssa", "favorite_number": 256, "WTF": 2})
+writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
+writer.close()
+
+reader = DataFileReader(open("/tmp/users.avro", "r"), DatumReader())
+for user in reader:
+    print user
+reader.close()

Added: avro/site/publish/docs/1.8.2/examples/java-example/pom.xml
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/java-example/pom.xml?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/java-example/pom.xml (added)
+++ avro/site/publish/docs/1.8.2/examples/java-example/pom.xml Wed May 31 15:48:43 2017
@@ -0,0 +1,70 @@
+
+<project>
+  <modelVersion>4.0.0</modelVersion>
+  <groupId>example</groupId>
+  <artifactId>java-example</artifactId>
+  <packaging>jar</packaging>
+  <version>1.0-SNAPSHOT</version>
+  <name>java-example</name>
+  <url>http://maven.apache.org</url>
+  <dependencies>
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <version>3.8.1</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.avro</groupId>
+      <artifactId>avro</artifactId>
+      <version>1.7.5</version>
+    </dependency>
+  </dependencies>
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.avro</groupId>
+        <artifactId>avro-maven-plugin</artifactId>
+        <version>1.7.5</version>
+        <executions>
+          <execution>
+            <phase>generate-sources</phase>
+            <goals>
+              <goal>schema</goal>
+            </goals>
+            <configuration>
+              <sourceDirectory>${project.basedir}/../</sourceDirectory>
+              <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+        <configuration>
+          <source>1.6</source>
+          <target>1.6</target>
+        </configuration>
+      </plugin>
+    </plugins>
+  </build>
+</project>

Added: avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/GenericMain.java
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/GenericMain.java?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/GenericMain.java (added)
+++ avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/GenericMain.java Wed May 31 15:48:43 2017
@@ -0,0 +1,71 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package example;
+
+import java.io.File;
+import java.io.IOException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Parser;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.DatumWriter;
+
+public class GenericMain {
+  public static void main(String[] args) throws IOException {
+    Schema schema = new Parser().parse(new File("/home/skye/code/cloudera/avro/doc/examples/user.avsc"));
+
+    GenericRecord user1 = new GenericData.Record(schema);
+    user1.put("name", "Alyssa");
+    user1.put("favorite_number", 256);
+    // Leave favorite color null
+
+    GenericRecord user2 = new GenericData.Record(schema);
+    user2.put("name", "Ben");
+    user2.put("favorite_number", 7);
+    user2.put("favorite_color", "red");
+
+    // Serialize user1 and user2 to disk
+    File file = new File("users.avro");
+    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
+    DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
+    dataFileWriter.create(schema, file);
+    dataFileWriter.append(user1);
+    dataFileWriter.append(user2);
+    dataFileWriter.close();
+
+    // Deserialize users from disk
+    DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
+    DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);
+    GenericRecord user = null;
+    while (dataFileReader.hasNext()) {
+      // Reuse user object by passing it to next(). This saves us from
+      // allocating and garbage collecting many objects for files with
+      // many items.
+      user = dataFileReader.next(user);
+      System.out.println(user);
+    }
+
+  }
+}

Added: avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/SpecificMain.java
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/SpecificMain.java?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/SpecificMain.java (added)
+++ avro/site/publish/docs/1.8.2/examples/java-example/src/main/java/example/SpecificMain.java Wed May 31 15:48:43 2017
@@ -0,0 +1,73 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package example;
+
+import java.io.File;
+import java.io.IOException;
+
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.DatumWriter;
+import org.apache.avro.specific.SpecificDatumReader;
+import org.apache.avro.specific.SpecificDatumWriter;
+
+import example.avro.User;
+
+public class SpecificMain {
+  public static void main(String[] args) throws IOException {
+    User user1 = new User();
+    user1.setName("Alyssa");
+    user1.setFavoriteNumber(256);
+    // Leave favorite color null
+
+    // Alternate constructor
+    User user2 = new User("Ben", 7, "red");
+
+    // Construct via builder
+    User user3 = User.newBuilder()
+             .setName("Charlie")
+             .setFavoriteColor("blue")
+             .setFavoriteNumber(null)
+             .build();
+
+    // Serialize user1, user2 and user3 to disk
+    File file = new File("users.avro");
+    DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
+    DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
+    dataFileWriter.create(user1.getSchema(), file);
+    dataFileWriter.append(user1);
+    dataFileWriter.append(user2);
+    dataFileWriter.append(user3);
+    dataFileWriter.close();
+
+    // Deserialize Users from disk
+    DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
+    DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
+    User user = null;
+    while (dataFileReader.hasNext()) {
+      // Reuse user object by passing it to next(). This saves us from
+      // allocating and garbage collecting many objects for files with
+      // many items.
+      user = dataFileReader.next(user);
+      System.out.println(user);
+    }
+
+  }
+}

Added: avro/site/publish/docs/1.8.2/examples/mr-example/pom.xml
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/mr-example/pom.xml?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/mr-example/pom.xml (added)
+++ avro/site/publish/docs/1.8.2/examples/mr-example/pom.xml Wed May 31 15:48:43 2017
@@ -0,0 +1,77 @@
+
+<project>
+  <modelVersion>4.0.0</modelVersion>
+
+  <groupId>example</groupId>
+  <artifactId>mr-example</artifactId>
+  <version>1.0</version>
+  <packaging>jar</packaging>
+
+  <name>mr-example</name>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+        <configuration>
+          <source>1.6</source>
+          <target>1.6</target>
+        </configuration>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.avro</groupId>
+        <artifactId>avro-maven-plugin</artifactId>
+        <version>1.7.5</version>
+        <executions>
+          <execution>
+            <phase>generate-sources</phase>
+            <goals>
+              <goal>schema</goal>
+            </goals>
+            <configuration>
+              <sourceDirectory>${project.basedir}/../</sourceDirectory>
+              <outputDirectory>${project.build.directory}/generated-sources/java</outputDirectory>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.avro</groupId>
+      <artifactId>avro</artifactId>
+      <version>1.7.5</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.avro</groupId>
+      <artifactId>avro-mapred</artifactId>
+      <version>1.7.5</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-core</artifactId>
+      <version>1.1.0</version>
+    </dependency>
+  </dependencies>
+</project>

Added: avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/AvroWordCount.java
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/AvroWordCount.java?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/AvroWordCount.java (added)
+++ avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/AvroWordCount.java Wed May 31 15:48:43 2017
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package example;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.avro.*;
+import org.apache.avro.Schema.Type;
+import org.apache.avro.mapred.*;
+import org.apache.hadoop.conf.*;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.*;
+import org.apache.hadoop.mapred.*;
+import org.apache.hadoop.util.*;
+
+/**
+ * The classic WordCount example modified to output Avro Pair records instead of text.
+ */
+public class AvroWordCount extends Configured implements Tool {
+
+  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
+    private final static IntWritable one = new IntWritable(1);
+    private Text word = new Text();
+
+    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
+      throws IOException {
+      String line = value.toString();
+      StringTokenizer tokenizer = new StringTokenizer(line);
+      while (tokenizer.hasMoreTokens()) {
+        word.set(tokenizer.nextToken());
+        output.collect(word, one);
+      }
+    }
+  }
+
+  public static class Reduce extends MapReduceBase
+    implements Reducer<Text, IntWritable, AvroWrapper<Pair<CharSequence, Integer>>, NullWritable> {
+
+    public void reduce(Text key, Iterator<IntWritable> values,
+                       OutputCollector<AvroWrapper<Pair<CharSequence, Integer>>, NullWritable> output,
+                       Reporter reporter) throws IOException {
+      int sum = 0;
+      while (values.hasNext()) {
+        sum += values.next().get();
+      }
+      output.collect(new AvroWrapper<Pair<CharSequence, Integer>>(
+          new Pair<CharSequence, Integer>(key.toString(), sum)),
+          NullWritable.get());
+    }
+  }
+
+  public int run(String[] args) throws Exception {
+    if (args.length != 2) {
+      System.err.println("Usage: AvroWordCount <input path> <output path>");
+      return -1;
+    }
+
+    JobConf conf = new JobConf(AvroWordCount.class);
+    conf.setJobName("wordcount");
+
+    // We call setOutputSchema first so we can override the configuration
+    // parameters it sets
+    AvroJob.setOutputSchema(conf, Pair.getPairSchema(Schema.create(Type.STRING),
+                                                     Schema.create(Type.INT)));
+
+    conf.setMapperClass(Map.class);
+    conf.setReducerClass(Reduce.class);
+
+    conf.setInputFormat(TextInputFormat.class);
+
+    conf.setMapOutputKeyClass(Text.class);
+    conf.setMapOutputValueClass(IntWritable.class);
+    conf.setOutputKeyComparatorClass(Text.Comparator.class);
+
+    FileInputFormat.setInputPaths(conf, new Path(args[0]));
+    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
+
+    JobClient.runJob(conf);
+    return 0;
+  }
+
+  public static void main(String[] args) throws Exception {
+    int res = ToolRunner.run(new Configuration(), new AvroWordCount(), args);
+    System.exit(res);
+  }
+}

Added: avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/GenerateData.java
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/GenerateData.java?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/GenerateData.java (added)
+++ avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/GenerateData.java Wed May 31 15:48:43 2017
@@ -0,0 +1,57 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package example;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Random;
+
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.io.DatumWriter;
+import org.apache.avro.specific.SpecificDatumWriter;
+
+import example.avro.User;
+
+public class GenerateData {
+  public static final String[] COLORS = {"red", "orange", "yellow", "green", "blue", "purple", null};
+  public static final int USERS = 20;
+  public static final String PATH = "./input/users.avro";
+
+  public static void main(String[] args) throws IOException {
+    // Open data file
+    File file = new File(PATH);
+    if (file.getParentFile() != null) {
+      file.getParentFile().mkdirs();
+    }
+    DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
+    DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
+    dataFileWriter.create(User.SCHEMA$, file);
+
+    // Create random users
+    User user;
+    Random random = new Random();
+    for (int i = 0; i < USERS; i++) {
+      user = new User("user", null, COLORS[random.nextInt(COLORS.length)]);
+      dataFileWriter.append(user);
+      System.out.println(user);
+    }
+
+    dataFileWriter.close();
+  }
+}

Added: avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceAvroWordCount.java
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceAvroWordCount.java?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceAvroWordCount.java (added)
+++ avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceAvroWordCount.java Wed May 31 15:48:43 2017
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package example;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Type;
+import org.apache.avro.mapred.AvroWrapper;
+import org.apache.avro.mapred.Pair;
+import org.apache.avro.mapreduce.AvroJob;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.Reducer;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
+import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+
+/**
+ * The classic WordCount example modified to output Avro Pair records instead of text.
+ */
+public class MapReduceAvroWordCount extends Configured implements Tool {
+
+  public static class Map
+    extends Mapper<LongWritable, Text, Text, IntWritable> {
+
+    private final static IntWritable one = new IntWritable(1);
+    private Text word = new Text();
+
+    public void map(LongWritable key, Text value, Context context)
+      throws IOException, InterruptedException {
+      String line = value.toString();
+      StringTokenizer tokenizer = new StringTokenizer(line);
+      while (tokenizer.hasMoreTokens()) {
+        word.set(tokenizer.nextToken());
+        context.write(word, one);
+      }
+    }
+  }
+
+  public static class Reduce
+    extends Reducer<Text, IntWritable, AvroWrapper<Pair<CharSequence, Integer>>, NullWritable> {
+
+    public void reduce(Text key, Iterable<IntWritable> values,
+                       Context context)
+      throws IOException, InterruptedException {
+      int sum = 0;
+      for (IntWritable value : values) {
+        sum += value.get();
+      }
+      context.write(new AvroWrapper<Pair<CharSequence, Integer>>
+                    (new Pair<CharSequence, Integer>(key.toString(), sum)),
+                    NullWritable.get());
+    }
+  }
+
+  public int run(String[] args) throws Exception {
+    if (args.length != 2) {
+      System.err.println("Usage: AvroWordCount <input path> <output path>");
+      return -1;
+    }
+
+    Job job = new Job(getConf());
+    job.setJarByClass(MapReduceAvroWordCount.class);
+    job.setJobName("wordcount");
+
+    // We call setOutputSchema first so we can override the configuration
+    // parameters it sets
+    AvroJob.setOutputKeySchema(job,
+                               Pair.getPairSchema(Schema.create(Type.STRING),
+                                                  Schema.create(Type.INT)));
+    job.setOutputValueClass(NullWritable.class);
+
+    job.setMapperClass(Map.class);
+    job.setReducerClass(Reduce.class);
+
+    job.setInputFormatClass(TextInputFormat.class);
+
+    job.setMapOutputKeyClass(Text.class);
+    job.setMapOutputValueClass(IntWritable.class);
+    job.setSortComparatorClass(Text.Comparator.class);
+
+    FileInputFormat.setInputPaths(job, new Path(args[0]));
+    FileOutputFormat.setOutputPath(job, new Path(args[1]));
+
+    job.waitForCompletion(true);
+
+    return 0;
+  }
+
+  public static void main(String[] args) throws Exception {
+    int res =
+      ToolRunner.run(new Configuration(), new MapReduceAvroWordCount(), args);
+    System.exit(res);
+  }
+}

Added: avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceColorCount.java
URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceColorCount.java?rev=1797063&view=auto
==============================================================================
--- avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceColorCount.java (added)
+++ avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapReduceColorCount.java Wed May 31 15:48:43 2017
@@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */ + +package example; + +import java.io.IOException; + +import org.apache.avro.Schema; +import org.apache.avro.mapred.AvroKey; +import org.apache.avro.mapred.AvroValue; +import org.apache.avro.mapreduce.AvroJob; +import org.apache.avro.mapreduce.AvroKeyInputFormat; +import org.apache.avro.mapreduce.AvroKeyValueOutputFormat; +import org.apache.hadoop.conf.Configured; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IntWritable; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.io.Text; +import org.apache.hadoop.mapreduce.Job; +import org.apache.hadoop.mapreduce.Mapper; +import org.apache.hadoop.mapreduce.Reducer; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; +import org.apache.hadoop.util.Tool; +import org.apache.hadoop.util.ToolRunner; + +import example.avro.User; + +public class MapReduceColorCount extends Configured implements Tool { + + public static class ColorCountMapper extends + Mapper, NullWritable, Text, IntWritable> { + + @Override + public void map(AvroKey key, NullWritable value, Context context) + throws IOException, InterruptedException { + + CharSequence color = key.datum().getFavoriteColor(); + if (color == null) { + color = "none"; + } + context.write(new Text(color.toString()), new IntWritable(1)); + } + } + + public static class ColorCountReducer extends + Reducer, AvroValue> { + + @Override + public void reduce(Text key, Iterable values, + Context context) throws IOException, InterruptedException { + + int sum = 0; + for (IntWritable value : values) { + sum += value.get(); + } + context.write(new AvroKey(key.toString()), new AvroValue(sum)); + } + } + + public int run(String[] args) throws Exception { + if (args.length != 2) { + System.err.println("Usage: MapReduceColorCount "); + return -1; + } + + Job job = new Job(getConf()); + job.setJarByClass(MapReduceColorCount.class); + job.setJobName("Color Count"); + + 
FileInputFormat.setInputPaths(job, new Path(args[0])); + FileOutputFormat.setOutputPath(job, new Path(args[1])); + + job.setInputFormatClass(AvroKeyInputFormat.class); + job.setMapperClass(ColorCountMapper.class); + AvroJob.setInputKeySchema(job, User.getClassSchema()); + job.setMapOutputKeyClass(Text.class); + job.setMapOutputValueClass(IntWritable.class); + + job.setOutputFormatClass(AvroKeyValueOutputFormat.class); + job.setReducerClass(ColorCountReducer.class); + AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING)); + AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.INT)); + + return (job.waitForCompletion(true) ? 0 : 1); + } + + public static void main(String[] args) throws Exception { + int res = ToolRunner.run(new MapReduceColorCount(), args); + System.exit(res); + } +} Added: avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapredColorCount.java URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapredColorCount.java?rev=1797063&view=auto ============================================================================== --- avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapredColorCount.java (added) +++ avro/site/publish/docs/1.8.2/examples/mr-example/src/main/java/example/MapredColorCount.java Wed May 31 15:48:43 2017 @@ -0,0 +1,93 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package example; + +import java.io.IOException; + +import org.apache.avro.*; +import org.apache.avro.Schema.Type; +import org.apache.avro.mapred.*; +import org.apache.hadoop.conf.*; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapred.*; +import org.apache.hadoop.util.*; + +import example.avro.User; + +public class MapredColorCount extends Configured implements Tool { + + public static class ColorCountMapper extends AvroMapper<User, Pair<CharSequence, Integer>> { + @Override + public void map(User user, AvroCollector<Pair<CharSequence, Integer>> collector, Reporter reporter) + throws IOException { + CharSequence color = user.getFavoriteColor(); + // We need this check because the User.favorite_color field has type ["string", "null"] + if (color == null) { + color = "none"; + } + collector.collect(new Pair<CharSequence, Integer>(color, 1)); + } + } + + public static class ColorCountReducer extends AvroReducer<CharSequence, Integer, Pair<CharSequence, Integer>> { + @Override + public void reduce(CharSequence key, Iterable<Integer> values, + AvroCollector<Pair<CharSequence, Integer>> collector, + Reporter reporter) + throws IOException { + int sum = 0; + for (Integer value : values) { + sum += value; + } + collector.collect(new Pair<CharSequence, Integer>(key, sum)); + } + } + + public int run(String[] args) throws Exception { + if (args.length != 2) { + System.err.println("Usage: MapredColorCount <input path> <output path>"); + return -1; + } + + JobConf conf = new JobConf(getConf(), MapredColorCount.class); + conf.setJobName("colorcount"); + + FileInputFormat.setInputPaths(conf, new Path(args[0])); + FileOutputFormat.setOutputPath(conf, new Path(args[1])); + + AvroJob.setMapperClass(conf, ColorCountMapper.class); + AvroJob.setReducerClass(conf, 
ColorCountReducer.class); + + // Note that AvroJob.setInputSchema and AvroJob.setOutputSchema set + // relevant config options such as input/output format, map output + // classes, and output key class. + AvroJob.setInputSchema(conf, User.getClassSchema()); + AvroJob.setOutputSchema(conf, Pair.getPairSchema(Schema.create(Type.STRING), + Schema.create(Type.INT))); + + JobClient.runJob(conf); + return 0; + } + + public static void main(String[] args) throws Exception { + int res = ToolRunner.run(new Configuration(), new MapredColorCount(), args); + System.exit(res); + } +} Added: avro/site/publish/docs/1.8.2/examples/user.avsc URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/examples/user.avsc?rev=1797063&view=auto ============================================================================== --- avro/site/publish/docs/1.8.2/examples/user.avsc (added) +++ avro/site/publish/docs/1.8.2/examples/user.avsc Wed May 31 15:48:43 2017 @@ -0,0 +1,9 @@ +{"namespace": "example.avro", + "type": "record", + "name": "User", + "fields": [ + {"name": "name", "type": "string"}, + {"name": "favorite_number", "type": ["int", "null"]}, + {"name": "favorite_color", "type": ["string", "null"]} + ] +} Added: avro/site/publish/docs/1.8.2/gettingstartedjava.html URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/gettingstartedjava.html?rev=1797063&view=auto ============================================================================== --- avro/site/publish/docs/1.8.2/gettingstartedjava.html (added) +++ avro/site/publish/docs/1.8.2/gettingstartedjava.html Wed May 31 15:48:43 2017 @@ -0,0 +1,694 @@ + + + + + + + +Apache Avro™ 1.8.2 + Getting Started (Java) + + + + + + + + + +

Apache Avro™ 1.8.2 Getting Started (Java)

+ + +

+ This is a short guide for getting started with Apache Avro™ using + Java. This guide only covers using Avro for data serialization; see + Patrick Hunt's Avro + RPC Quick Start for a good introduction to using Avro for RPC. +

+ + +

Download

+
+

+ Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be downloaded from the Apache Avro™ Releases page. This guide uses Avro 1.8.2, the latest version at the time of writing. For the examples in this guide, download avro-1.8.2.jar and avro-tools-1.8.2.jar. The Avro Java implementation also depends on the Jackson JSON library. From the Jackson download page, download the core-asl and mapper-asl jars. Add avro-1.8.2.jar and the Jackson jars to your project's classpath (avro-tools will be used for code generation).

+

+ Alternatively, if you are using Maven, add the following dependency to + your POM: +

+
+<dependency>
+  <groupId>org.apache.avro</groupId>
+  <artifactId>avro</artifactId>
+  <version>1.8.2</version>
+</dependency>
+      
+

+ As well as the Avro Maven plugin (for performing code generation): +

+
+<plugin>
+  <groupId>org.apache.avro</groupId>
+  <artifactId>avro-maven-plugin</artifactId>
+  <version>1.8.2</version>
+  <executions>
+    <execution>
+      <phase>generate-sources</phase>
+      <goals>
+        <goal>schema</goal>
+      </goals>
+      <configuration>
+        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
+        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
+      </configuration>
+    </execution>
+  </executions>
+</plugin>
+<plugin>
+  <groupId>org.apache.maven.plugins</groupId>
+  <artifactId>maven-compiler-plugin</artifactId>
+  <configuration>
+    <source>1.6</source>
+    <target>1.6</target>
+  </configuration>
+</plugin>
+      
+

+ You may also build the required Avro jars from source. Building Avro is + beyond the scope of this guide; see the Build + Documentation page in the wiki for more information. +

+
+ + + +

Defining a schema

+
+

+ Avro schemas are defined using JSON. Schemas are composed of primitive types + (null, boolean, int, + long, float, double, + bytes, and string) and complex types (record, + enum, array, map, + union, and fixed). You can learn more about + Avro schemas and types from the specification, but for now let's start + with a simple schema example, user.avsc: +

+
+{"namespace": "example.avro",
+ "type": "record",
+ "name": "User",
+ "fields": [
+     {"name": "name", "type": "string"},
+     {"name": "favorite_number",  "type": ["int", "null"]},
+     {"name": "favorite_color", "type": ["string", "null"]}
+ ]
+}
+      
+

+ This schema defines a record representing a hypothetical user. (Note + that a schema file can only contain a single schema definition.) At + minimum, a record definition must include its type ("type": + "record"), a name ("name": "User"), and fields, in + this case name, favorite_number, and + favorite_color. We also define a namespace + ("namespace": "example.avro"), which together with the name + attribute defines the "full name" of the schema + (example.avro.User in this case). + +

+

+ Fields are defined via an array of objects, each of which defines a name + and type (other attributes are optional, see the record specification for more + details). The type attribute of a field is another schema object, which + can be either a primitive or complex type. For example, the + name field of our User schema is the primitive type + string, whereas the favorite_number and + favorite_color fields are both unions, + represented by JSON arrays. unions are a complex type that + can be any of the types listed in the array; e.g., + favorite_number can either be an int or + null, essentially making it an optional field. +
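The union rule described above can be sketched in a language-neutral way. The following is a minimal illustration in Python 3 using only the standard `json` module; it is not part of any Avro library. The names `PRIMITIVE_CHECKS`, `matches`, and `is_optional` are hypothetical helpers introduced here, and only the three primitive types used by `user.avsc` are handled.

```python
import json

# The example schema from this guide, parsed with the standard json module.
USER_SCHEMA = json.loads("""
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
""")

# Hypothetical table: maps the primitive type names used above to Python checks.
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "string": lambda v: isinstance(v, str),
    # Avro "int" is a 32-bit signed integer; bool is excluded on purpose.
    "int": lambda v: (isinstance(v, int) and not isinstance(v, bool)
                      and -2**31 <= v < 2**31),
}


def matches(field_type, value):
    """Return True if value fits field_type (a primitive name or a union list)."""
    if isinstance(field_type, list):  # a union: any one branch may match
        return any(matches(branch, value) for branch in field_type)
    return PRIMITIVE_CHECKS[field_type](value)


def is_optional(field):
    """A field is effectively optional when its union includes "null"."""
    return isinstance(field["type"], list) and "null" in field["type"]


fields = {f["name"]: f for f in USER_SCHEMA["fields"]}
print(is_optional(fields["favorite_number"]))              # union with "null"
print(is_optional(fields["name"]))                         # plain "string"
print(matches(fields["favorite_number"]["type"], None))    # null branch
print(matches(fields["favorite_number"]["type"], "seven")) # no branch fits
```

The sketch treats a union as "any branch may match", which is why `None` is acceptable for `favorite_number` but not for `name`.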

+
+ + + +

Serializing and deserializing with code generation

+
+ +

Compiling the schema

+

+ Code generation allows us to automatically create classes based on our + previously-defined schema. Once we have defined the relevant classes, + there is no need to use the schema directly in our programs. We use the + avro-tools jar to generate code as follows: +

+
+java -jar /path/to/avro-tools-1.8.2.jar compile schema <schema file> <destination>
+        
+

+ This will generate the appropriate source files in a package based on + the schema's namespace in the provided destination folder. For + instance, to generate a User class in package + example.avro from the schema defined above, run +

+
+java -jar /path/to/avro-tools-1.8.2.jar compile schema user.avsc .
+        
+
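To make the relationship between the namespace and the output location concrete, here is a small sketch in Python 3 (standard library only). It assumes the usual Java source layout of one directory per package segment; `generated_source_path` is a hypothetical helper written for this illustration, not something provided by avro-tools.

```python
import json
import posixpath

USER_SCHEMA_JSON = """
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [{"name": "name", "type": "string"}]}
"""


def generated_source_path(schema_json, destination):
    """Guess the .java path for a record schema compiled into destination.

    Assumption: one directory per package segment and one file per class,
    which is the conventional Java source layout.
    """
    schema = json.loads(schema_json)
    namespace = schema.get("namespace", "")
    package_dirs = namespace.split(".") if namespace else []
    return posixpath.join(destination, *package_dirs, schema["name"] + ".java")


print(generated_source_path(USER_SCHEMA_JSON, "."))  # ./example/avro/User.java
```

Under that assumption, the command above should leave the generated class at `./example/avro/User.java`, in package `example.avro`.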

+ Note that if you are using the Avro Maven plugin, there is no need to manually invoke the schema compiler; the plugin automatically performs code generation on any .avsc files present in the configured source directory.

+ +

Creating Users

+

+ Now that we've completed the code generation, let's create some + Users, serialize them to a data file on disk, and then + read back the file and deserialize the User objects. +

+

+ First let's create some Users and set their fields. +

+
+User user1 = new User();
+user1.setName("Alyssa");
+user1.setFavoriteNumber(256);
+// Leave favorite color null
+
+// Alternate constructor
+User user2 = new User("Ben", 7, "red");
+
+// Construct via builder
+User user3 = User.newBuilder()
+             .setName("Charlie")
+             .setFavoriteColor("blue")
+             .setFavoriteNumber(null)
+             .build();
+        
+

+ As shown in this example, Avro objects can be created either by invoking a constructor directly or by using a builder. Unlike constructors, builders automatically set any default values specified in the schema. Additionally, builders validate the data as it is set, whereas objects constructed directly will not cause an error until the object is serialized. However, using constructors directly generally offers better performance, as builders create a copy of the data structure before it is written.

+

+ Note that we do not set user1's favorite color. Since that field is of type ["string", "null"], we can either set it to a string or leave it null; it is essentially optional. Similarly, we set user3's favorite number to null (using a builder requires setting all fields, even if they are null).

+ +

Serializing

+

+ Now let's serialize our Users to disk. +

+
+// Serialize user1, user2 and user3 to disk
+File file = new File("users.avro");
+DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
+DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
+dataFileWriter.create(user1.getSchema(), file);
+dataFileWriter.append(user1);
+dataFileWriter.append(user2);
+dataFileWriter.append(user3);
+dataFileWriter.close();
+      
+

+ We create a DatumWriter, which converts Java objects into + an in-memory serialized format. The SpecificDatumWriter + class is used with generated classes and extracts the schema from the + specified generated type. +

+

+ Next we create a DataFileWriter, which writes the + serialized records, as well as the schema, to the file specified in the + dataFileWriter.create call. We write our users to the file + via calls to the dataFileWriter.append method. When we are + done writing, we close the data file. +

+ +

Deserializing

+

+ Finally, let's deserialize the data file we just created. +

+
+// Deserialize Users from disk
+DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
+DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
+User user = null;
+while (dataFileReader.hasNext()) {
+  // Reuse user object by passing it to next(). This saves us from
+  // allocating and garbage collecting many objects for files with
+  // many items.
+  user = dataFileReader.next(user);
+  System.out.println(user);
+}
+        
+

+ This snippet will output: +

+
+{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
+{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
+{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}
+        
+

+ Deserializing is very similar to serializing. We create a + SpecificDatumReader, analogous to the + SpecificDatumWriter we used in serialization, which + converts in-memory serialized items into instances of our generated + class, in this case User. We pass the + DatumReader and the previously created File + to a DataFileReader, analogous to the + DataFileWriter, which reads the data file on disk. +

+

+ Next we use the DataFileReader to iterate through the serialized Users and print each deserialized object to stdout. Note how we perform the iteration: we create a single User object to hold the current deserialized user, and pass this object to every call of dataFileReader.next. This is a performance optimization that allows the DataFileReader to reuse the same User object rather than allocating a new User for every iteration, which can be very expensive in terms of object allocation and garbage collection if we deserialize a large data file. While this technique is the standard way to iterate through a data file, it's also possible to use for (User user : dataFileReader) if performance is not a concern.

+ +

Compiling and running the example code

+

+ This example code is included as a Maven project in the + examples/java-example directory in the Avro docs. From this + directory, execute the following commands to build and run the + example: +

+
+$ mvn compile # includes code generation via Avro Maven plugin
+$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain
+        
+
+ + + +

Serializing and deserializing without code generation

+
+

+ Data in Avro is always stored with its corresponding schema, meaning we + can always read a serialized item regardless of whether we know the + schema ahead of time. This allows us to perform serialization and + deserialization without code generation. +

+

+ Let's go over the same example as in the previous section, but without using code generation: we'll create some users, serialize them to a data file on disk, and then read back the file and deserialize the user objects.

+ +

Creating users

+

+ First, we use a Parser to read our schema definition and + create a Schema object. +

+
+Schema schema = new Schema.Parser().parse(new File("user.avsc"));
+        
+

+ Using this schema, let's create some users. +

+
+GenericRecord user1 = new GenericData.Record(schema);
+user1.put("name", "Alyssa");
+user1.put("favorite_number", 256);
+// Leave favorite color null
+
+GenericRecord user2 = new GenericData.Record(schema);
+user2.put("name", "Ben");
+user2.put("favorite_number", 7);
+user2.put("favorite_color", "red");
+        
+

+ Since we're not using code generation, we use + GenericRecords to represent users. + GenericRecord uses the schema to verify that we only + specify valid fields. If we try to set a non-existent field (e.g., + user1.put("favorite_animal", "cat")), we'll get an + AvroRuntimeException when we run the program. +

+

+ Note that we do not set user1's favorite color. Since that field is of type ["string", "null"], we can either set it to a string or leave it null; it is essentially optional.

+ +

Serializing

+

+ Now that we've created our user objects, serializing and deserializing + them is almost identical to the example above which uses code + generation. The main difference is that we use generic instead of + specific readers and writers. +

+

+ First we'll serialize our users to a data file on disk. +

+
+// Serialize user1 and user2 to disk
+File file = new File("users.avro");
+DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
+DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
+dataFileWriter.create(schema, file);
+dataFileWriter.append(user1);
+dataFileWriter.append(user2);
+dataFileWriter.close();
+        
+

+ We create a DatumWriter, which converts Java objects into + an in-memory serialized format. Since we are not using code + generation, we create a GenericDatumWriter. It requires + the schema both to determine how to write the + GenericRecords and to verify that all non-nullable fields + are present. +

+

+ As in the code generation example, we also create a + DataFileWriter, which writes the serialized records, as + well as the schema, to the file specified in the + dataFileWriter.create call. We write our users to the + file via calls to the dataFileWriter.append method. When + we are done writing, we close the data file. +

+ +

Deserializing

+

+ Finally, we'll deserialize the data file we just created. +

+
+// Deserialize users from disk
+DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
+DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);
+GenericRecord user = null;
+while (dataFileReader.hasNext()) {
+  // Reuse user object by passing it to next(). This saves us from
+  // allocating and garbage collecting many objects for files with
+  // many items.
+  user = dataFileReader.next(user);
+  System.out.println(user);
+}
+        
+

This outputs:

+
+{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
+{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
+        
+

+ Deserializing is very similar to serializing. We create a + GenericDatumReader, analogous to the + GenericDatumWriter we used in serialization, which + converts in-memory serialized items into GenericRecords. + We pass the DatumReader and the previously created + File to a DataFileReader, analogous to the + DataFileWriter, which reads the data file on disk. +

+

+ Next, we use the DataFileReader to iterate through the serialized users and print each deserialized object to stdout. Note how we perform the iteration: we create a single GenericRecord object to hold the current deserialized user, and pass this object to every call of dataFileReader.next. This is a performance optimization that allows the DataFileReader to reuse the same record object rather than allocating a new GenericRecord for every iteration, which can be very expensive in terms of object allocation and garbage collection if we deserialize a large data file. While this technique is the standard way to iterate through a data file, it's also possible to use for (GenericRecord user : dataFileReader) if performance is not a concern.

+ +

Compiling and running the example code

+

+ This example code is included as a Maven project in the + examples/java-example directory in the Avro docs. From this + directory, execute the following commands to build and run the + example: +

+
+$ mvn compile
+$ mvn -q exec:java -Dexec.mainClass=example.GenericMain
+        
+
+ +
+ +
 
+
+ + + Added: avro/site/publish/docs/1.8.2/gettingstartedjava.pdf URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/gettingstartedjava.pdf?rev=1797063&view=auto ============================================================================== Binary file - no diff available. Propchange: avro/site/publish/docs/1.8.2/gettingstartedjava.pdf ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: avro/site/publish/docs/1.8.2/gettingstartedpython.html URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/gettingstartedpython.html?rev=1797063&view=auto ============================================================================== --- avro/site/publish/docs/1.8.2/gettingstartedpython.html (added) +++ avro/site/publish/docs/1.8.2/gettingstartedpython.html Wed May 31 15:48:43 2017 @@ -0,0 +1,423 @@ + + + + + + + +Apache Avro™ 1.8.2 + Getting Started (Python) + + + + + + + + + +

Apache Avro™ 1.8.2 Getting Started (Python)

+ + +

+ This is a short guide for getting started with Apache Avro™ using + Python. This guide only covers using Avro for data serialization; see + Patrick Hunt's Avro + RPC Quick Start for a good introduction to using Avro for RPC. +

+ + + +

Download

+
+

+ Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be downloaded from the Apache Avro™ Releases page. This guide uses Avro 1.8.2, the latest version at the time of writing. Download and unzip avro-1.8.2.tar.gz, and install via python setup.py install (this will probably require root privileges). Ensure that you can import avro from a Python prompt.

+
+$ tar xvf avro-1.8.2.tar.gz
+$ cd avro-1.8.2
+$ sudo python setup.py install
+$ python
+>>> import avro # should not raise ImportError
+      
+

+ Alternatively, you may build the Avro Python library from source. From the root Avro directory, run the commands

+
+$ cd lang/py/
+$ ant
+$ sudo python setup.py install
+$ python
+>>> import avro # should not raise ImportError
+      
+
+ + + +

Defining a schema

+
+

+ Avro schemas are defined using JSON. Schemas are composed of primitive types + (null, boolean, int, + long, float, double, + bytes, and string) and complex types (record, + enum, array, map, + union, and fixed). You can learn more about + Avro schemas and types from the specification, but for now let's start + with a simple schema example, user.avsc: +

+
+{"namespace": "example.avro",
+ "type": "record",
+ "name": "User",
+ "fields": [
+     {"name": "name", "type": "string"},
+     {"name": "favorite_number",  "type": ["int", "null"]},
+     {"name": "favorite_color", "type": ["string", "null"]}
+ ]
+}
+      
+

+ This schema defines a record representing a hypothetical user. (Note + that a schema file can only contain a single schema definition.) At + minimum, a record definition must include its type ("type": + "record"), a name ("name": "User"), and fields, in + this case name, favorite_number, and + favorite_color. We also define a namespace + ("namespace": "example.avro"), which together with the name + attribute defines the "full name" of the schema + (example.avro.User in this case). + +
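As a quick illustration of how the "full name" is derived, here is a minimal sketch in Python 3 using only the standard `json` module. `full_name` is a hypothetical helper written for this guide, not an Avro library function; it follows the rule that a name already containing a dot is taken as a full name and the namespace attribute is then ignored.

```python
import json


def full_name(schema):
    """Combine "namespace" and "name" into the schema's full name."""
    name = schema["name"]
    namespace = schema.get("namespace")
    # A dotted name is already a full name; an absent namespace adds nothing.
    if "." in name or not namespace:
        return name
    return namespace + "." + name


user_schema = json.loads(
    '{"namespace": "example.avro", "type": "record", "name": "User", "fields": []}'
)
print(full_name(user_schema))  # example.avro.User
```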

+

+ Fields are defined via an array of objects, each of which defines a name + and type (other attributes are optional, see the record specification for more + details). The type attribute of a field is another schema object, which + can be either a primitive or complex type. For example, the + name field of our User schema is the primitive type + string, whereas the favorite_number and + favorite_color fields are both unions, + represented by JSON arrays. unions are a complex type that + can be any of the types listed in the array; e.g., + favorite_number can either be an int or + null, essentially making it an optional field. +

+
+ + + +

Serializing and deserializing without code generation

+
+

+ Data in Avro is always stored with its corresponding schema, meaning we + can always read a serialized item, regardless of whether we know the + schema ahead of time. This allows us to perform serialization and + deserialization without code generation. Note that the Avro Python + library does not support code generation. +
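To see what "stored with its corresponding schema" means in practice, the sketch below reads the header of an Avro data file using only the Python 3 standard library. It relies on the object container file layout from the Avro specification: the four magic bytes `Obj\x01`, a metadata map whose `avro.schema` entry holds the schema as JSON, and a 16-byte sync marker. The helper names (`read_long`, `read_header`, and so on) are hypothetical and this is not how the `avro` package itself is structured. So that the example runs without any Avro file on disk, it first builds a header in memory.

```python
import io
import json

MAGIC = b"Obj\x01"


def write_long(out, n):
    """Write n as an Avro long: zigzag, then variable-length base-128."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_long(inp):
    """Read one zigzag, variable-length encoded long."""
    shift = 0
    result = 0
    while True:
        byte = inp.read(1)
        if not byte:
            raise EOFError("truncated long")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_bytes(inp):
    """Read a length-prefixed byte string."""
    return inp.read(read_long(inp))


def write_header(out, schema_json, sync):
    """Write a minimal container header (illustration only)."""
    out.write(MAGIC)
    meta = {"avro.schema": schema_json.encode("utf-8"), "avro.codec": b"null"}
    write_long(out, len(meta))          # one map block holding every entry
    for key, value in meta.items():
        raw_key = key.encode("utf-8")
        write_long(out, len(raw_key))
        out.write(raw_key)
        write_long(out, len(value))
        out.write(value)
    write_long(out, 0)                  # a zero count ends the map
    out.write(sync)


def read_header(inp):
    """Return (metadata dict, sync marker) from an Avro container file."""
    if inp.read(4) != MAGIC:
        raise ValueError("not an Avro data file")
    meta = {}
    while True:
        count = read_long(inp)
        if count == 0:
            break
        if count < 0:                   # negative count: a byte size follows
            read_long(inp)
            count = -count
        for _ in range(count):
            key = read_bytes(inp).decode("utf-8")
            meta[key] = read_bytes(inp)
    return meta, inp.read(16)


schema_json = ('{"namespace": "example.avro", "type": "record", "name": "User", '
               '"fields": [{"name": "name", "type": "string"}]}')
buf = io.BytesIO()
write_header(buf, schema_json, b"\x00" * 16)
buf.seek(0)
meta, sync = read_header(buf)
embedded = json.loads(meta["avro.schema"].decode("utf-8"))
print(embedded["name"])
```

Pointing `read_header` at a real data file opened with `open("users.avro", "rb")` should recover the writer's schema in the same way, which is what lets a reader proceed without knowing the schema ahead of time.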

+

+ Try running the following code snippet, which serializes two users to a + data file on disk, and then reads back and deserializes the data file: +

+
+import avro.schema
+from avro.datafile import DataFileReader, DataFileWriter
+from avro.io import DatumReader, DatumWriter
+
+schema = avro.schema.parse(open("user.avsc", "rb").read())
+
+writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
+writer.append({"name": "Alyssa", "favorite_number": 256})
+writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
+writer.close()
+
+reader = DataFileReader(open("users.avro", "rb"), DatumReader())
+for user in reader:
+    print user
+reader.close()
+      
+

This outputs:

+
+{u'favorite_color': None, u'favorite_number': 256, u'name': u'Alyssa'}
+{u'favorite_color': u'red', u'favorite_number': 7, u'name': u'Ben'}
+      
+

+ Do make sure that you open your files in binary mode (i.e. using the modes + wb or rb respectively). Otherwise you might + generate corrupt files due to + + automatic replacement of newline characters with the + platform-specific representations. +
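The effect of newline translation on binary data can be reproduced on any platform. The sketch below (Python 3, standard library only) writes the same three bytes once in binary mode and once in text mode with `newline="\r\n"`, which forces the translation that text mode applies by default on Windows. The `latin-1` encoding is used only so that each byte maps to exactly one character; the payload itself is arbitrary and is not real Avro output.

```python
import os
import tempfile

# Arbitrary binary payload that happens to contain the byte 0x0A ("\n").
payload = bytes([0x02, 0x0A, 0x14])

with tempfile.TemporaryDirectory() as tmp:
    binary_path = os.path.join(tmp, "binary.bin")
    text_path = os.path.join(tmp, "text.bin")

    # Binary mode: bytes reach the disk unchanged.
    with open(binary_path, "wb") as f:
        f.write(payload)

    # Text mode with forced "\r\n" translation: every 0x0A gains a 0x0D.
    with open(text_path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(payload.decode("latin-1"))

    with open(binary_path, "rb") as f:
        binary_result = f.read()
    with open(text_path, "rb") as f:
        text_result = f.read()

print(binary_result == payload)  # True
print(text_result)               # b'\x02\r\n\x14'
```

The extra `\r` byte shifts everything after it, which is enough to make a binary-encoded file unreadable.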

+

+ Let's take a closer look at what's going on here. +

+
+schema = avro.schema.parse(open("user.avsc", "rb").read())
+      
+

+ avro.schema.parse takes a string containing a JSON schema definition as input and outputs an avro.schema.Schema object (specifically a subclass of Schema, in this case RecordSchema). We're passing in the contents of our user.avsc schema file here.

+
+writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
+      
+

+ We create a DataFileWriter, which we'll use to write + serialized items to a data file on disk. The + DataFileWriter constructor takes three arguments: +

+
    + +
  • The file we'll serialize to
  • + +
  • A DatumWriter, which is responsible for actually serializing the items to Avro's binary format (DatumWriters can be used separately from DataFileWriters, e.g., to perform IPC with Avro).
  • + +
  • The schema we're using. The DataFileWriter needs the + schema both to write the schema to the data file, and to verify that + the items we write are valid items and write the appropriate + fields.
  • + +
+
+writer.append({"name": "Alyssa", "favorite_number": 256})
+writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
+        
+

+ We use DataFileWriter.append to add items to our data file. Avro records are represented as Python dicts. Since the field favorite_color has type ["string", "null"], we are not required to specify this field, as shown in the first append. Were we to omit the required name field, an exception would be raised. Any extra entries in the dict that do not correspond to a field are ignored.
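The behaviour described above (optional fields may be omitted, a missing required field raises, extra keys are ignored) can be modelled with a short sketch in Python 3. This is a simplified stand-in and not the `avro` package's own validation code: `USER_FIELDS`, `TYPE_CHECKS`, `InvalidDatum`, and `fields_to_write` are hypothetical names, and a missing key is assumed to be read as `None`.

```python
class InvalidDatum(Exception):
    """Raised when a dict does not fit the (simplified) User schema."""


# Field names and the union branches each one accepts, in schema order.
USER_FIELDS = [
    ("name", ["string"]),
    ("favorite_number", ["int", "null"]),
    ("favorite_color", ["string", "null"]),
]

TYPE_CHECKS = {
    "null": lambda v: v is None,
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def fields_to_write(datum):
    """Return (name, value) pairs in schema order, or raise InvalidDatum."""
    result = []
    for name, branches in USER_FIELDS:
        value = datum.get(name)  # a missing key is treated as None
        if not any(TYPE_CHECKS[branch](value) for branch in branches):
            raise InvalidDatum("field %r does not accept %r" % (name, value))
        result.append((name, value))
    return result  # keys not named in USER_FIELDS never appear here


print(fields_to_write({"name": "Alyssa", "favorite_number": 256}))
print(fields_to_write({"name": "Ben", "favorite_animal": "cat"}))
try:
    fields_to_write({"favorite_number": 7})
except InvalidDatum as error:
    print("rejected:", error)
```

Only the schema's fields are ever looked up, so an unexpected key such as `favorite_animal` simply has no effect on what would be written.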

+
+reader = DataFileReader(open("users.avro", "rb"), DatumReader())
+        
+

+ We open the file again, this time for reading back from disk. We use a DataFileReader and DatumReader analogous to the DataFileWriter and DatumWriter above.

+
+for user in reader:
+    print user
+        
+

+ The DataFileReader is an iterator that returns + dicts corresponding to the serialized items. +

+
+ +
+ +
 
+
+ + + Added: avro/site/publish/docs/1.8.2/gettingstartedpython.pdf URL: http://svn.apache.org/viewvc/avro/site/publish/docs/1.8.2/gettingstartedpython.pdf?rev=1797063&view=auto ============================================================================== Binary file - no diff available.