hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Avro/Glossary" by DougCutting
Date Wed, 02 Dec 2009 18:05:17 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Avro/Glossary" page has been changed by DougCutting.
The comment on this change is: Provide definitions of API styles.
http://wiki.apache.org/hadoop/Avro/Glossary?action=diff&rev1=8&rev2=9

--------------------------------------------------

  === IO ===
   * '''Encoder'''/'''Decoder''': Avro specifies two different encodings: Binary and JSON.
See http://hadoop.apache.org/avro/docs/current/spec.html#Encodings for the details of these
encodings.
-  * '''block''': Array and Maps are encoded as a series of blocks, with a "count" long at
the beginning of each block (and optionally "size"). Used for reading and writing data structures
that don't fit into memory (maybe; not implemented yet). May also refer to the "blocks" in
a file object container.
+  * '''block''': Arrays and Maps are encoded as a series of blocks, with a "count" long at
the beginning of each block (and optionally "size"). Used for reading and writing data structures
that don't fit into memory. May also refer to the "blocks" in a file object container.
   * '''!DatumReader'''/'''!DatumWriter'''
   * '''!DataFileReader'''/'''!DataFileWriter'''
   * '''Projection''': The ability to select a subset of data from an Avro schema by specifying
an "expected" schema with the objects you'd like to read. Can possibly avoid the overhead
of deserialization of all columns when you only want a few.
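
As a rough illustration of the '''block''' entry above, the following Python sketch encodes an array of longs as a series of blocks in the binary encoding. It assumes the layout given in the Avro specification: each block starts with a zigzag varint "count", a negative count is followed by a byte "size", and a zero count ends the array. The function names `zigzag_varint` and `encode_long_array` and the `items_per_block` parameter are made up for this example and are not part of any Avro API.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zigzag variable-length long."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z & ~0x7F:          # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_long_array(items, items_per_block, with_size=False):
    """Encode a list of longs as a series of blocks (hypothetical helper).

    Each block starts with a "count" long. When with_size is True the count
    is written negated and followed by the block's size in bytes, which lets
    a reader skip the block without decoding its items.
    """
    out = bytearray()
    for start in range(0, len(items), items_per_block):
        chunk = items[start:start + items_per_block]
        body = b"".join(zigzag_varint(x) for x in chunk)
        if with_size:
            out += zigzag_varint(-len(chunk))
            out += zigzag_varint(len(body))
        else:
            out += zigzag_varint(len(chunk))
        out += body
    out += zigzag_varint(0)   # a zero count terminates the array
    return bytes(out)


# One block holding the longs 3 and 27, then the end-of-array marker.
encoded = encode_long_array([3, 27], items_per_block=2)
print(encoded.hex())
```

Because a writer only needs to buffer one block at a time, the total number of items does not have to be known in advance, which is what makes this layout suitable for data that does not fit into memory.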
@@ -13, +13 @@

   * '''server'''
   * '''transceiver'''
  
+ === API Styles ===
+ Different API styles are possible with Avro, and correspond to different in-memory representations
for Avro data.
- === Other ===
-  * '''specific''': take advantage of language-specific features when implementing a schema
(e.g. code generation of Java classes in the Java implementation).
-  * '''generic'''
-  * '''reflect'''
  
- Most Avro terms of art are defined in the [[http://hadoop.apache.org/avro/docs/current/spec.html|specification]].
+  * '''generic''': All Avro records are represented by a generic attribute/value data structure.
 This style is most useful for systems that dynamically process datasets based on user-provided
scripts.  For example, a program may be passed a data file whose schema it has not previously
seen and be told to sort it by the field named "city".
+  * '''specific''': Each Avro record corresponds to a different kind of object in the programming
language.  For example, in Java, C and C++, a specific API would generate a distinct class
or struct definition for each record definition.  This style is used for programs written
to process a specific schema.  RPC systems typically use this.
+  * '''reflect''': Avro schemas are generated via reflection to correspond to existing programming
language data structures.  This may be useful when converting an existing codebase to use Avro
with minimal modifications.
  
+ Many Avro terms of art are defined in the [[http://hadoop.apache.org/avro/docs/current/spec.html|specification]].
+ 
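
The three API styles above can be contrasted with a small Python sketch. It does not use any Avro library: plain dicts stand in for the generic attribute/value structure, a dataclass stands in for a generated specific class, and a helper derives a schema-like dict from that class by reflection. The names `sort_by_field`, `Person`, and `schema_from_dataclass`, and the mapping from Python types to Avro type names, are assumptions made for this illustration.

```python
from dataclasses import dataclass, fields


# Generic style: every record is an attribute/value mapping, so the code
# can process data whose schema it has never seen before.
def sort_by_field(records, field_name):
    return sorted(records, key=lambda record: record[field_name])


# Specific style: one distinct class per record definition. In a real
# specific API this class would be generated from the schema.
@dataclass
class Person:
    name: str
    city: str


# Reflect style: derive a schema from an existing class instead of
# writing the schema first. The type mapping below is an assumption.
_AVRO_TYPE_NAMES = {str: "string", int: "long", float: "double", bool: "boolean"}


def schema_from_dataclass(cls):
    return {
        "type": "record",
        "name": cls.__name__,
        "fields": [
            {"name": f.name, "type": _AVRO_TYPE_NAMES[f.type]}
            for f in fields(cls)
        ],
    }


rows = [{"name": "Ann", "city": "Paris"}, {"name": "Bob", "city": "Austin"}]
print(sort_by_field(rows, "city"))    # generic: field chosen at run time
print(Person(name="Ann", city="Paris"))  # specific: fields fixed in code
print(schema_from_dataclass(Person))  # reflect: schema derived from the class
```

The generic function takes the field name as data, which is why that style suits user-provided scripts, while the specific and reflect styles tie the record shape to a class known when the program is written.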
