From commits-return-6580-archive-asf-public=cust-asf.ponee.io@zookeeper.apache.org Mon Jul 16 06:21:57 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 157A01807A1 for ; Mon, 16 Jul 2018 06:21:55 +0200 (CEST) Received: (qmail 52671 invoked by uid 500); 16 Jul 2018 04:21:55 -0000 Mailing-List: contact commits-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list commits@zookeeper.apache.org Received: (qmail 47978 invoked by uid 99); 16 Jul 2018 04:21:47 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 16 Jul 2018 04:21:47 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id B613FE0D4D; Mon, 16 Jul 2018 04:21:46 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: hanm@apache.org To: commits@zookeeper.apache.org Date: Mon, 16 Jul 2018 04:22:26 -0000 Message-Id: In-Reply-To: References: X-Mailer: ASF-Git Admin Mailer Subject: [42/45] zookeeper git commit: Update website content for release 3.4.13. 
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/index.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/index.html b/_released_docs/r3.4.13/api/index.html new file mode 100644 index 0000000..4e0abeb --- /dev/null +++ b/_released_docs/r3.4.13/api/index.html @@ -0,0 +1,75 @@ + + + + + +ZooKeeper 3.4.13 API + + + + + + + + + +<noscript> +<div>JavaScript is disabled on your browser.</div> +</noscript> +<h2>Frame Alert</h2> +<p>This document is designed to be viewed using the frames feature. If you see this message, you are using a non-frame-capable web client. Link to <a href="overview-summary.html">Non-frame version</a>.</p> + + + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/Record.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/Record.html b/_released_docs/r3.4.13/api/org/apache/jute/Record.html new file mode 100644 index 0000000..a0f5db9 --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/Record.html @@ -0,0 +1,255 @@ + + + + + +Record (ZooKeeper 3.4.13 API) + + + + + + + + + + + + +
+
org.apache.jute
+

Interface Record

+
+
+
+
    +
  • +
    +
    All Known Implementing Classes:
    +
    ACL, Id, Stat, StatPersisted, StatPersistedV1
    +
    +
    +
    +
    @InterfaceAudience.Public
    +public interface Record
    +
    Interface that is implemented by generated classes.
    +
  • +
+
+
+
    +
  • + +
      +
    • + + +

      Method Summary

      + + + + + + + + + + + + + + +
      All Methods Instance Methods Abstract Methods 
      Modifier and TypeMethod and Description
      voiddeserialize(org.apache.jute.InputArchive archive, + java.lang.String tag) 
      voidserialize(org.apache.jute.OutputArchive archive, + java.lang.String tag) 
      +
    • +
    +
  • +
+
+
+
    +
  • + +
      +
    • + + +

      Method Detail

      + + + +
        +
      • +

        serialize

        +
        void serialize(org.apache.jute.OutputArchive archive,
        +               java.lang.String tag)
        +        throws java.io.IOException
        +
        +
        Throws:
        +
        java.io.IOException
        +
        +
      • +
      + + + +
        +
      • +

        deserialize

        +
        void deserialize(org.apache.jute.InputArchive archive,
        +                 java.lang.String tag)
        +          throws java.io.IOException
        +
        +
        Throws:
        +
        java.io.IOException
        +
        +
      • +
      +
    • +
    +
  • +
+
+
+ + + + + +

Copyright © 2018 The Apache Software Foundation

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/class-use/Record.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/class-use/Record.html b/_released_docs/r3.4.13/api/org/apache/jute/class-use/Record.html new file mode 100644 index 0000000..af62495 --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/class-use/Record.html @@ -0,0 +1,181 @@ + + + + + +Uses of Interface org.apache.jute.Record (ZooKeeper 3.4.13 API) + + + + + + + + + + + +
+

Uses of Interface
org.apache.jute.Record

+
+
+ +
+ + + + +

Copyright © 2018 The Apache Software Foundation

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-frame.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-frame.html b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-frame.html new file mode 100644 index 0000000..06ac06e --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-frame.html @@ -0,0 +1,14 @@ + + + + + +org.apache.jute.compiler.generated (ZooKeeper 3.4.13 API) + + + + + +

org.apache.jute.compiler.generated

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-summary.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-summary.html b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-summary.html new file mode 100644 index 0000000..2cbca2d --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-summary.html @@ -0,0 +1,137 @@ + + + + + +org.apache.jute.compiler.generated (ZooKeeper 3.4.13 API) + + + + + + + + + + + +
+

Package org.apache.jute.compiler.generated

+
+
This package contains code generated by JavaCC from the + Hadoop record syntax file rcc.jj.
+
+

See: Description

+
+
+ + +

Package org.apache.jute.compiler.generated Description

+
This package contains code generated by JavaCC from the + Hadoop record syntax file rcc.jj. For details about the + record file syntax please @see org.apache.hadoop.record.
+
+ + + + +

Copyright © 2018 The Apache Software Foundation

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-tree.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-tree.html b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-tree.html new file mode 100644 index 0000000..f158215 --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-tree.html @@ -0,0 +1,128 @@ + + + + + +org.apache.jute.compiler.generated Class Hierarchy (ZooKeeper 3.4.13 API) + + + + + + + + + + + +
+

Hierarchy For Package org.apache.jute.compiler.generated

+Package Hierarchies: + +
+ + + + +

Copyright © 2018 The Apache Software Foundation

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-use.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-use.html b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-use.html new file mode 100644 index 0000000..f26e41a --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/compiler/generated/package-use.html @@ -0,0 +1,125 @@ + + + + + +Uses of Package org.apache.jute.compiler.generated (ZooKeeper 3.4.13 API) + + + + + + + + + + + +
+

Uses of Package
org.apache.jute.compiler.generated

+
+
No usage of org.apache.jute.compiler.generated
+ + + + +

Copyright © 2018 The Apache Software Foundation

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-frame.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-frame.html b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-frame.html new file mode 100644 index 0000000..0388e04 --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-frame.html @@ -0,0 +1,14 @@ + + + + + +org.apache.jute.compiler (ZooKeeper 3.4.13 API) + + + + + +

org.apache.jute.compiler

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-summary.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-summary.html b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-summary.html new file mode 100644 index 0000000..57c9758 --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-summary.html @@ -0,0 +1,139 @@ + + + + + +org.apache.jute.compiler (ZooKeeper 3.4.13 API) + + + + + + + + + + + +
+

Package org.apache.jute.compiler

+
+
This package contains classes needed for code generation + from the hadoop record compiler.
+
+

See: Description

+
+
+ + +

Package org.apache.jute.compiler Description

+
This package contains classes needed for code generation + from the hadoop record compiler. CppGenerator and JavaGenerator + are the main entry points from the parser. There are classes + corresponding to every primitive type and compound type + included in Hadoop record I/O syntax.
+
+ + + + +

Copyright © 2018 The Apache Software Foundation

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-tree.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-tree.html b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-tree.html new file mode 100644 index 0000000..df659c0 --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-tree.html @@ -0,0 +1,128 @@ + + + + + +org.apache.jute.compiler Class Hierarchy (ZooKeeper 3.4.13 API) + + + + + + + + + + + +
+

Hierarchy For Package org.apache.jute.compiler

+Package Hierarchies: + +
+ + + + +

Copyright © 2018 The Apache Software Foundation

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-use.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-use.html b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-use.html new file mode 100644 index 0000000..2393913 --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/compiler/package-use.html @@ -0,0 +1,125 @@ + + + + + +Uses of Package org.apache.jute.compiler (ZooKeeper 3.4.13 API) + + + + + + + + + + + +
+

Uses of Package
org.apache.jute.compiler

+
+
No usage of org.apache.jute.compiler
+ + + + +

Copyright © 2018 The Apache Software Foundation

+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/package-frame.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/package-frame.html b/_released_docs/r3.4.13/api/org/apache/jute/package-frame.html new file mode 100644 index 0000000..2396aaa --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/package-frame.html @@ -0,0 +1,20 @@ + + + + + +org.apache.jute (ZooKeeper 3.4.13 API) + + + + + +

org.apache.jute

+
+

Interfaces

+ +
+ + http://git-wip-us.apache.org/repos/asf/zookeeper/blob/86349e3b/_released_docs/r3.4.13/api/org/apache/jute/package-summary.html ---------------------------------------------------------------------- diff --git a/_released_docs/r3.4.13/api/org/apache/jute/package-summary.html b/_released_docs/r3.4.13/api/org/apache/jute/package-summary.html new file mode 100644 index 0000000..4698e58 --- /dev/null +++ b/_released_docs/r3.4.13/api/org/apache/jute/package-summary.html @@ -0,0 +1,930 @@ + + + + + +org.apache.jute (ZooKeeper 3.4.13 API) + + + + + + + + + + + +
+

Package org.apache.jute

+
+
Hadoop record I/O contains classes and a record description language + translator for simplifying serialization and deserialization of records in a + language-neutral manner.
+
+

See: Description

+
+
+
    +
  • + + + + + + + + + + + + +
    Interface Summary 
    InterfaceDescription
    Record +
    Interface that is implemented by generated classes.
    +
    +
  • +
+ + + +

Package org.apache.jute Description

+
Hadoop record I/O contains classes and a record description language + translator for simplifying serialization and deserialization of records in a + language-neutral manner. + +

Introduction

+ + Software systems of any significant complexity require mechanisms for data +interchange with the outside world. These interchanges typically involve the +marshaling and unmarshaling of logical units of data to and from data streams +(files, network connections, memory buffers etc.). Applications usually have +some code for serializing and deserializing the data types that they manipulate +embedded in them. The work of serialization has several features that make +automatic code generation for it worthwhile. Given a particular output encoding +(binary, XML, etc.), serialization of primitive types and simple compositions +of primitives (structs, vectors etc.) is a very mechanical task. Manually +written serialization code can be susceptible to bugs especially when records +have a large number of fields or a record definition changes between software +versions. Lastly, it can be very useful for applications written in different +programming languages to be able to share and interchange data. This can be +made a lot easier by describing the data records manipulated by these +applications in a language agnostic manner and using the descriptions to derive +implementations of serialization in multiple target languages. + +This document describes Hadoop Record I/O, a mechanism that is aimed +at +
    +
  • enabling the specification of simple serializable data types (records) +
  • enabling the generation of code in multiple target languages for +marshaling and unmarshaling such types +
  • providing target language specific support that will enable application +programmers to incorporate generated code into their applications +
+ +The goals of Hadoop Record I/O are similar to those of mechanisms such as XDR, +ASN.1, PADS and ICE. While these systems all include a DDL that enables +the specification of most record types, they differ widely in what else they +focus on. The focus in Hadoop Record I/O is on data marshaling and +multi-lingual support. We take a translator-based approach to serialization. +Hadoop users have to describe their data in a simple data description +language. The Hadoop DDL translator rcc generates code that users +can invoke in order to read/write their data from/to simple stream +abstractions. Next we list explicitly some of the goals and non-goals of +Hadoop Record I/O. + + +

Goals

+ +
    +
  • Support for commonly used primitive types. Hadoop should include as +primitives commonly used builtin types from programming languages we intend to +support. + +
  • Support for common data compositions (including recursive compositions). +Hadoop should support widely used composite types such as structs and +vectors. + +
  • Code generation in multiple target languages. Hadoop should be capable of +generating serialization code in multiple target languages and should be +easily extensible to new target languages. The initial target languages are +C++ and Java. + +
  • Support for generated target languages. Hadoop should include support +in the form of headers, libraries, packages for supported target languages +that enable easy inclusion and use of generated code in applications. + +
  • Support for multiple output encodings. Candidates include +packed binary, comma-separated text, XML etc. + +
  • Support for specifying record types in a backwards/forwards compatible +manner. This will probably be in the form of support for optional fields in +records. This version of the document does not include a description of the +planned mechanism; we intend to include it in the next iteration. + +
+ +

Non-Goals

+ +
    +
  • Serializing existing arbitrary C++ classes. +
  • Serializing complex data structures such as trees, linked lists etc. +
  • Built-in indexing schemes, compression, or check-sums. +
  • Dynamic construction of objects from an XML schema. +
+ +The remainder of this document describes the features of Hadoop record I/O +in more detail. Section 2 describes the data types supported by the system. +Section 3 lays out the DDL syntax with some examples of simple records. +Section 4 describes the process of code generation with rcc. Section 5 +describes target language mappings and support for Hadoop types. We include a +fairly complete description of C++ mappings with intent to include Java and +others in upcoming iterations of this document. The last section talks about +supported output encodings. + + +

Data Types and Streams

+ +This section describes the primitive and composite types supported by Hadoop. +We aim to support a set of types that can be used to simply and efficiently +express a wide range of record types in different programming languages. + +

Primitive Types

+ +For the most part, the primitive types of Hadoop map directly to primitive +types in high level programming languages. Special cases are the +ustring (a Unicode string) and buffer types, which we believe +find wide use and which are usually implemented in library code and not +available as language built-ins. Hadoop also supplies these via library code +when a target language built-in is not present and there is no widely +adopted "standard" implementation. The complete list of primitive types is: + +
    +
  • byte: An 8-bit unsigned integer. +
  • boolean: A boolean value. +
  • int: A 32-bit signed integer. +
  • long: A 64-bit signed integer. +
  • float: A single precision floating point number as described by + IEEE-754. +
  • double: A double precision floating point number as described by + IEEE-754. +
  • ustring: A string consisting of Unicode characters. +
  • buffer: An arbitrary sequence of bytes. +
+ + +

Composite Types

+Hadoop supports a small set of composite types that enable the description +of simple aggregate types and containers. A composite type is serialized +by sequentially serializing its constituent elements. The supported +composite types are: + +
    + +
  • record: An aggregate type like a C-struct. This is a list of +typed fields that are together considered a single unit of data. A record +is serialized by sequentially serializing its constituent fields. In addition +to serialization, a record has comparison operations (equality and less-than) +implemented for it; these are defined as memberwise comparisons. + +
  • vector: A sequence of entries of the same data type, primitive +or composite. + +
  • map: An associative container mapping instances of a key type to +instances of a value type. The key and value types may themselves be primitive +or composite types. + +
+ +
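The sequential serialization and memberwise comparison described above can be sketched in Java as follows. This is a minimal illustration, not output of rcc: the class name LinkSketch, its three fields, and the length-prefixed string layout are assumptions made for this example and are not the encodings defined by Hadoop Record I/O.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written record; not generated code.
final class LinkSketch implements Comparable<LinkSketch> {
    final String url;
    final boolean isRelative;
    final String anchorText;

    LinkSketch(String url, boolean isRelative, String anchorText) {
        this.url = url;
        this.isRelative = isRelative;
        this.anchorText = anchorText;
    }

    // A record is serialized by writing each field in declaration order.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeString(out, url);
            out.writeBoolean(isRelative);
            writeString(out, anchorText);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Assumed layout for this sketch: 4-byte length followed by UTF-8 bytes.
    private static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    // Memberwise comparison: the first differing field decides the order.
    @Override
    public int compareTo(LinkSketch other) {
        int c = url.compareTo(other.url);
        if (c != 0) return c;
        c = Boolean.compare(isRelative, other.isRelative);
        if (c != 0) return c;
        return anchorText.compareTo(other.anchorText);
    }
}
```

Comparing field by field in declaration order gives both equality (result 0) and less-than (negative result) from a single method.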

Streams

+ +Hadoop generates code for serializing and deserializing record types to +abstract streams. For each target language Hadoop defines very simple input +and output stream interfaces. Application writers can usually develop +concrete implementations of these by putting a one method wrapper around +an existing stream implementation. + + +
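As an illustration of the "one method wrapper" idea, the following Java sketch adapts an existing java.io.InputStream to a single-method input interface. The names SimpleInStream and WrappedInStream are hypothetical and chosen for this example; they are not part of the org.apache.jute API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical one-method input abstraction, modeled on the description above.
interface SimpleInStream {
    // Reads up to buf.length bytes into buf.
    // Returns the number of bytes read, or -1 at end of stream.
    int read(byte[] buf);
}

// Concrete implementation: a thin wrapper around an existing stream.
final class WrappedInStream implements SimpleInStream {
    private final InputStream delegate;

    WrappedInStream(InputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public int read(byte[] buf) {
        try {
            return delegate.read(buf, 0, buf.length);
        } catch (IOException e) {
            // This sketch reports I/O failures as unchecked exceptions.
            throw new UncheckedIOException(e);
        }
    }
}
```

Any InputStream (file, socket, in-memory buffer) can be passed to the constructor, so the wrapper itself stays a single forwarding method.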

DDL Syntax and Examples

+ +We now describe the syntax of the Hadoop data description language. This is +followed by a few examples of DDL usage. + +

Hadoop DDL Syntax

+ +

+recfile = *include module *record
+include = "include" path
+path = (relative-path / absolute-path)
+module = "module" module-name
+module-name = name *("." name)
+record := "class" name "{" 1*(field) "}"
+field := type name ";"
+name :=  ALPHA (ALPHA / DIGIT / "_" )*
+type := (ptype / ctype)
+ptype := ("byte" / "boolean" / "int" /
+          "long" / "float" / "double" /
+          "ustring" / "buffer")
+ctype := (("vector" "<" type ">") /
+          ("map" "<" type "," type ">" ) / name)
+
+ +A DDL file describes one or more record types. It begins with zero or +more include declarations, a single mandatory module declaration +followed by zero or more class declarations. The semantics of each of +these declarations are described below: + +
    + +
  • include: An include declaration specifies a DDL file to be +referenced when generating code for types in the current DDL file. Record types +in the current compilation unit may refer to types in all included files. +File inclusion is recursive. An include does not trigger code +generation for the referenced file. + +
  • module: Every Hadoop DDL file must have a single module +declaration that follows the list of includes and precedes all record +declarations. A module declaration identifies a scope within which +the names of all types in the current file are visible. Module names are +mapped to C++ namespaces, Java packages etc. in generated code. + +
  • class: Record types are specified through class +declarations. A class declaration is like a Java class declaration. +It specifies a named record type and a list of fields that constitute records +of the type. Usage is illustrated in the following examples. + +
+ +
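The name and module-name rules of the grammar can be checked with regular expressions. The Java sketch below is a hedged illustration: the class DdlNames and its methods are hypothetical, and it assumes ALPHA means ASCII letters and DIGIT means ASCII digits.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of rcc.
final class DdlNames {
    // name := ALPHA (ALPHA / DIGIT / "_")*
    private static final String NAME_RE = "[A-Za-z][A-Za-z0-9_]*";

    private static final Pattern NAME = Pattern.compile(NAME_RE);

    // module-name = name *("." name)
    private static final Pattern MODULE_NAME =
            Pattern.compile(NAME_RE + "(\\." + NAME_RE + ")*");

    private DdlNames() {
    }

    // True if s is a valid record or field name under the rule above.
    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    // True if s is a dot-separated sequence of valid names.
    static boolean isModuleName(String s) {
        return MODULE_NAME.matcher(s).matches();
    }
}
```

Under these assumptions, a name may not begin with a digit or an underscore, and a module name may not contain empty components.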

Examples

+ +
    +
  • A simple DDL file links.jr with just one record declaration. +
    
    +module links {
    +    class Link {
    +        ustring URL;
    +        boolean isRelative;
    +        ustring anchorText;
    +    };
    +}
    +
    + +
  • A DDL file outlinks.jr which includes another +
    
    +include "links.jr"
    +
    +module outlinks {
    +    class OutLinks {
    +        ustring baseURL;
    +        vector<links.Link> outLinks;
    +    };
    +}
    +
    +
+ +

Code Generation

+ +The Hadoop translator is written in Java. Invocation is done by executing a +wrapper shell script named rcc. It takes a list of +record description files as a mandatory argument and an +optional language argument, --language or +-l (the default is Java). Thus a typical invocation would look like: +

+$ rcc -l C++  ...
+
+ + +

Target Language Mappings and Support

+ +For all target languages, the unit of code generation is a record type. +For each record type, Hadoop generates code for serialization and +deserialization, record comparison and access to record members. + +

C++

+ +Support for including Hadoop generated C++ code in applications comes in the +form of a header file recordio.hh which needs to be included in source +that uses Hadoop types and a library librecordio.a which applications need +to be linked with. The header declares the Hadoop C++ namespace which defines +appropriate types for the various primitives, the basic interfaces for +records and streams and enumerates the supported serialization encodings. +Declarations of these interfaces and a description of their semantics follow: + +

+namespace hadoop {
+
+  enum RecFormat { kBinary, kXML, kCSV };
+
+  class InStream {
+  public:
+    virtual ssize_t read(void *buf, size_t n) = 0;
+  };
+
+  class OutStream {
+  public:
+    virtual ssize_t write(const void *buf, size_t n) = 0;
+  };
+
+  class IOError : public std::runtime_error {
+  public:
+    explicit IOError(const std::string& msg);
+  };
+
+  class IArchive;
+  class OArchive;
+  class Record;
+
+  class RecordReader {
+  public:
+    RecordReader(InStream& in, RecFormat fmt);
+    virtual ~RecordReader(void);
+
+    virtual void read(Record& rec);
+  };
+
+  class RecordWriter {
+  public:
+    RecordWriter(OutStream& out, RecFormat fmt);
+    virtual ~RecordWriter(void);
+
+    virtual void write(Record& rec);
+  };
+
+
+  class Record {
+  public:
+    virtual std::string type(void) const = 0;
+    virtual std::string signature(void) const = 0;
+  protected:
+    virtual bool validate(void) const = 0;
+
+    virtual void
+    serialize(OArchive& oa, const std::string& tag) const = 0;
+
+    virtual void
+    deserialize(IArchive& ia, const std::string& tag) = 0;
+  };
+}
+
+ +
    + +
  • RecFormat: An enumeration of the serialization encodings supported +by this implementation of Hadoop. + +
  • InStream: A simple abstraction for an input stream. This has a +single public read method that reads n bytes from the stream into +the buffer buf. Has the same semantics as a blocking read system +call. Returns the number of bytes read or -1 if an error occurs. + +
  • OutStream: A simple abstraction for an output stream. This has a +single write method that writes n bytes to the stream from the +buffer buf. Has the same semantics as a blocking write system +call. Returns the number of bytes written or -1 if an error occurs. + +
  • RecordReader: A RecordReader reads records one at a time from +an underlying stream in a specified record format. The reader is instantiated +with a stream and a serialization format. It has a read method that +takes an instance of a record and deserializes the record from the stream. + +
  • RecordWriter: A RecordWriter writes records one at a +time to an underlying stream in a specified record format. The writer is +instantiated with a stream and a serialization format. It has a +write method that takes an instance of a record and serializes the +record to the stream. + +
  • Record: The base class for all generated record types. This has two +public methods type and signature that return the typename and the +type signature of the record. + +
+ +Two files are generated for each record file (note: not for each record). If a +record file is named "name.jr", the generated files are +"name.jr.cc" and "name.jr.hh" containing serialization +implementations and record type declarations respectively. + +For each record in the DDL file, the generated header file will contain a +class definition corresponding to the record type; method definitions for the +generated type will be present in the '.cc' file. The generated class will +inherit from the abstract class hadoop::Record. The DDL file's +module declaration determines the namespace the record belongs to. +Each '.' delimited token in the module declaration results in the +creation of a namespace. For instance, the declaration module docs.links +results in the creation of a docs namespace and a nested +docs::links namespace. In the preceding examples, the Link class +is placed in the links namespace. The header file corresponding to +the links.jr file will contain: + +

+namespace links {
+  class Link : public hadoop::Record {
+    // ....
+  };
+};
+
+ +Each field within the record will cause the generation of a private member +declaration of the appropriate type in the class declaration, and one or more +accessor methods. The generated class will implement the serialize and +deserialize methods defined in hadoop::Record. It will also +implement the inspection methods type and signature from +hadoop::Record. A default constructor and virtual destructor will also +be generated. Serialization code will read/write records into streams that +implement the hadoop::InStream and the hadoop::OutStream interfaces. + +For each member of a record an accessor method is generated that returns +either the member or a reference to the member. For members that are returned +by value, a setter method is also generated. This is true for primitive +data members of the types byte, int, long, boolean, float and +double. For example, for an int field called MyField the following +code is generated. + +

+...
+private:
+  int32_t mMyField;
+  ...
+public:
+  int32_t getMyField(void) const {
+    return mMyField;
+  };
+
+  void setMyField(int32_t m) {
+    mMyField = m;
+  };
+  ...
+
+ +For a ustring, buffer, or composite field, the generated code +contains only accessors that return a reference to the field. A const +and a non-const accessor are generated. For example: + +

+...
+private:
+  std::string mMyBuf;
+  ...
+public:
+
+  std::string& getMyBuf() {
+    return mMyBuf;
+  };
+
+  const std::string& getMyBuf() const {
+    return mMyBuf;
+  };
+  ...
+
+ +

Examples

+ +Suppose the inclrec.jr file contains: +

+module inclrec {
+    class RI {
+        int      I32;
+        double   D;
+        ustring  S;
+    };
+}
+
+ +and the testrec.jr file contains: + +

+include "inclrec.jr"
+module testrec {
+    class R {
+        vector<float> VF;
+        RI            Rec;
+        buffer        Buf;
+    };
+}
+
+ +Then an invocation of rcc such as: +

+$ rcc -l c++ inclrec.jr testrec.jr
+
+will result in generation of four files: +inclrec.jr.{cc,hh} and testrec.jr.{cc,hh}. + +The inclrec.jr.hh will contain: + +

+#ifndef _INCLREC_JR_HH_
+#define _INCLREC_JR_HH_
+
+#include "recordio.hh"
+
+namespace inclrec {
+  
+  class RI : public hadoop::Record {
+
+  private:
+
+    int32_t      mI32;
+    double       mD;
+    std::string  mS;
+
+  public:
+
+    RI(void);
+    virtual ~RI(void);
+
+    virtual bool operator==(const RI& peer) const;
+    virtual bool operator<(const RI& peer) const;
+
+    virtual int32_t getI32(void) const { return mI32; }
+    virtual void setI32(int32_t v) { mI32 = v; }
+
+    virtual double getD(void) const { return mD; }
+    virtual void setD(double v) { mD = v; }
+
+    virtual std::string& getS(void) { return mS; }
+    virtual const std::string& getS(void) const { return mS; }
+
+    virtual std::string type(void) const;
+    virtual std::string signature(void) const;
+
+  protected:
+
+    virtual void serialize(hadoop::OArchive& a, const std::string& tag) const;
+    virtual void deserialize(hadoop::IArchive& a, const std::string& tag);
+
+    virtual bool validate(void) const;
+  };
+} // end namespace inclrec
+
+#endif /* _INCLREC_JR_HH_ */
+
+
+ +The testrec.jr.hh file will contain: + + +

+
+#ifndef _TESTREC_JR_HH_
+#define _TESTREC_JR_HH_
+
+#include "inclrec.jr.hh"
+
+namespace testrec {
+  class R : public hadoop::Record {
+
+  private:
+
+    std::vector<float> mVF;
+    inclrec::RI        mRec;
+    std::string        mBuf;
+
+  public:
+
+    R(void);
+    virtual ~R(void);
+
+    virtual bool operator==(const R& peer) const;
+    virtual bool operator<(const R& peer) const;
+
+    virtual std::vector<float>& getVF(void);
+    virtual const std::vector<float>& getVF(void) const;
+
+    virtual std::string& getBuf(void);
+    virtual const std::string& getBuf(void) const;
+
+    virtual inclrec::RI& getRec(void);
+    virtual const inclrec::RI& getRec(void) const;
+    
+    virtual void serialize(hadoop::OArchive& a, const std::string& tag) const;
+    virtual void deserialize(hadoop::IArchive& a, const std::string& tag);
+    
+    virtual std::string type(void) const;
+    virtual std::string signature(void) const;
+  };
+}; // end namespace testrec
+#endif /* _TESTREC_JR_HH_ */
+
+
+ +

Java

+ +Code generation for Java is similar to that for C++. A Java class is generated +for each record type with private members corresponding to the fields. Getters +and setters for fields are also generated. Some differences arise in the +way comparison is expressed and in the mapping of modules to packages and +classes to files. For equality testing, an equals method is generated +for each record type. As per Java requirements a hashCode method is also +generated. For comparison a compareTo method is generated for each +record type. This has the semantics defined by the Java Comparable +interface; that is, the method returns a negative integer, zero, or a positive +integer as the invoked object is less than, equal to, or greater than the +comparison parameter. + +A .java file is generated per record type as opposed to per DDL +file as in C++. The module declaration translates to a Java +package declaration. The module name maps to an identical Java package +name. In addition to this mapping, the DDL compiler creates the appropriate +directory hierarchy for the package and places the generated .java +files in the correct directories. + +
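To make the description concrete, here is a hand-written Java sketch of the shape such a class might take for the RI record from inclrec.jr. It is not actual rcc output: the class name RiSketch is hypothetical, java.lang.String stands in for the ustring mapping so that the example is self-contained, and the helper ModulePaths.packageDirectory assumes a '/'-separated relative path for the package directory.

```java
import java.util.Objects;

// Hypothetical stand-in for a generated record class; not rcc output.
final class RiSketch implements Comparable<RiSketch> {
    private int i32;
    private double d;
    private String s = "";

    int getI32() { return i32; }
    void setI32(int v) { i32 = v; }

    double getD() { return d; }
    void setD(double v) { d = v; }

    String getS() { return s; }
    void setS(String v) { s = v; }

    // Equality is memberwise.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RiSketch)) return false;
        RiSketch peer = (RiSketch) o;
        return i32 == peer.i32
                && Double.compare(d, peer.d) == 0
                && s.equals(peer.s);
    }

    // hashCode must agree with equals, so it uses the same fields.
    @Override
    public int hashCode() {
        return Objects.hash(i32, d, s);
    }

    // Negative, zero, or positive, per the Comparable contract.
    @Override
    public int compareTo(RiSketch peer) {
        int c = Integer.compare(i32, peer.i32);
        if (c != 0) return c;
        c = Double.compare(d, peer.d);
        if (c != 0) return c;
        return s.compareTo(peer.s);
    }
}

// Hypothetical helper showing the module-to-directory mapping.
final class ModulePaths {
    private ModulePaths() {
    }

    // "docs.links" -> "docs/links" (assumed layout for this sketch).
    static String packageDirectory(String moduleName) {
        return moduleName.replace('.', '/');
    }
}
```

Using the same fields in equals, hashCode, and compareTo keeps the three methods consistent with one another, which is what collections such as java.util.TreeMap expect of their keys.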

Mapping Summary

+ +

+DDL Type        C++ Type            Java Type 
+
+boolean         bool                boolean
+byte            int8_t              byte
+int             int32_t             int
+long            int64_t             long
+float           float               float
+double          double              double
+ustring         std::string         Text
+buffer          std::string         java.io.ByteArrayOutputStream
+class type      class type          class type
+vector<type>    std::vector<type>   java.util.ArrayList<type>
+map<type1,type2>  std::map<type1,type2> java.util.TreeMap<type1,type2>
+
+ +

Data encodings

+ +This section describes the format of the data encodings supported by Hadoop. +Currently, three data encodings are supported, namely binary, CSV and XML. + +

Binary Serialization Format

+ +The binary data encoding format is fairly dense. Serialization of composite +types is simply defined as a concatenation of serializations of the constituent +elements (lengths are included in vectors and maps). + +Composite types are serialized as follows: +
    +
  • class: Sequence of serialized members. +
  • vector: The number of elements serialized as an int. Followed by a +sequence of serialized elements. +
  • map: The number of key value pairs serialized as an int. Followed +by a sequence of serialized (key,value) pairs. +
+ +Serialization of primitives is more interesting, with a zero compression +optimization for integral types and normalization to UTF-8 for strings. +Primitive types are serialized as follows: + +
    +
  • byte: Represented by 1 byte, as is. +
  • boolean: Represented by 1 byte (0 or 1). +
  • int/long: Integers and longs are serialized zero compressed. +Represented as 1 byte if -120 <= value < 128. Otherwise, serialized as a +sequence of 2-5 bytes for ints, 2-9 bytes for longs. The first byte represents +the number of trailing bytes, N, as the negative number (-120-N). For example, +the number 1024 (0x400) is represented by the byte sequence '\x86 \x04 \x00'. +This doesn't help much for 4-byte integers but does a reasonably good job with +longs without bit twiddling. +
  • float/double: Serialized in IEEE 754 single and double precision +format in network byte order. This is the format used by Java. +
  • ustring: Serialized as a 4-byte zero compressed length followed by +data encoded as UTF-8. Strings are normalized to UTF-8 regardless of native +language representation. +
  • buffer: Serialized as a 4-byte zero compressed length followed by the +raw bytes in the buffer. +
+ + +
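The zero-compression rule for int/long above can be sketched as follows. This is a literal reading of the description, not code taken from any shipped implementation. The text does not say how the trailing bytes are chosen, so the sketch assumes they are the shortest big-endian two's complement form of the value; the class and method names are hypothetical.

```java
// Sketch of the zero-compressed integer encoding described above.
// Assumption: the N trailing bytes are the minimal-width big-endian
// two's complement representation of the value.
final class ZeroCompressed {

    // Smallest N in 1..8 such that v fits in N signed bytes.
    static int trailingBytes(long v) {
        for (int n = 1; n < 8; n++) {
            long limit = 1L << (8 * n - 1);
            if (v >= -limit && v < limit) return n;
        }
        return 8;
    }

    static byte[] encode(long v) {
        if (v >= -120 && v < 128) {
            return new byte[] {(byte) v};      // single byte, as is
        }
        int n = trailingBytes(v);
        byte[] out = new byte[n + 1];
        out[0] = (byte) (-120 - n);            // marker byte: -120-N
        for (int i = 0; i < n; i++) {
            out[1 + i] = (byte) (v >> (8 * (n - 1 - i)));
        }
        return out;
    }

    static long decode(byte[] in) {
        byte first = in[0];
        if (first >= -120) return first;       // value stored directly
        int n = -120 - first;
        long v = in[1];                        // sign-extends the top byte
        for (int i = 2; i <= n; i++) {
            v = (v << 8) | (in[i] & 0xFF);
        }
        return v;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode(1024)) {
            hex.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // 86 04 00
    }
}
```

Under this reading, 1024 needs two trailing bytes, so the marker byte is -122 (0x86), which agrees with the example byte sequence given in the list.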

CSV Serialization Format

+ +The CSV serialization format has a lot more structure than the "standard" +Excel CSV format, but we believe the additional structure is useful because + +
    +
  • it makes parsing a lot easier without detracting too much from legibility +
  • the delimiters around composites make it obvious when one is reading a +sequence of Hadoop records +
+ +Serialization formats for the various types are detailed in the grammar that +follows. The notable feature of the formats is the use of delimiters to +indicate certain field types. + +
    +
  • A string field begins with a single quote ('). +
  • A buffer field begins with a hash sign (#). +
  • A class, vector or map begins with 's{', 'v{' or 'm{' respectively and +ends with '}'. +
+ +The CSV format can be described by the following grammar: + +

+record = primitive / struct / vector / map
+primitive = boolean / int / long / float / double / ustring / buffer
+
+boolean = "T" / "F"
+int = ["-"] 1*DIGIT
+long = ";" ["-"] 1*DIGIT
+float = ["-"] 1*DIGIT "." 1*DIGIT [("E" / "e") ["-"] 1*DIGIT]
+double = ";" ["-"] 1*DIGIT "." 1*DIGIT [("E" / "e") ["-"] 1*DIGIT]
+
+ustring = "'" *(UTF8 char except NULL, LF, % and , / "%00" / "%0a" / "%25" / "%2c" )
+
+buffer = "#" *(BYTE except NULL, LF, % and , / "%00" / "%0a" / "%25" / "%2c" )
+
+struct = "s{" record *("," record) "}"
+vector = "v{" [record *("," record)] "}"
+map = "m{" [*(record "," record)] "}"
+
+ +
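To make the grammar concrete, here is a minimal sketch of an encoder for ustring fields and for the struct and vector delimiters. It follows only the productions shown above; the class and method names are hypothetical and are not part of any record I/O library.

```java
// Sketch of CSV field encoding per the grammar above.
final class CsvFields {

    // ustring = "'" followed by the text, with NULL, LF, '%' and ','
    // replaced by their percent escapes.
    static String ustring(String s) {
        StringBuilder sb = new StringBuilder("'");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\0': sb.append("%00"); break;
                case '\n': sb.append("%0a"); break;
                case '%':  sb.append("%25"); break;
                case ',':  sb.append("%2c"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // struct = "s{" record *("," record) "}"
    static String struct(String... fields) {
        return "s{" + String.join(",", fields) + "}";
    }

    // vector = "v{" [record *("," record)] "}"
    static String vector(String... elements) {
        return "v{" + String.join(",", elements) + "}";
    }

    public static void main(String[] args) {
        // A struct holding an int, a ustring and a vector of two ints.
        System.out.println(struct("5", ustring("x,y"), vector("1", "2")));
        // s{5,'x%2cy,v{1,2}}
    }
}
```

Because every comma inside a string is escaped, a reader can split a composite on unescaped commas without tracking quoting state, which is the parsing benefit the section mentions.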

XML Serialization Format

+ +The XML serialization format is the same as that used by Apache XML-RPC +(http://ws.apache.org/xmlrpc/types.html). This is an extension of the original +XML-RPC format and adds some additional data types. Not all record I/O types are +directly expressible in this format, and access to a DDL is required in +order to convert these to valid types. All types, primitive or composite, are +represented by <value> elements. The particular XML-RPC type is +indicated by a nested element in the <value> element. The encoding for +records is always UTF-8. Primitive types are serialized as follows: + +
    +
  • byte: XML tag <ex:i1>. Values: 1-byte unsigned +integers represented in US-ASCII +
  • boolean: XML tag <boolean>. Values: "0" or "1" +
  • int: XML tags <i4> or <int>. Values: 4-byte +signed integers represented in US-ASCII. +
  • long: XML tag <ex:i8>. Values: 8-byte signed integers +represented in US-ASCII. +
  • float: XML tag <ex:float>. Values: Single precision +floating point numbers represented in US-ASCII. +
  • double: XML tag <double>. Values: Double precision +floating point numbers represented in US-ASCII. +
  • ustring: XML tag <string>. Values: String values +represented as UTF-8. XML does not permit all Unicode characters in literal +data. In particular, NULLs and control chars are not allowed. Additionally, +XML processors are required to replace carriage returns with line feeds and to +replace CRLF sequences with line feeds. Programming languages that we work +with do not impose these restrictions on string types. To work around these +restrictions, disallowed characters and CRs are percent escaped in strings. +The '%' character is also percent escaped. +
  • buffer: XML tag <string>. Values: Arbitrary binary +data. Represented as hexBinary: each byte is replaced by its 2-byte +hexadecimal representation. +
+ +Composite types are serialized as follows: + +
    +
  • class: XML tag <struct>. A struct is a sequence of +<member> elements. Each <member> element has a <name> +element and a <value> element. The <name> is a string that must +match /[a-zA-Z][a-zA-Z0-9_]*/. The value of the member is represented +by a <value> element. + +
  • vector: XML tag <array>. An <array> contains a +single <data> element. The <data> element is a sequence of +<value> elements each of which represents an element of the vector. + +
  • map: XML tag <array>. Same as vector. + +
+ +For example: + +

+class {
+  int           MY_INT;            // value 5
+  vector<float> MY_VEC;            // values 0.1, -0.89, 2.45e4
+  buffer        MY_BUF;            // value '\00\n\tabc%'
+}
+
+ +is serialized as + +

+<value>
+  <struct>
+    <member>
+      <name>MY_INT</name>
+      <value><i4>5</i4></value>
+    </member>
+    <member>
+      <name>MY_VEC</name>
+      <value>
+        <array>
+          <data>
+            <value><ex:float>0.1</ex:float></value>
+            <value><ex:float>-0.89</ex:float></value>
+            <value><ex:float>2.45e4</ex:float></value>
+          </data>
+        </array>
+      </value>
+    </member>
+    <member>
+      <name>MY_BUF</name>
+      <value><string>000a0961626325</string></value>
+    </member>
+  </struct>
+</value> 
+
+
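The two string-valued encodings described in this section, percent escaping for ustring and hexadecimal for buffer, can be sketched as below. This is an illustration under stated assumptions: "disallowed characters" is taken to mean code points below 0x20 other than tab and line feed, escapes are written as '%' plus two lowercase hex digits, and escaping of XML markup characters such as '<' and '&' is out of scope. The class and method names are hypothetical.

```java
// Sketch of the XML value encodings for ustring and buffer.
final class XmlValues {

    // Percent-escape '%' and the control characters that XML literal
    // data cannot carry unchanged (assumed: below 0x20, except tab and
    // LF; this includes NUL and CR).
    static String escapeUstring(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean disallowed = c < 0x20 && c != '\t' && c != '\n';
            if (disallowed || c == '%') {
                sb.append(String.format("%%%02x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Replace each byte with its two-character hexadecimal form.
    static String hexBuffer(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUstring("50%\r"));              // 50%25%0d
        System.out.println(hexBuffer(new byte[] {0, 10, 37}));  // 000a25
    }
}
```

Escaping '%' itself is what keeps the scheme reversible: a decoder can treat every '%' it sees as the start of an escape.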
+ + + + +

Copyright © 2018 The Apache Software Foundation

+ +