hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "SequenceFile" by Arun C Murthy
Date Fri, 13 Jul 2007 10:50:21 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by Arun C Murthy:
http://wiki.apache.org/lucene-hadoop/SequenceFile

------------------------------------------------------------------------------
  
  == SequenceFile Formats ==
  
- This section describes the format for the latest ''''version 4'''' of !SequenceFiles.
+ This section describes the format for the latest ''''version 6'''' of !SequenceFiles.
  
  Essentially there are 3 different file formats for !SequenceFiles depending on whether ''compression'' and ''block compression'' are active.
  
@@ -26, +26 @@

  However all of the above formats share a common ''header'' (which is used by the !SequenceFile.Reader to return the appropriate key/value pairs). The next section summarises the header:
  [[Anchor(SeqFileHeader)]]
  ===== SequenceFile Common Header =====
-  * version - A byte array: 3 bytes of magic header ''''SEQ'''', followed by 1 byte of actual version no. (e.g. SEQ4)
+  * version - A byte array: 3 bytes of magic header ''''SEQ'''', followed by 1 byte of actual version no. (e.g. SEQ4 or SEQ6)
   * keyClassName - String
   * valueClassName - String
   * compression - A boolean which specifies if ''compression'' is turned on for keys/values in this file.
   * blockCompression - A boolean which specifies if ''block compression'' is turned on for keys/values in this file.
+  * compressor class - The classname of the custom compressor which is used to compress keys/values in this !SequenceFile (if compression is enabled).
+  * metadata - [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/io/SequenceFile.Metadata.html SequenceFile.Metadata] for this file (key/value pairs)
   * sync - A sync marker to denote end of the header.
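The header fields above are read in the order listed. Below is a minimal Python sketch of such a reader. Several details do not appear on this page. They come from our reading of the Hadoop source, so treat them as assumptions: strings are a variable-length integer (VInt) byte count followed by UTF-8 bytes, and each boolean takes one byte. The compressor class name is stored only when compression is on and the version is at least 5. Metadata exists from version 6 on and consists of a 4-byte big-endian entry count followed by key/value string pairs. The sync marker is 16 bytes. The names `parse_header`, `parse_header_bytes`, `read_vlong`, `read_string` and `SeqFileHeader` are hypothetical and are not Hadoop APIs.

```python
import io
import struct
from dataclasses import dataclass
from typing import BinaryIO, Dict, Optional

SYNC_SIZE = 16  # assumption: length of the sync marker in bytes


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, failing loudly on a truncated header."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_vlong(stream: BinaryIO) -> int:
    """Decode a Hadoop-style variable-length integer (assumed encoding)."""
    first = struct.unpack(">b", _read_exact(stream, 1))[0]  # signed byte
    if first >= -112:
        return first  # small values live in the first byte itself
    negative = first < -120
    size = (-119 - first) if negative else (-111 - first)  # counts first byte
    value = int.from_bytes(_read_exact(stream, size - 1), "big")
    return value ^ -1 if negative else value  # undo one's complement


def read_string(stream: BinaryIO) -> str:
    """Read a VInt byte length followed by that many UTF-8 bytes."""
    length = read_vlong(stream)
    return _read_exact(stream, length).decode("utf-8")


@dataclass
class SeqFileHeader:
    version: int
    key_class: str
    value_class: str
    compressed: bool
    block_compressed: bool
    codec_class: Optional[str]
    metadata: Dict[str, str]
    sync: bytes

    @property
    def writer_format(self) -> str:
        """Which of the three file formats the two flags select."""
        if self.block_compressed:
            return "BlockCompressed"
        return "RecordCompressed" if self.compressed else "Uncompressed"


def parse_header(stream: BinaryIO) -> SeqFileHeader:
    """Parse the common header; this sketch only handles versions 4 to 6."""
    if _read_exact(stream, 3) != b"SEQ":
        raise ValueError("missing SEQ magic")
    version = _read_exact(stream, 1)[0]
    if version not in (4, 5, 6):
        raise ValueError(f"unsupported version {version}")
    key_class = read_string(stream)
    value_class = read_string(stream)
    compressed = _read_exact(stream, 1)[0] != 0
    block_compressed = _read_exact(stream, 1)[0] != 0
    codec_class = None
    if compressed and version >= 5:  # assumption: class name since version 5
        codec_class = read_string(stream)
    metadata: Dict[str, str] = {}
    if version >= 6:  # assumption: metadata since version 6
        (count,) = struct.unpack(">i", _read_exact(stream, 4))
        if count < 0:
            raise ValueError("negative metadata entry count")
        for _ in range(count):
            key = read_string(stream)
            metadata[key] = read_string(stream)
    sync = _read_exact(stream, SYNC_SIZE)
    return SeqFileHeader(version, key_class, value_class, compressed,
                         block_compressed, codec_class, metadata, sync)


def parse_header_bytes(data: bytes) -> SeqFileHeader:
    """Convenience wrapper for a header already held in memory."""
    return parse_header(io.BytesIO(data))
```

On a local file, the reader can be called as `with open(path, "rb") as f: print(parse_header(f))`. Here `path` is a placeholder. The `writer_format` property shows how the two booleans select one of the three formats. Block compression is checked first because it implies compression.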
  
  All strings are serialized using [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/io/Text.html#writeString(java.io.DataOutput,%20java.lang.String) Text.writeString] api.
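Based on our reading of the Hadoop source (treat the details as an assumption), Text.writeString writes the string's UTF-8 byte length as a variable-length integer in the `WritableUtils` VInt/VLong scheme, followed by the UTF-8 bytes themselves. Below is a minimal Python sketch of that write side. `write_vlong` and `write_string` are hypothetical helper names, not Hadoop APIs.

```python
def write_vlong(value: int) -> bytes:
    """Encode an integer with the assumed Hadoop VLong scheme."""
    if not -(1 << 63) <= value < (1 << 63):
        raise OverflowError("value must fit in a signed 64-bit long")
    if -112 <= value <= 127:
        return bytes([value & 0xFF])  # fits in one signed byte
    prefix = -112
    if value < 0:
        value ^= -1  # store the one's complement of negative values
        prefix = -120
    tmp = value
    while tmp != 0:  # step the prefix once per payload byte
        tmp >>= 8
        prefix -= 1
    nbytes = -(prefix + 120) if prefix < -120 else -(prefix + 112)
    # First byte encodes sign and payload length; payload is big-endian.
    return bytes([prefix & 0xFF]) + value.to_bytes(nbytes, "big")


def write_string(s: str) -> bytes:
    """VInt byte length, then UTF-8 bytes (length counts bytes, not chars)."""
    data = s.encode("utf-8")
    return write_vlong(len(data)) + data
```

For example, `write_string("SEQ")` gives `b"\x03SEQ"`. A 200-byte string needs a two-byte length prefix (`0x8f 0xc8`). Lengths from 0 to 127 cost a single byte, so short class names stay compact.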
  
  [[BR]]
  [[BR]]
- The formats for Uncompressed/!RecordCompressed Writers are very similar:
+ The formats for Uncompressed and !RecordCompressed Writers are very similar:
- ===== Uncompressed/RecordCompressed Writer Format =====
+ ===== Uncompressed & RecordCompressed Writer Format =====
   * [#SeqFileHeader Header]
   * Record
     * Record length
