<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>core-commits@hadoop.apache.org Archives</title>
<link rel="self" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/?format=atom"/>
<link href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/"/>
<id>http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/</id>
<updated>2009-06-30T11:27:20Z</updated>
<entry>
<title>[Hadoop Wiki] Update of &quot;Hive/Presentations&quot; by NamitJain</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090627015008.12114.89860@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090627015008-12114-89860@eos-apache-org%3e</id>
<updated>2009-06-27T01:50:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by NamitJain:
http://wiki.apache.org/hadoop/Hive/Presentations

------------------------------------------------------------------------------
   * [http://www.slideshare.net/jsensarma/hadoop-hive-talk-at-iitdelhi Large Scale Data Processing
using commodity SW/HW], IIT-Delhi CS Dept., (Joydeep Sen Sarma, Facebook)
   * [http://www.slideshare.net/prasadc/hive-percona-2009 Data Warehousing &amp; Analytics
on Hadoop], Percona Conference, Santa Clara, CA, USA (Ashish Thusoo, Prasad Chakka, Facebook)
   * [http://www.slideshare.net/namit_jain/hadoop-summit-2009-hive Hive: Hadoop Summit 2009],
Santa Clara, CA, USA (Namit Jain, Zheng Shao, Facebook)
+  * [http://www.slideshare.net/namit_jain/hive-demo-paper-at-vldb-2009 Hive: VLDB 2009],
Lyon, France (Facebook)
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;Hive/Presentations&quot; by NamitJain</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090627014803.11794.51881@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090627014803-11794-51881@eos-apache-org%3e</id>
<updated>2009-06-27T01:48:03Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by NamitJain:
http://wiki.apache.org/hadoop/Hive/Presentations

------------------------------------------------------------------------------
   * [http://www.slideshare.net/jhammerb/20081030linkedin An Introduction to Hive], Jeff Hammerbacher,
Facebook
   * [http://www.slideshare.net/jsensarma/hadoop-hive-talk-at-iitdelhi Large Scale Data Processing
using commodity SW/HW], IIT-Delhi CS Dept., (Joydeep Sen Sarma, Facebook)
   * [http://www.slideshare.net/prasadc/hive-percona-2009 Data Warehousing &amp; Analytics
on Hadoop], Percona Conference, Santa Clara, CA, USA (Ashish Thusoo, Prasad Chakka, Facebook)
-  * [http://www.slideshare.net/namit_jain/hadoop-summit-2009-hive], Hadoop Summit 2009, Santa
Clara, CA, USA (Namit Jain, Zheng Shao, Facebook)
+  * [http://www.slideshare.net/namit_jain/hadoop-summit-2009-hive Hive: Hadoop Summit 2009],
Santa Clara, CA, USA (Namit Jain, Zheng Shao, Facebook)
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;Hive/Presentations&quot; by NamitJain</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090627014615.11501.73814@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090627014615-11501-73814@eos-apache-org%3e</id>
<updated>2009-06-27T01:46:15Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by NamitJain:
http://wiki.apache.org/hadoop/Hive/Presentations

------------------------------------------------------------------------------
   * [http://www.slideshare.net/jhammerb/20081030linkedin An Introduction to Hive], Jeff Hammerbacher,
Facebook
   * [http://www.slideshare.net/jsensarma/hadoop-hive-talk-at-iitdelhi Large Scale Data Processing
using commodity SW/HW], IIT-Delhi CS Dept., (Joydeep Sen Sarma, Facebook)
   * [http://www.slideshare.net/prasadc/hive-percona-2009 Data Warehousing &amp; Analytics
on Hadoop], Percona Conference, Santa Clara, CA, USA (Ashish Thusoo, Prasad Chakka, Facebook)
+  * [http://www.slideshare.net/namit_jain/hadoop-summit-2009-hive], Hadoop Summit 2009, Santa
Clara, CA, USA (Namit Jain, Zheng Shao, Facebook)
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;BayAreaHadoopUserGroup&quot; by ChristopheBisciglia</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626235726.14628.45698@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090626235726-14628-45698@eos-apache-org%3e</id>
<updated>2009-06-26T23:57:26Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by ChristopheBisciglia:
http://wiki.apache.org/hadoop/BayAreaHadoopUserGroup

The comment on the change is:
switching to meetup for better management tools

------------------------------------------------------------------------------
  When:  
-   Every Third Wednesday of the month
+   Every third Wednesday evening of the month
-   6:00-7:30 pm (but people usually hang around till 9pm)
  
  Where:
+   Location varies between north and south bay, see the meetup group below for the upcoming
schedule
-   Yahoo! 
-   700 First Avenue
-   Sunnyvale, California 94089 
- 
-   From First Avenue, turn '''left''' into the parking lot of building by itself at the corner
of Mathilda and First.
- 
- Room:
-   Building E Classroom 10
  
  More Info:
-   There is usually a posting on upcoming.yahoo.com or as well as on the various Hadoop mailing
lists @apache.org 
+   http://www.meetup.com/Bay-Area-Hadoop-User-Group-HUG/
  


</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788900 - /hadoop/common/trunk/CHANGES.txt</title>
<author><name>shv@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626225252.6079223888D0@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090626225252-6079223888D0@eris-apache-org%3e</id>
<updated>2009-06-26T22:52:52Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: shv
Date: Fri Jun 26 22:52:51 2009
New Revision: 788900

URL: http://svn.apache.org/viewvc?rev=788900&amp;view=rev
Log:
HADOOP-5897. Promote new name-node metrics to  branch 0.20.


Modified:
    hadoop/common/trunk/CHANGES.txt

Modified: hadoop/common/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/CHANGES.txt?rev=788900&amp;r1=788899&amp;r2=788900&amp;view=diff
==============================================================================
--- hadoop/common/trunk/CHANGES.txt (original)
+++ hadoop/common/trunk/CHANGES.txt Fri Jun 26 22:52:51 2009
@@ -152,9 +152,6 @@
     HADOOP-5170. Allows jobs to set max maps/reduces per-node and per-cluster.
     (Matei Zaharia via ddas)
 
-    HADOOP-5897. Add name-node metrics to capture java heap usage.
-    (Suresh Srinivas via shv)
-
     HADOOP-3315. Add a new, binary file format, TFile. (Hong Tang via cdouglas)
 
   IMPROVEMENTS
@@ -881,6 +878,9 @@
     HADOOP-4372. Improves the way history filenames are obtained and manipulated.
     (Amar Kamat via ddas)
 
+    HADOOP-5897. Add name-node metrics to capture java heap usage.
+    (Suresh Srinivas via shv)
+
   OPTIMIZATIONS
 
   BUG FIXES




</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788899 - in /hadoop/common/branches/branch-0.20: ./ src/hdfs/org/apache/hadoop/hdfs/server/namenode/ src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/ src/test/org/apache/hadoop/hdfs/server/namenode/metrics/</title>
<author><name>shv@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626225008.F37BB23888D0@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090626225008-F37BB23888D0@eris-apache-org%3e</id>
<updated>2009-06-26T22:50:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: shv
Date: Fri Jun 26 22:50:08 2009
New Revision: 788899

URL: http://svn.apache.org/viewvc?rev=788899&amp;view=rev
Log:
HADOOP-5897. Merge -r 785024:785025 from trunk to branch 0.20.

Added:
    hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/
    hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
  (with props)
Modified:
    hadoop/common/branches/branch-0.20/   (props changed)
    hadoop/common/branches/branch-0.20/CHANGES.txt
    hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
    hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/CorruptReplicasMap.java
    hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
    hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java

Propchange: hadoop/common/branches/branch-0.20/
------------------------------------------------------------------------------
--- svn:ignore (original)
+++ svn:ignore Fri Jun 26 22:50:08 2009
@@ -3,3 +3,5 @@
 .classpath
 .project
 .settings
+
+.externalToolBuilders

Modified: hadoop/common/branches/branch-0.20/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/CHANGES.txt?rev=788899&amp;r1=788898&amp;r2=788899&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.20/CHANGES.txt Fri Jun 26 22:50:08 2009
@@ -26,6 +26,9 @@
     HADOOP-4372. Improves the way history filenames are obtained and manipulated.
     (Amar Kamat via ddas)
 
+    HADOOP-5897. Add name-node metrics to capture java heap usage.
+    (Suresh Srinivas via shv)
+
     HDFS-438. Improve help message for space quota command. (Raghu Angadi)
 
   OPTIMIZATIONS

Modified: hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java?rev=788899&amp;r1=788898&amp;r2=788899&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
(original)
+++ hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
Fri Jun 26 22:50:08 2009
@@ -290,7 +290,20 @@
     }
   }
 
-  private Map&lt;Block, BlockInfo&gt; map = new HashMap&lt;Block, BlockInfo&gt;();
+  // Used for tracking HashMap capacity growth
+  private int capacity;
+  private final float loadFactor;
+  
+  private Map&lt;BlockInfo, BlockInfo&gt; map;
+
+  BlocksMap(int initialCapacity, float loadFactor) {
+    this.capacity = 1;
+    // Capacity is rounded up to the next power of 2 at or above initialCapacity
+    while (this.capacity &lt; initialCapacity)
+      this.capacity &lt;&lt;= 1;
+    this.loadFactor = loadFactor;
+    this.map = new HashMap&lt;BlockInfo, BlockInfo&gt;(initialCapacity, loadFactor);
+  }
 
   /**
    * Add BlockInfo if mapping does not exist.
@@ -421,4 +434,18 @@
     
     return true;
   }
+  
+  /** Get the capacity of the HashMap that stores blocks */
+  public int getCapacity() {
+    // Capacity doubles every time the map size reaches the threshold
+    while (map.size() &gt; (int)(capacity * loadFactor)) {
+      capacity &lt;&lt;= 1;
+    }
+    return capacity;
+  }
+  
+  /** Get the load factor of the map */
+  public float getLoadFactor() {
+    return loadFactor;
+  }
 }

Modified: hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/CorruptReplicasMap.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/CorruptReplicasMap.java?rev=788899&amp;r1=788898&amp;r2=788899&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/CorruptReplicasMap.java
(original)
+++ hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/CorruptReplicasMap.java
Fri Jun 26 22:50:08 2009
@@ -61,10 +61,6 @@
                                    "on " + dn.getName() +
                                    " by " + Server.getRemoteIp());
     }
-    if (NameNode.getNameNodeMetrics() != null) {
-      NameNode.getNameNodeMetrics().numBlocksCorrupted.set(
-        corruptReplicasMap.size());
-    }
   }
 
   /**
@@ -75,10 +71,6 @@
   void removeFromCorruptReplicasMap(Block blk) {
     if (corruptReplicasMap != null) {
       corruptReplicasMap.remove(blk);
-      if (NameNode.getNameNodeMetrics() != null) {
-        NameNode.getNameNodeMetrics().numBlocksCorrupted.set(
-          corruptReplicasMap.size());
-      }
     }
   }
 

Modified: hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java?rev=788899&amp;r1=788898&amp;r2=788899&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
(original)
+++ hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
Fri Jun 26 22:50:08 2009
@@ -117,6 +117,10 @@
   public static final Log auditLog = LogFactory.getLog(
       FSNamesystem.class.getName() + ".audit");
 
+  // Default initial capacity and load factor of map
+  public static final int DEFAULT_INITIAL_MAP_CAPACITY = 16;
+  public static final float DEFAULT_MAP_LOAD_FACTOR = 0.75f;
+
   private boolean isPermissionEnabled;
   private UserGroupInformation fsOwner;
   private String supergroup;
@@ -125,9 +129,13 @@
   private FSNamesystemMetrics myFSMetrics;
   private long capacityTotal = 0L, capacityUsed = 0L, capacityRemaining = 0L;
   private int totalLoad = 0;
-  private long pendingReplicationBlocksCount = 0L, corruptReplicaBlocksCount,
-    underReplicatedBlocksCount = 0L, scheduledReplicationBlocksCount = 0L;
 
+  volatile long pendingReplicationBlocksCount = 0L;
+  volatile long corruptReplicaBlocksCount = 0L;
+  volatile long underReplicatedBlocksCount = 0L;
+  volatile long scheduledReplicationBlocksCount = 0L;
+  volatile long excessBlocksCount = 0L;
+  volatile long pendingDeletionBlocksCount = 0L;
   //
   // Stores the correct file name hierarchy
   //
@@ -137,7 +145,8 @@
   // Mapping: Block -&gt; { INode, datanodes, self ref } 
   // Updated only in response to client-sent information.
   //
-  BlocksMap blocksMap = new BlocksMap();
+  final BlocksMap blocksMap = new BlocksMap(DEFAULT_INITIAL_MAP_CAPACITY, 
+                                            DEFAULT_MAP_LOAD_FACTOR);
 
   //
   // Store blocks--&gt;datanodedescriptor(s) map of corrupt replicas
@@ -1181,7 +1190,9 @@
           // This reduces the possibility of triggering HADOOP-1349.
           //
           for(Collection&lt;Block&gt; v : recentInvalidateSets.values()) {
-            v.remove(last);
+            if (v.remove(last)) {
+              pendingDeletionBlocksCount--;
+            }
           }
         }
       }
@@ -1461,8 +1472,11 @@
    * Remove a datanode from the invalidatesSet
    * @param n datanode
    */
-  private void removeFromInvalidates(DatanodeInfo n) {
-    recentInvalidateSets.remove(n.getStorageID());
+  void removeFromInvalidates(String storageID) {
+    Collection&lt;Block&gt; blocks = recentInvalidateSets.remove(storageID);
+    if (blocks != null) {
+      pendingDeletionBlocksCount -= blocks.size();
+    }
   }
 
   /**
@@ -1489,7 +1503,9 @@
       invalidateSet = new HashSet&lt;Block&gt;();
       recentInvalidateSets.put(n.getStorageID(), invalidateSet);
     }
-    invalidateSet.add(b);
+    if (invalidateSet.add(b)) {
+      pendingDeletionBlocksCount++;
+    }
   }
   
   /**
@@ -1509,7 +1525,8 @@
    */
   private synchronized void dumpRecentInvalidateSets(PrintWriter out) {
     int size = recentInvalidateSets.values().size();
-    out.println("Metasave: Blocks waiting deletion from "+size+" datanodes.");
+    out.println("Metasave: Blocks " + pendingDeletionBlocksCount 
+        + " waiting deletion from " + size + " datanodes.");
     if (size == 0) {
       return;
     }
@@ -2658,9 +2675,13 @@
     String firstNodeId = recentInvalidateSets.keySet().iterator().next();
     assert firstNodeId != null;
     DatanodeDescriptor dn = datanodeMap.get(firstNodeId);
-    Collection&lt;Block&gt; invalidateSet = recentInvalidateSets.remove(firstNodeId);
- 
-    if(invalidateSet == null || dn == null)
+    if (dn == null) {
+       removeFromInvalidates(firstNodeId);
+       return 0;
+    }
+
+    Collection&lt;Block&gt; invalidateSet = recentInvalidateSets.get(firstNodeId);
+    if(invalidateSet == null)
       return 0;
 
     ArrayList&lt;Block&gt; blocksToInvalidate = 
@@ -2674,10 +2695,10 @@
       it.remove();
     }
 
-    // If we could not send everything in this message, reinsert this item
-    // into the collection.
-    if(it.hasNext())
-      recentInvalidateSets.put(firstNodeId, invalidateSet);
+    // If we sent everything in this message, remove this node entry
+    if (!it.hasNext()) {
+      removeFromInvalidates(firstNodeId);
+    }
 
     dn.addBlocksToBeInvalidated(blocksToInvalidate);
 
@@ -2756,7 +2777,7 @@
 
   void unprotectedRemoveDatanode(DatanodeDescriptor nodeDescr) {
     nodeDescr.resetBlocks();
-    removeFromInvalidates(nodeDescr);
+    removeFromInvalidates(nodeDescr.getStorageID());
     NameNode.stateChangeLog.debug(
                                   "BLOCK* NameSystem.unprotectedRemoveDatanode: "
                                   + nodeDescr.getName() + " is out of service now.");
@@ -3265,9 +3286,12 @@
         excessBlocks = new TreeSet&lt;Block&gt;();
         excessReplicateMap.put(cur.getStorageID(), excessBlocks);
       }
-      excessBlocks.add(b);
-      NameNode.stateChangeLog.debug("BLOCK* NameSystem.chooseExcessReplicates: "
-                                    +"("+cur.getName()+", "+b+") is added to excessReplicateMap");
+      if (excessBlocks.add(b)) {
+        excessBlocksCount++;
+        NameNode.stateChangeLog.debug("BLOCK* NameSystem.chooseExcessReplicates: "
+                                      +"("+cur.getName()+", "+b
+                                      +") is added to excessReplicateMap");
+      }
 
       //
       // The 'excessblocks' tracks blocks until we get confirmation
@@ -3315,11 +3339,13 @@
     //
     Collection&lt;Block&gt; excessBlocks = excessReplicateMap.get(node.getStorageID());
     if (excessBlocks != null) {
-      excessBlocks.remove(block);
-      NameNode.stateChangeLog.debug("BLOCK* NameSystem.removeStoredBlock: "
-                                    +block+" is removed from excessBlocks");
-      if (excessBlocks.size() == 0) {
-        excessReplicateMap.remove(node.getStorageID());
+      if (excessBlocks.remove(block)) {
+        excessBlocksCount--;
+        NameNode.stateChangeLog.debug("BLOCK* NameSystem.removeStoredBlock: "
+            + block + " is removed from excessBlocks");
+        if (excessBlocks.size() == 0) {
+          excessReplicateMap.remove(node.getStorageID());
+        }
       }
     }
     
@@ -4229,11 +4255,7 @@
       if (blockTotal == -1 &amp;&amp; blockSafe == -1) {
         return true; // manual safe mode
       }
-      int activeBlocks = blocksMap.size();
-      for(Iterator&lt;Collection&lt;Block&gt;&gt; it = 
-            recentInvalidateSets.values().iterator(); it.hasNext();) {
-        activeBlocks -= it.next().size();
-      }
+      int activeBlocks = blocksMap.size() - (int)pendingDeletionBlocksCount;
       return (blockTotal == activeBlocks) ||
         (blockSafe &gt;= 0 &amp;&amp; blockSafe &lt;= blockTotal);
     }
@@ -4521,7 +4543,7 @@
   }
 
   /** Returns number of blocks with corrupt replicas */
-  public long getCorruptReplicaBlocksCount() {
+  public long getCorruptReplicaBlocks() {
     return corruptReplicaBlocksCount;
   }
 
@@ -4529,6 +4551,18 @@
     return scheduledReplicationBlocksCount;
   }
 
+  public long getPendingDeletionBlocks() {
+    return pendingDeletionBlocksCount;
+  }
+
+  public long getExcessBlocks() {
+    return excessBlocksCount;
+  }
+  
+  public synchronized int getBlockCapacity() {
+    return blocksMap.getCapacity();
+  }
+
   public String getFSState() {
     return isInSafeMode() ? "safeMode" : "Operational";
   }

Modified: hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java?rev=788899&amp;r1=788898&amp;r2=788899&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java
(original)
+++ hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java
Fri Jun 26 22:50:08 2009
@@ -42,20 +42,24 @@
  */
 public class FSNamesystemMetrics implements Updater {
   private static Log log = LogFactory.getLog(FSNamesystemMetrics.class);
-  private final MetricsRecord metricsRecord;
+  final MetricsRecord metricsRecord;
   public MetricsRegistry registry = new MetricsRegistry();
 
+  final MetricsIntValue filesTotal = new MetricsIntValue("FilesTotal", registry);
+  final MetricsLongValue blocksTotal = new MetricsLongValue("BlocksTotal", registry);
+  final MetricsIntValue capacityTotalGB = new MetricsIntValue("CapacityTotalGB", registry);
+  final MetricsIntValue capacityUsedGB = new MetricsIntValue("CapacityUsedGB", registry);
+  final MetricsIntValue capacityRemainingGB = new MetricsIntValue("CapacityRemainingGB",
registry);
+  final MetricsIntValue totalLoad = new MetricsIntValue("TotalLoad", registry);
+  final MetricsIntValue pendingDeletionBlocks = new MetricsIntValue("PendingDeletionBlocks",
registry);
+  final MetricsIntValue corruptBlocks = new MetricsIntValue("CorruptBlocks", registry);
+  final MetricsIntValue excessBlocks = new MetricsIntValue("ExcessBlocks", registry);
+  final MetricsIntValue pendingReplicationBlocks = new MetricsIntValue("PendingReplicationBlocks",
registry);
+  final MetricsIntValue underReplicatedBlocks = new MetricsIntValue("UnderReplicatedBlocks",
registry);
+  final MetricsIntValue scheduledReplicationBlocks = new MetricsIntValue("ScheduledReplicationBlocks",
registry);
+  final MetricsIntValue missingBlocks = new MetricsIntValue("MissingBlocks", registry); 
  
+  final MetricsIntValue blockCapacity = new MetricsIntValue("BlockCapacity", registry);
    
-  public MetricsIntValue filesTotal = new MetricsIntValue("FilesTotal", registry);
-  public MetricsLongValue blocksTotal = new MetricsLongValue("BlocksTotal", registry);
-  public MetricsIntValue capacityTotalGB = new MetricsIntValue("CapacityTotalGB", registry);
-  public MetricsIntValue capacityUsedGB = new MetricsIntValue("CapacityUsedGB", registry);
-  public MetricsIntValue capacityRemainingGB = new MetricsIntValue("CapacityRemainingGB",
registry);
-  public MetricsIntValue totalLoad = new MetricsIntValue("TotalLoad", registry);
-  public MetricsIntValue pendingReplicationBlocks = new MetricsIntValue("PendingReplicationBlocks",
registry);
-  public MetricsIntValue underReplicatedBlocks = new MetricsIntValue("UnderReplicatedBlocks",
registry);
-  public MetricsIntValue scheduledReplicationBlocks = new MetricsIntValue("ScheduledReplicationBlocks",
registry);
-  public MetricsIntValue missingBlocks = new MetricsIntValue("MissingBlocks", registry);
   
   public FSNamesystemMetrics(Configuration conf) {
     String sessionId = conf.get("session.id");
      
@@ -100,12 +104,16 @@
       capacityRemainingGB.set(roundBytesToGBytes(fsNameSystem.
                                                getCapacityRemaining()));
       totalLoad.set(fsNameSystem.getTotalLoad());
+      corruptBlocks.set((int)fsNameSystem.getCorruptReplicaBlocks());
+      excessBlocks.set((int)fsNameSystem.getExcessBlocks());
+      pendingDeletionBlocks.set((int)fsNameSystem.getPendingDeletionBlocks());
       pendingReplicationBlocks.set((int)fsNameSystem.
                                    getPendingReplicationBlocks());
       underReplicatedBlocks.set((int)fsNameSystem.getUnderReplicatedBlocks());
       scheduledReplicationBlocks.set((int)fsNameSystem.
                                       getScheduledReplicationBlocks());
       missingBlocks.set((int)fsNameSystem.getMissingBlocksCount());
+      blockCapacity.set(fsNameSystem.getBlockCapacity());
 
       for (MetricsBase m : registry.getMetricsList()) {
         m.pushMetric(metricsRecord);

Added: hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java?rev=788899&amp;view=auto
==============================================================================
--- hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
(added)
+++ hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
Fri Jun 26 22:50:08 2009
@@ -0,0 +1,151 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode.metrics;
+
+import java.io.IOException;
+import java.util.Random;
+
+import junit.framework.TestCase;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;
+
+/**
+ * Test for metrics published by the Namenode
+ */
+public class TestNameNodeMetrics extends TestCase {
+  private static final Configuration CONF = new Configuration();
+  static {
+    CONF.setLong("dfs.block.size", 100);
+    CONF.setInt("io.bytes.per.checksum", 1);
+    CONF.setLong("dfs.heartbeat.interval", 1L);
+    CONF.setInt("dfs.replication.interval", 1);
+  }
+  
+  private MiniDFSCluster cluster;
+  private FSNamesystemMetrics metrics;
+  private DistributedFileSystem fs;
+  private Random rand = new Random();
+  private FSNamesystem namesystem;
+
+  @Override
+  protected void setUp() throws Exception {
+    cluster = new MiniDFSCluster(CONF, 3, true, null);
+    cluster.waitActive();
+    namesystem = cluster.getNameNode().getNamesystem();
+    fs = (DistributedFileSystem) cluster.getFileSystem();
+    metrics = namesystem.getFSNamesystemMetrics();
+  }
+  
+  @Override
+  protected void tearDown() throws Exception {
+    cluster.shutdown();
+  }
+  
+  /** create a file with a length of &lt;code&gt;fileLen&lt;/code&gt; */
+  private void createFile(String fileName, long fileLen, short replicas) throws IOException
{
+    Path filePath = new Path(fileName);
+    DFSTestUtil.createFile(fs, filePath, fileLen, replicas, rand.nextLong());
+  }
+
+  private void updateMetrics() throws Exception {
+    // Wait for metrics update (corresponds to dfs.replication.interval
+    // for some block related metrics to get updated)
+    Thread.sleep(1000);
+    metrics.doUpdates(null);
+  }
+
+  /** Test metrics associated with addition of a file */
+  public void testFileAdd() throws Exception {
+    // Add a file with 32 blocks
+    final String file = "/tmp/t";
+    createFile(file, 3200, (short)3);
+    final int blockCount = 32;
+    int blockCapacity = namesystem.getBlockCapacity();
+    updateMetrics();
+    assertEquals(blockCapacity, metrics.blockCapacity.get());
+
+    // Blocks are stored in a hashmap. Compute its capacity, which
+    // doubles every time the number of entries reaches the threshold.
+    final float loadFactor = FSNamesystem.DEFAULT_MAP_LOAD_FACTOR;
+    while ((int)(blockCapacity * loadFactor) &lt; blockCount) {
+      blockCapacity &lt;&lt;= 1;
+    }
+    updateMetrics();
+    assertEquals(3, metrics.filesTotal.get());
+    assertEquals(blockCount, metrics.blocksTotal.get());
+    assertEquals(blockCapacity, metrics.blockCapacity.get());
+    fs.delete(new Path(file), true);
+  }
+  
+  /** Corrupt a block and ensure metrics reflects it */
+  public void testCorruptBlock() throws Exception {
+    // Create a file with single block with two replicas
+    String file = "/tmp/t";
+    createFile(file, 100, (short)2);
+    
+    // Corrupt first replica of the block
+    LocatedBlock block = namesystem.getBlockLocations(file, 0, 1).get(0);
+    namesystem.markBlockAsCorrupt(block.getBlock(), block.getLocations()[0]);
+    updateMetrics();
+    assertEquals(1, metrics.corruptBlocks.get());
+    assertEquals(1, metrics.pendingReplicationBlocks.get());
+    assertEquals(1, metrics.scheduledReplicationBlocks.get());
+    fs.delete(new Path(file), true);
+    updateMetrics();
+    assertEquals(0, metrics.corruptBlocks.get());
+    assertEquals(0, metrics.pendingReplicationBlocks.get());
+    assertEquals(0, metrics.scheduledReplicationBlocks.get());
+  }
+  
+  /** Create excess blocks by reducing the replication factor for
+   * a file and ensure the metrics reflect it
+   */
+  public void testExcessBlocks() throws Exception {
+    String file = "/tmp/t";
+    createFile(file, 100, (short)2);
+    int totalBlocks = 1;
+    namesystem.setReplication(file, (short)1);
+    updateMetrics();
+    assertEquals(totalBlocks, metrics.excessBlocks.get());
+    assertEquals(totalBlocks, metrics.pendingDeletionBlocks.get());
+    fs.delete(new Path(file), true);
+  }
+  
+  /** Test to ensure metrics reflects missing blocks */
+  public void testMissingBlock() throws Exception {
+    // Create a file with a single block and one replica
+    String file = "/tmp/t";
+    createFile(file, 100, (short)1);
+    
+    // Corrupt the only replica of the block to result in a missing block
+    LocatedBlock block = namesystem.getBlockLocations(file, 0, 1).get(0);
+    namesystem.markBlockAsCorrupt(block.getBlock(), block.getLocations()[0]);
+    updateMetrics();
+    assertEquals(1, metrics.underReplicatedBlocks.get());
+    assertEquals(1, metrics.missingBlocks.get());
+    fs.delete(new Path(file), true);
+    updateMetrics();
+    assertEquals(0, metrics.underReplicatedBlocks.get());
+  }
+}

Propchange: hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
------------------------------------------------------------------------------
    svn:mime-type = text/plain
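
The capacity computation described by the test comment above can be sketched in shell as
follows. This is a minimal sketch under assumptions, not the committed code: capacity_for is
a hypothetical helper, and the 3/4 ratio stands in for FSNamesystem.DEFAULT_MAP_LOAD_FACTOR,
assumed here to be 0.75. The threshold is recomputed after every doubling, which is what
lets the loop end.

```shell
#!/usr/bin/env bash

# Hypothetical helper: starting from capacity $1, double until the
# threshold (capacity * 3 / 4, an assumed load factor of 0.75) is no
# longer below the entry count $2, then print the resulting capacity.
capacity_for() {
  local capacity=$1
  local count=$2
  local threshold=$((capacity * 3 / 4))
  while [ "$threshold" -lt "$count" ]; do
    capacity=$((capacity * 2))
    # Recompute for the new capacity; a stale threshold would loop forever.
    threshold=$((capacity * 3 / 4))
  done
  echo "$capacity"
}

capacity_for 16 10    # threshold 12 already covers 10 entries
capacity_for 16 13    # one doubling is needed
capacity_for 16 100   # several doublings are needed
```

Run as written, the three calls print 16, 32 and 256.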




</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;Hbase/PoweredBy&quot; by DaveLatham</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626175032.1288.54502@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090626175032-1288-54502@eos-apache-org%3e</id>
<updated>2009-06-26T17:50:32Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The following page has been changed by DaveLatham:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

------------------------------------------------------------------------------
  [http://www.adobe.com Adobe] - We currently have about 30 nodes running HDFS, Hadoop and
HBase  in clusters ranging from 5 to 14 nodes on both production and development. We plan
a deployment on an 80 nodes cluster. We are using HBase in several areas from social services
to structured data and processing for internal use. We constantly write data to HBase and
run mapreduce jobs to process then store it back to HBase or external systems. Our production
cluster has been running since Oct 2008.
  
- [http://www.flurry.com Flurry] provides mobile application analytics.  We use HBase and
Hadoop of all of our analytics processing, and serve all of our live requests directly out
of HBase in our production cluster with billions of rows over several tables.
+ [http://www.flurry.com Flurry] provides mobile application analytics.  We use HBase and
Hadoop for all of our analytics processing, and serve all of our live requests directly out
of HBase on our production cluster with billions of rows over several tables.
  
  [http://www.mahalo.com Mahalo], "...the world's first human-powered search engine". All
the markup that powers the wiki is stored in HBase. It's been in use for a few months now.
!MediaWiki - the same software that power Wikipedia - has version/revision control. Mahalo's
in-house editors produce a lot of revisions per day, which was not working well in a RDBMS.
An hbase-based solution for this was built and tested, and the data migrated out of MySQL
and into HBase. Right now it's at something like 6 million items in HBase. The upload tool
runs every hour from a shell script to back up that data, and on 6 nodes takes about 5-10
minutes to run - and does not slow down production at all. 
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;Hbase/PoweredBy&quot; by DaveLatham</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626174940.995.75243@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090626174940-995-75243@eos-apache-org%3e</id>
<updated>2009-06-26T17:49:40Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The following page has been changed by DaveLatham:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

The comment on the change is:
added Flurry, moved OpenPlaces to alphabetical order

------------------------------------------------------------------------------
  [http://www.adobe.com Adobe] - We currently have about 30 nodes running HDFS, Hadoop and
HBase  in clusters ranging from 5 to 14 nodes on both production and development. We plan
a deployment on an 80 nodes cluster. We are using HBase in several areas from social services
to structured data and processing for internal use. We constantly write data to HBase and
run mapreduce jobs to process then store it back to HBase or external systems. Our production
cluster has been running since Oct 2008.
+ 
+ [http://www.flurry.com Flurry] provides mobile application analytics.  We use HBase and
Hadoop of all of our analytics processing, and serve all of our live requests directly out
of HBase in our production cluster with billions of rows over several tables.
  
  [http://www.mahalo.com Mahalo], "...the world's first human-powered search engine". All
the markup that powers the wiki is stored in HBase. It's been in use for a few months now.
!MediaWiki - the same software that power Wikipedia - has version/revision control. Mahalo's
in-house editors produce a lot of revisions per day, which was not working well in a RDBMS.
An hbase-based solution for this was built and tested, and the data migrated out of MySQL
and into HBase. Right now it's at something like 6 million items in HBase. The upload tool
runs every hour from a shell script to back up that data, and on 6 nodes takes about 5-10
minutes to run - and does not slow down production at all. 
  
+ [http://www.openplaces.org Openplaces] is a search engine for travel that uses HBase to
store terabytes of web pages and travel-related entity records (countries, cities, hotels,
etc.). We have dozens of MapReduce jobs that crunch data on a daily basis.  We use a 20-node
cluster for development, a 40-node cluster for offline production processing and an EC2 cluster
for the live web site. 
  [http://www.powerset.com/ Powerset (a Microsoft company)] uses HBase to store raw documents.
 We have a ~110 node hadoop cluster running DFS, mapreduce, and hbase.  In our wikipedia hbase
table, we have one row for each wikipedia page (~2.5M pages and climbing).  We use this as
input to our indexing jobs, which are run in hadoop mapreduce.  Uploading the entire wikipedia
dump to our cluster takes a couple hours.  Scanning the table inside mapreduce is very fast
-- the latency is in the noise compared to everything else we do.
  
  [http://www.streamy.com/ Streamy] is a recently launched realtime social news site.  We
use HBase for all of our data storage, query, and analysis needs, replacing an existing SQL-based
system.  This includes hundreds of millions of documents, sparse matrices, logs, and everything
else once done in the relational system.  We perform significant in-memory caching of query
results similar to a traditional Memcached/SQL setup as well as other external components
to perform joining and sorting.  We also run thousands of daily MapReduce jobs using HBase
tables for log analysis, attention data processing, and feed crawling.  HBase has helped us
scale and distribute in ways we could not otherwise, and the community has provided consistent
and invaluable assistance.
@@ -22, +25 @@

  
  [http://www.yahoo.com/ Yahoo!] uses HBase to store document fingerprint for detecting near-duplications.
We have a cluster of few nodes that runs HDFS, mapreduce, and HBase. The table contains millions
of rows. We use this for querying duplicated documents with realtime traffic.
  
- [http://www.openplaces.org Openplaces] is a search engine for travel that uses HBase to
store terabytes of web pages and travel-related entity records (countries, cities, hotels,
etc.). We have dozens of MapReduce jobs that crunch data on a daily basis.  We use a 20-node
cluster for development, a 40-node cluster for offline production processing and an EC2 cluster
for the live web site. 
- 


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;ZooKeeper/PoweredBy&quot; by StuHood</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626154401.17529.69872@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090626154401-17529-69872@eos-apache-org%3e</id>
<updated>2009-06-26T15:44:01Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The following page has been changed by StuHood:
http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy

------------------------------------------------------------------------------
   * [http://hadoop.apache.org/hbase/ HBase] - HBase is the Hadoop database. Its an open-source,
distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed
Storage System for Structured Data by Chang et al.
     We use ZooKeeper for master election, server lease management, bootstrapping, and coordination
between servers.
  
-  * [http://www.rackspace.com/email_hosting Rackspace] - The Email &amp; Apps team uses ZooKeeper
to coordinate sharding and responsible changes in a distributed e-mail client that pulls and
indexes data for search. ZooKeeper also provides distributed locking for connections to prevent
a cluster from overwhelming servers.
+  * [http://www.rackspace.com/email_hosting Rackspace] - The Email &amp; Apps team uses ZooKeeper
to coordinate sharding and responsibility changes in a distributed e-mail client that pulls
and indexes data for search. ZooKeeper also provides distributed locking for connections to
prevent a cluster from overwhelming servers.
  
   * [http://www.yahoo.com/ Yahoo!] - ZooKeeper is used for a myriad of services inside Yahoo!
for doing leader election, configuration management, sharding, locking, group membership etc.
  


</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788726 - /hadoop/common/nightly/hudsonPatchQueueAdmin.sh</title>
<author><name>gkesavan@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626145806.B016F23888CD@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090626145806-B016F23888CD@eris-apache-org%3e</id>
<updated>2009-06-26T14:58:06Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: gkesavan
Date: Fri Jun 26 14:58:06 2009
New Revision: 788726

URL: http://svn.apache.org/viewvc?rev=788726&amp;view=rev
Log:
HADOOP-6112. Fix hudsonPatchQueueAdmin for different projects.

Modified:
    hadoop/common/nightly/hudsonPatchQueueAdmin.sh

Modified: hadoop/common/nightly/hudsonPatchQueueAdmin.sh
URL: http://svn.apache.org/viewvc/hadoop/common/nightly/hudsonPatchQueueAdmin.sh?rev=788726&amp;r1=788725&amp;r2=788726&amp;view=diff
==============================================================================
--- hadoop/common/nightly/hudsonPatchQueueAdmin.sh (original)
+++ hadoop/common/nightly/hudsonPatchQueueAdmin.sh Fri Jun 26 14:58:06 2009
@@ -91,14 +91,14 @@
 
 echo "&lt;html&gt;" &gt; $QUEUE_HTML_FILE
 echo "&lt;title&gt;Patch Queue for $PROJECT&lt;/title&gt;" &gt;&gt; $QUEUE_HTML_FILE
-echo "&lt;h3 align='left'&gt;&lt;img src="http://hadoop.apache.org/core/images/hadoop-logo.jpg"
height="50"&gt;&lt;/img&gt;Patch Queue for HADOOP&lt;/h3&gt;" &gt;&gt; $QUEUE_HTML_FILE
+echo "&lt;h3 align='left'&gt;&lt;img src="http://hadoop.apache.org/core/images/hadoop-logo.jpg"
height="50"&gt;&lt;/img&gt;Patch Queue for ${PROJECT}&lt;/h3&gt;" &gt;&gt; $QUEUE_HTML_FILE
 echo "&lt;hr style='height:2px;border-width:0;color:red;background-color:blue'&gt;" &gt;&gt;
$QUEUE_HTML_FILE
 echo "&lt;h4&gt;Currently Running (or Waiting To Run)&lt;/h4&gt;" &gt;&gt; $QUEUE_HTML_FILE
 echo "&lt;table cellspacing=10&gt;&lt;tr align=left&gt;&lt;th&gt;Issue&lt;/th&gt;&lt;th&gt;Submitted
to&lt;/th&gt;&lt;th&gt;Date Submitted to Run&lt;/th&gt;&lt;/tr&gt;" &gt;&gt; $QUEUE_HTML_FILE
 
 for SLAVE in $BUILD_SERVERS 
 do 
-  TRIGGER_BUILD_URL=${HUDSON_URL}'job/Hadoop-Patch-'${SLAVE}${BUILD_URL_TOKEN}
+  TRIGGER_BUILD_URL=${HUDSON_URL}'job/'${PROJECT}'-Patch-'${SLAVE}${BUILD_URL_TOKEN}
   CURRENT_PATCH=${QUEUE_DIR}/${SLAVE}
   defect=`head -n 1 $PATCH_QUEUE | awk '{print $1}'`
 if [[ ! -f $CURRENT_PATCH ]]; then




</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788699 - in /hadoop/common/trunk: ./ src/contrib/ec2/bin/</title>
<author><name>tomwhite@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626134008.1AC382388893@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090626134008-1AC382388893@eris-apache-org%3e</id>
<updated>2009-06-26T13:40:07Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: tomwhite
Date: Fri Jun 26 13:40:07 2009
New Revision: 788699

URL: http://svn.apache.org/viewvc?rev=788699&amp;view=rev
Log:
HADOOP-5925. EC2 scripts should exit on error.

Modified:
    hadoop/common/trunk/CHANGES.txt
    hadoop/common/trunk/src/contrib/ec2/bin/cmd-hadoop-cluster
    hadoop/common/trunk/src/contrib/ec2/bin/create-hadoop-image
    hadoop/common/trunk/src/contrib/ec2/bin/delete-hadoop-cluster
    hadoop/common/trunk/src/contrib/ec2/bin/hadoop-ec2
    hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-cluster
    hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-master
    hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-slaves
    hadoop/common/trunk/src/contrib/ec2/bin/list-hadoop-clusters
    hadoop/common/trunk/src/contrib/ec2/bin/terminate-hadoop-cluster

Modified: hadoop/common/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/CHANGES.txt?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/CHANGES.txt (original)
+++ hadoop/common/trunk/CHANGES.txt Fri Jun 26 13:40:07 2009
@@ -465,6 +465,8 @@
     commands that do not complete within a certain amount of time.
     (Sreekanth Ramakrishnan via yhemanth)
 
+    HADOOP-5925. EC2 scripts should exit on error. (tomwhite)
+
   OPTIMIZATIONS
 
     HADOOP-5595. NameNode does not need to run a replicator to choose a

Modified: hadoop/common/trunk/src/contrib/ec2/bin/cmd-hadoop-cluster
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/cmd-hadoop-cluster?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/cmd-hadoop-cluster (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/cmd-hadoop-cluster Fri Jun 26 13:40:07 2009
@@ -17,6 +17,8 @@
 
 # Run commands on master or specified node of a running Hadoop EC2 cluster.
 
+set -o errexit
+
 # if no args specified, show usage
 if [ $# = 0 ]; then
   echo "Command required!"

Modified: hadoop/common/trunk/src/contrib/ec2/bin/create-hadoop-image
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/create-hadoop-image?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/create-hadoop-image (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/create-hadoop-image Fri Jun 26 13:40:07 2009
@@ -18,6 +18,8 @@
 # Create a Hadoop AMI.
 # Inspired by Jonathan Siegel's EC2 script (http://blogsiegel.blogspot.com/2006/08/sandboxing-amazon-ec2.html)
 
+set -o errexit
+
 # Import variables
 bin=`dirname "$0"`
 bin=`cd "$bin"; pwd`

Modified: hadoop/common/trunk/src/contrib/ec2/bin/delete-hadoop-cluster
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/delete-hadoop-cluster?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/delete-hadoop-cluster (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/delete-hadoop-cluster Fri Jun 26 13:40:07 2009
@@ -15,7 +15,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# Delete the groups an local files associated with a cluster.
+# Delete the groups and local files associated with a cluster.
+
+set -o errexit
 
 if [ -z $1 ]; then
   echo "Cluster name required!"
@@ -42,17 +44,17 @@
 rm -f $MASTER_IP_PATH
 rm -f $MASTER_PRIVATE_IP_PATH
 
-ec2-describe-group | egrep "[[:space:]]$CLUSTER_MASTER[[:space:]]" &gt; /dev/null
-if [ $? -eq 0 ]; then
+if ec2-describe-group $CLUSTER_MASTER &gt; /dev/null 2&gt;&amp;1; then
+  if ec2-describe-group $CLUSTER &gt; /dev/null 2&gt;&amp;1; then
+    echo "Revoking authorization between $CLUSTER_MASTER and $CLUSTER"
+    ec2-revoke $CLUSTER_MASTER -o $CLUSTER -u $AWS_ACCOUNT_ID || true
+    ec2-revoke $CLUSTER -o $CLUSTER_MASTER -u $AWS_ACCOUNT_ID || true
+  fi
   echo "Deleting group $CLUSTER_MASTER"
-  ec2-revoke $CLUSTER_MASTER -o $CLUSTER -u $AWS_ACCOUNT_ID
+  ec2-delete-group $CLUSTER_MASTER
 fi
 
-ec2-describe-group | egrep "[[:space:]]$CLUSTER[[:space:]]" &gt; /dev/null
-if [ $? -eq 0 ]; then
+if ec2-describe-group $CLUSTER &gt; /dev/null 2&gt;&amp;1; then
   echo "Deleting group $CLUSTER"
-  ec2-revoke $CLUSTER -o $CLUSTER_MASTER -u $AWS_ACCOUNT_ID
+  ec2-delete-group $CLUSTER
 fi
-
-ec2-delete-group $CLUSTER_MASTER
-ec2-delete-group $CLUSTER

Modified: hadoop/common/trunk/src/contrib/ec2/bin/hadoop-ec2
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/hadoop-ec2?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/hadoop-ec2 (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/hadoop-ec2 Fri Jun 26 13:40:07 2009
@@ -15,6 +15,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+set -o errexit
+
 bin=`dirname "$0"`
 bin=`cd "$bin"; pwd`
 

Modified: hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-cluster
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-cluster?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-cluster (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-cluster Fri Jun 26 13:40:07 2009
@@ -17,6 +17,8 @@
 
 # Launch an EC2 cluster of Hadoop instances.
 
+set -o errexit
+
 # Import variables
 bin=`dirname "$0"`
 bin=`cd "$bin"; pwd`

Modified: hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-master
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-master?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-master (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-master Fri Jun 26 13:40:07 2009
@@ -17,6 +17,8 @@
 
 # Launch an EC2 Hadoop master.
 
+set -o errexit
+
 if [ -z $1 ]; then
   echo "Cluster name required!"
   exit -1
@@ -46,8 +48,7 @@
   exit 0
 fi
 
-ec2-describe-group | egrep "[[:space:]]$CLUSTER_MASTER[[:space:]]" &gt; /dev/null
-if [ ! $? -eq 0 ]; then
+if ! ec2-describe-group $CLUSTER_MASTER &gt; /dev/null 2&gt;&amp;1; then
   echo "Creating group $CLUSTER_MASTER"
   ec2-add-group $CLUSTER_MASTER -d "Group for Hadoop Master."
   ec2-authorize $CLUSTER_MASTER -o $CLUSTER_MASTER -u $AWS_ACCOUNT_ID
@@ -61,8 +62,7 @@
   fi
 fi
 
-ec2-describe-group | egrep "[[:space:]]$CLUSTER[[:space:]]" &gt; /dev/null
-if [ ! $? -eq 0 ]; then
+if ! ec2-describe-group $CLUSTER &gt; /dev/null 2&gt;&amp;1; then
   echo "Creating group $CLUSTER"
   ec2-add-group $CLUSTER -d "Group for Hadoop Slaves."
   ec2-authorize $CLUSTER -o $CLUSTER -u $AWS_ACCOUNT_ID
@@ -105,8 +105,7 @@
 echo $MASTER_EC2_ZONE &gt; $MASTER_ZONE_PATH
 
 while true; do
-  REPLY=`ssh $SSH_OPTS "root@$MASTER_EC2_HOST" 'echo "hello"'`
-  if [ ! -z $REPLY ]; then
+  if ssh $SSH_OPTS "root@$MASTER_EC2_HOST" 'echo "hello"' &gt; /dev/null 2&gt;&amp;1; then
    break;
   fi
   sleep 5

Modified: hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-slaves
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-slaves?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-slaves (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/launch-hadoop-slaves Fri Jun 26 13:40:07 2009
@@ -17,6 +17,8 @@
 
 # Launch an EC2 Hadoop slaves.
 
+set -o errexit
+
 if [ -z $1 ]; then
   echo "Cluster name required!"
   exit -1

Modified: hadoop/common/trunk/src/contrib/ec2/bin/list-hadoop-clusters
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/list-hadoop-clusters?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/list-hadoop-clusters (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/list-hadoop-clusters Fri Jun 26 13:40:07 2009
@@ -17,6 +17,8 @@
 
 # List running clusters.
 
+set -o errexit
+
 # Import variables
 bin=`dirname "$0"`
 bin=`cd "$bin"; pwd`

Modified: hadoop/common/trunk/src/contrib/ec2/bin/terminate-hadoop-cluster
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/ec2/bin/terminate-hadoop-cluster?rev=788699&amp;r1=788698&amp;r2=788699&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/contrib/ec2/bin/terminate-hadoop-cluster (original)
+++ hadoop/common/trunk/src/contrib/ec2/bin/terminate-hadoop-cluster Fri Jun 26 13:40:07 2009
@@ -17,6 +17,8 @@
 
 # Terminate a cluster.
 
+set -o errexit
+
 if [ -z $1 ]; then
   echo "Cluster name required!"
   exit -1
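
With errexit set, a bare command or pipeline that returns nonzero ends the script, so a
status check written as a separate [ $? -eq 0 ] test on the following line is never reached
after a failure. The patterns this commit moves to can be sketched as follows. This is a
minimal sketch, not code from the commit: group_exists and revoke are hypothetical stand-ins
for the ec2-* tools.

```shell
#!/usr/bin/env bash
set -o errexit

# Hypothetical stand-in for ec2-describe-group: succeeds only for "known-group".
group_exists() {
  [ "$1" = "known-group" ]
}

# Hypothetical stand-in for ec2-revoke: always fails.
revoke() {
  return 1
}

# Using the command itself as the if condition: a nonzero status selects
# the else branch instead of terminating the script.
check_group() {
  if group_exists "$1"; then
    echo "present"
  else
    echo "absent"
  fi
}

# "|| true" marks a failure as safe to ignore, so errexit does not fire.
tolerant_revoke() {
  revoke "$1" || true
  echo "continued"
}

check_group known-group
check_group other-group
tolerant_revoke known-group
```

Run as written, the script prints present, absent and continued, and exits with status 0.
Calling revoke on a line of its own, without the "|| true", would end the script at that
point.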




</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788666 - in /hadoop/common/branches/branch-0.20: CHANGES.txt src/mapred/org/apache/hadoop/mapred/CompletedJobStatusStore.java src/mapred/org/apache/hadoop/mapred/JobTracker.java src/test/org/apache/hadoop/mapred/TestJobStatusPersistency.java</title>
<author><name>sharad@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626120411.BC9782388896@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090626120411-BC9782388896@eris-apache-org%3e</id>
<updated>2009-06-26T12:04:11Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: sharad
Date: Fri Jun 26 12:04:11 2009
New Revision: 788666

URL: http://svn.apache.org/viewvc?rev=788666&amp;view=rev
Log:
MAPREDUCE-657. Fix hardcoded filesystem problem in CompletedJobStatusStore. Contributed by
Amar Kamat.

Modified:
    hadoop/common/branches/branch-0.20/CHANGES.txt
    hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/CompletedJobStatusStore.java
    hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java
    hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobStatusPersistency.java

Modified: hadoop/common/branches/branch-0.20/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/CHANGES.txt?rev=788666&amp;r1=788665&amp;r2=788666&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.20/CHANGES.txt Fri Jun 26 12:04:11 2009
@@ -150,6 +150,9 @@
     MAPREDUCE-130. Delete the jobconf copy from the log directory of the 
     JobTracker when the job is retired. (Amar Kamat via sharad)
 
+    MAPREDUCE-657. Fix hardcoded filesystem problem in CompletedJobStatusStore.
+    (Amar Kamat via sharad)
+
 Release 0.20.0 - 2009-04-15
 
   INCOMPATIBLE CHANGES

Modified: hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/CompletedJobStatusStore.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/CompletedJobStatusStore.java?rev=788666&amp;r1=788665&amp;r2=788666&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/CompletedJobStatusStore.java
(original)
+++ hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/CompletedJobStatusStore.java
Fri Jun 26 12:04:11 2009
@@ -51,12 +51,11 @@
   private static long HOUR = 1000 * 60 * 60;
   private static long SLEEP_TIME = 1 * HOUR;
 
-  CompletedJobStatusStore(Configuration conf, FileSystem fs) throws IOException {
+  CompletedJobStatusStore(Configuration conf) throws IOException {
     active =
       conf.getBoolean("mapred.job.tracker.persist.jobstatus.active", false);
 
     if (active) {
-      this.fs = fs;
       retainTime =
         conf.getInt("mapred.job.tracker.persist.jobstatus.hours", 0) * HOUR;
 
@@ -64,6 +63,9 @@
         conf.get("mapred.job.tracker.persist.jobstatus.dir", JOB_INFO_STORE_DIR);
 
       Path path = new Path(jobInfoDir);
+      
+      // set the fs
+      this.fs = path.getFileSystem(conf);
       if (!fs.exists(path)) {
         fs.mkdirs(path);
       }
@@ -72,6 +74,10 @@
         // as retain time is zero, all stored jobstatuses are deleted.
         deleteJobStatusDirs();
       }
+      LOG.info("Completed job store activated/configured with retain-time : " 
+               + retainTime + " , job-info-dir : " + jobInfoDir);
+    } else {
+      LOG.info("Completed job store is inactive");
     }
   }
 

Modified: hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java?rev=788666&amp;r1=788665&amp;r2=788666&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java
(original)
+++ hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java
Fri Jun 26 12:04:11 2009
@@ -1711,7 +1711,7 @@
         NetworkTopology.DEFAULT_HOST_LEVEL);
 
     //initializes the job status store
-    completedJobStatusStore = new CompletedJobStatusStore(conf,fs);
+    completedJobStatusStore = new CompletedJobStatusStore(conf);
   }
 
   private static SimpleDateFormat getDateFormat() {

Modified: hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobStatusPersistency.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobStatusPersistency.java?rev=788666&amp;r1=788665&amp;r2=788666&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobStatusPersistency.java
(original)
+++ hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobStatusPersistency.java
Fri Jun 26 12:04:11 2009
@@ -22,11 +22,16 @@
 import java.io.Writer;
 import java.util.Properties;
 
+import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 
 public class TestJobStatusPersistency extends ClusterMapReduceTestCase {
+  static final Path TEST_DIR = 
+    new Path(System.getProperty("test.build.data","/tmp"), 
+             "job-status-persistence");
+  
   private JobID runJob() throws Exception {
     OutputStream os = getFileSystem().create(new Path(getInputDir(), "text.txt"));
     Writer wr = new OutputStreamWriter(os);
@@ -103,4 +108,29 @@
     }
   }
 
+  /**
+   * Test if the completed job status is persisted to localfs.
+   */
+  public void testLocalPersistency() throws Exception {
+    FileSystem fs = FileSystem.getLocal(createJobConf());
+    
+    fs.delete(TEST_DIR, true);
+    
+    Properties config = new Properties();
+    config.setProperty("mapred.job.tracker.persist.jobstatus.active", "true");
+    config.setProperty("mapred.job.tracker.persist.jobstatus.hours", "1");
+    config.setProperty("mapred.job.tracker.persist.jobstatus.dir", 
+                       fs.makeQualified(TEST_DIR).toString());
+    stopCluster();
+    startCluster(false, config);
+    JobID jobId = runJob();
+    JobClient jc = new JobClient(createJobConf());
+    RunningJob rj = jc.getJob(jobId);
+    assertNotNull(rj);
+    
+    // check if the local fs has the data
+    Path jobInfo = new Path(TEST_DIR, rj.getID() + ".info");
+    assertTrue("Missing job info from the local fs", fs.exists(jobInfo));
+    fs.delete(TEST_DIR, true);
+  }
 }




</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788662 - in /hadoop/common/branches/branch-0.20: CHANGES.txt src/mapred/org/apache/hadoop/mapred/JobHistory.java src/mapred/org/apache/hadoop/mapred/JobTracker.java src/test/org/apache/hadoop/mapred/TestJobHistory.java</title>
<author><name>sharad@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626113459.570C92388896@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090626113459-570C92388896@eris-apache-org%3e</id>
<updated>2009-06-26T11:34:59Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: sharad
Date: Fri Jun 26 11:34:58 2009
New Revision: 788662

URL: http://svn.apache.org/viewvc?rev=788662&amp;view=rev
Log:
MAPREDUCE-130. Delete the jobconf copy from the log directory of the JobTracker when the job
is retired. Contributed by Amar Kamat.

Modified:
    hadoop/common/branches/branch-0.20/CHANGES.txt
    hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobHistory.java
    hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java
    hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobHistory.java

Modified: hadoop/common/branches/branch-0.20/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/CHANGES.txt?rev=788662&amp;r1=788661&amp;r2=788662&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.20/CHANGES.txt Fri Jun 26 11:34:58 2009
@@ -147,6 +147,9 @@
     MAPREDUCE-2. Fixes a bug in KeyFieldBasedPartitioner in handling empty
     keys. (Amar Kamat via sharad)
 
+    MAPREDUCE-130. Delete the jobconf copy from the log directory of the 
+    JobTracker when the job is retired. (Amar Kamat via sharad)
+
 Release 0.20.0 - 2009-04-15
 
   INCOMPATIBLE CHANGES

Modified: hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobHistory.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobHistory.java?rev=788662&amp;r1=788661&amp;r2=788662&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobHistory.java
(original)
+++ hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobHistory.java
Fri Jun 26 11:34:58 2009
@@ -833,6 +833,19 @@
     }
 
     /**
+     * Deletes job data from the local disk.
+     * For now just deletes the localized copy of job conf
+     */
+    static void cleanupJob(JobID id) {
+      String localJobFilePath =  JobInfo.getLocalJobFilePath(id);
+      File f = new File (localJobFilePath);
+      LOG.info("Deleting localized job conf at " + f);
+      if (!f.delete()) {
+        LOG.debug("Failed to delete file " + f);
+      }
+    }
+
+    /**
      * Log job submitted event to history. Creates a new file in history 
      * for the job. if history file creation fails, it disables history 
      * for all other events. 

Modified: hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java?rev=788662&amp;r1=788661&amp;r2=788662&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java
(original)
+++ hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/JobTracker.java
Fri Jun 26 11:34:58 2009
@@ -426,6 +426,9 @@
                     LOG.info("Retired job with id: '" + 
                              job.getProfile().getJobID() + "' of user '" +
                              jobUser + "'");
+
+                    // clean up job files from the local disk
+                    JobHistory.JobInfo.cleanupJob(job.getProfile().getJobID());
                   }
                 }
               }

Modified: hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobHistory.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobHistory.java?rev=788662&amp;r1=788661&amp;r2=788662&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobHistory.java
(original)
+++ hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestJobHistory.java
Fri Jun 26 11:34:58 2009
@@ -778,10 +778,14 @@
   public void testJobHistoryFile() throws IOException {
     MiniMRCluster mr = null;
     try {
-      mr = new MiniMRCluster(2, "file:///", 3);
+      JobConf conf = new JobConf();
+      // keep for less time
+      conf.setLong("mapred.jobtracker.retirejob.check", 1000);
+      conf.setLong("mapred.jobtracker.retirejob.interval", 1000);
+      mr = new MiniMRCluster(2, "file:///", 3, null, null, conf);
 
       // run the TCs
-      JobConf conf = mr.createJobConf();
+      conf = mr.createJobConf();
 
       FileSystem fs = FileSystem.get(conf);
       // clean up
@@ -802,6 +806,15 @@
       validateJobHistoryFileFormat(job.getID(), conf, "SUCCESS", false);
       validateJobHistoryFileContent(mr, job, conf);
 
+      // get the job conf filename
+      String name = JobHistory.JobInfo.getLocalJobFilePath(job.getID());
+      File file = new File(name);
+
+      // check that the file gets deleted
+      while (file.exists()) {
+        LOG.info("Waiting for " + file + " to be deleted");
+        UtilsForTests.waitFor(100);
+      }
     } finally {
       if (mr != null) {
         cleanupLocalFiles(mr);
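
Editorial sketch, not part of r788662: the polling loop added to
testJobHistoryFile above has no upper bound, so the test would hang if the
localized job conf were never deleted. A bounded variant might look like the
following. The class name BoundedWait, the method waitForDeletion and the
timeout values are hypothetical, and the sketch uses Thread.sleep rather than
UtilsForTests.waitFor.

```java
import java.io.File;

public class BoundedWait {

  /**
   * Polls until the file disappears or the timeout elapses.
   * Returns true if the file is gone, false if the wait timed out.
   */
  public static boolean waitForDeletion(File file, long timeoutMs, long pollMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (file.exists()) {
      if (System.currentTimeMillis() >= deadline) {
        return false; // give up instead of looping forever
      }
      Thread.sleep(pollMs);
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    File f = File.createTempFile("jobconf", ".xml");
    // The file still exists, so this wait runs into the timeout.
    System.out.println("deleted in time: " + waitForDeletion(f, 200, 50));
    if (!f.delete()) {
      System.out.println("could not delete " + f);
    }
    // The file is gone now, so this wait returns immediately.
    System.out.println("deleted in time: " + waitForDeletion(f, 200, 50));
  }
}
```

In a test, the returned flag could be passed to assertTrue so that a missing
cleanup fails the test rather than stalling it.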




</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788632 - in /hadoop/common/branches/branch-0.20: CHANGES.txt src/mapred/org/apache/hadoop/mapred/lib/KeyFieldBasedPartitioner.java src/test/org/apache/hadoop/mapred/lib/TestKeyFieldBasedPartitioner.java</title>
<author><name>sharad@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626084045.93C4223888D0@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090626084045-93C4223888D0@eris-apache-org%3e</id>
<updated>2009-06-26T08:40:45Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: sharad
Date: Fri Jun 26 08:40:45 2009
New Revision: 788632

URL: http://svn.apache.org/viewvc?rev=788632&amp;view=rev
Log:
MAPREDUCE-2. Fixes a bug in KeyFieldBasedPartitioner in handling empty keys. Contributed by
Amar Kamat.

Added:
    hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/lib/TestKeyFieldBasedPartitioner.java
Modified:
    hadoop/common/branches/branch-0.20/CHANGES.txt
    hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/lib/KeyFieldBasedPartitioner.java

Modified: hadoop/common/branches/branch-0.20/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/CHANGES.txt?rev=788632&amp;r1=788631&amp;r2=788632&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.20/CHANGES.txt Fri Jun 26 08:40:45 2009
@@ -144,6 +144,9 @@
     lack of quota. Allow quota to be set even if the limit is lower than
     current consumption. (Boris Shkolnik via rangadi)
 
+    MAPREDUCE-2. Fixes a bug in KeyFieldBasedPartitioner in handling empty
+    keys. (Amar Kamat via sharad)
+
 Release 0.20.0 - 2009-04-15
 
   INCOMPATIBLE CHANGES

Modified: hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/lib/KeyFieldBasedPartitioner.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/lib/KeyFieldBasedPartitioner.java?rev=788632&amp;r1=788631&amp;r2=788632&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/lib/KeyFieldBasedPartitioner.java
(original)
+++ hadoop/common/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/lib/KeyFieldBasedPartitioner.java
Fri Jun 26 08:40:45 2009
@@ -76,12 +76,21 @@
       throw new RuntimeException("The current system does not " +
           "support UTF-8 encoding!", e);
     }
+    // return 0 if the key is empty
+    if (keyBytes.length == 0) {
+      return 0;
+    }
+    
     int []lengthIndicesFirst = keyFieldHelper.getWordLengths(keyBytes, 0, 
         keyBytes.length);
     int currentHash = 0;
     for (KeyDescription keySpec : allKeySpecs) {
       int startChar = keyFieldHelper.getStartOffset(keyBytes, 0, keyBytes.length, 
           lengthIndicesFirst, keySpec);
+       // no key found! continue
+      if (startChar &lt; 0) {
+        continue;
+      }
       int endChar = keyFieldHelper.getEndOffset(keyBytes, 0, keyBytes.length, 
           lengthIndicesFirst, keySpec);
       currentHash = hashCode(keyBytes, startChar, endChar, 

Added: hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/lib/TestKeyFieldBasedPartitioner.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/lib/TestKeyFieldBasedPartitioner.java?rev=788632&amp;view=auto
==============================================================================
--- hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/lib/TestKeyFieldBasedPartitioner.java
(added)
+++ hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/mapred/lib/TestKeyFieldBasedPartitioner.java
Fri Jun 26 08:40:45 2009
@@ -0,0 +1,40 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.mapred.lib;
+
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner;
+
+import junit.framework.TestCase;
+
+public class TestKeyFieldBasedPartitioner extends TestCase {
+
+  /**
+   * Test if key-field-based partitioner works with empty key.
+   */
+  public void testEmptyKey() throws Exception {
+    KeyFieldBasedPartitioner&lt;Text, Text&gt; kfbp = 
+      new KeyFieldBasedPartitioner&lt;Text, Text&gt;();
+    JobConf conf = new JobConf();
+    conf.setInt("num.key.fields.for.partition", 10);
+    kfbp.configure(conf);
+    assertEquals("Empty key should map to 0th partition", 
+                 0, kfbp.getPartition(new Text(), new Text(), 10));
+  }
+}
\ No newline at end of file




</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788600 - in /hadoop/common/trunk: CHANGES.txt src/java/org/apache/hadoop/util/Shell.java src/test/core/org/apache/hadoop/util/TestShell.java</title>
<author><name>yhemanth@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626061804.C09222388904@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090626061804-C09222388904@eris-apache-org%3e</id>
<updated>2009-06-26T06:18:04Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: yhemanth
Date: Fri Jun 26 06:18:04 2009
New Revision: 788600

URL: http://svn.apache.org/viewvc?rev=788600&amp;view=rev
Log:
HADOOP-6106. Provides an option in ShellCommandExecutor to timeout commands that do not complete
within a certain amount of time. Contributed by Sreekanth Ramakrishnan.

Modified:
    hadoop/common/trunk/CHANGES.txt
    hadoop/common/trunk/src/java/org/apache/hadoop/util/Shell.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/util/TestShell.java

Modified: hadoop/common/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/CHANGES.txt?rev=788600&amp;r1=788599&amp;r2=788600&amp;view=diff
==============================================================================
--- hadoop/common/trunk/CHANGES.txt (original)
+++ hadoop/common/trunk/CHANGES.txt Fri Jun 26 06:18:04 2009
@@ -461,6 +461,10 @@
     HADOOP-5952. Change "-1 tests included" wording in test-patch.sh.
     (Gary Murry via szetszwo)
 
+    HADOOP-6106. Provides an option in ShellCommandExecutor to timeout 
+    commands that do not complete within a certain amount of time.
+    (Sreekanth Ramakrishnan via yhemanth)
+
   OPTIMIZATIONS
 
     HADOOP-5595. NameNode does not need to run a replicator to choose a

Modified: hadoop/common/trunk/src/java/org/apache/hadoop/util/Shell.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/util/Shell.java?rev=788600&amp;r1=788599&amp;r2=788600&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/util/Shell.java (original)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/util/Shell.java Fri Jun 26 06:18:04 2009
@@ -22,6 +22,9 @@
 import java.io.IOException;
 import java.io.InputStreamReader;
 import java.util.Map;
+import java.util.Timer;
+import java.util.TimerTask;
+import java.util.concurrent.atomic.AtomicBoolean;
 
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
@@ -55,6 +58,11 @@
     return new String[] {(WINDOWS ? "ls" : "/bin/ls"), "-ld"};
   }
 
+  /** Time after which the executing script is timed out. */
+  protected long timeOutInterval = 0L;
+  /** Whether the script timed out. */
+  private AtomicBoolean timedOut;
+
   /** 
    * Get the Unix command for setting the maximum virtual memory available
    * to a given child process. This is only relevant when we are forking a
@@ -96,6 +104,9 @@
   private File dir;
   private Process process; // sub process used to execute the command
   private int exitCode;
+
+  /** Whether the script finished executing. */
+  private volatile AtomicBoolean completed;
   
   public Shell() {
     this(0L);
@@ -135,7 +146,10 @@
   /** Run a command */
   private void runCommand() throws IOException { 
     ProcessBuilder builder = new ProcessBuilder(getExecString());
-    boolean completed = false;
+    Timer timeOutTimer = null;
+    ShellTimeoutTimerTask timeoutTimerTask = null;
+    timedOut = new AtomicBoolean(false);
+    completed = new AtomicBoolean(false);
     
     if (environment != null) {
       builder.environment().putAll(this.environment);
@@ -145,6 +159,13 @@
     }
     
     process = builder.start();
+    if (timeOutInterval &gt; 0) {
+      timeOutTimer = new Timer();
+      timeoutTimerTask = new ShellTimeoutTimerTask(
+          this);
+      //One time scheduling.
+      timeOutTimer.schedule(timeoutTimerTask, timeOutInterval);
+    }
     final BufferedReader errReader = 
             new BufferedReader(new InputStreamReader(process
                                                      .getErrorStream()));
@@ -181,27 +202,32 @@
         line = inReader.readLine();
       }
       // wait for the process to finish and check the exit code
-      exitCode = process.waitFor();
+      exitCode  = process.waitFor();
       try {
         // make sure that the error thread exits
         errThread.join();
       } catch (InterruptedException ie) {
         LOG.warn("Interrupted while reading the error stream", ie);
       }
-      completed = true;
+      completed.set(true);
+      // the timeout timer is
+      // cancelled in the finally block
       if (exitCode != 0) {
         throw new ExitCodeException(exitCode, errMsg.toString());
       }
     } catch (InterruptedException ie) {
       throw new IOException(ie.toString());
     } finally {
+      if ((timeOutTimer!=null) &amp;&amp; !timedOut.get()) {
+        timeOutTimer.cancel();
+      }
       // close the input stream
       try {
         inReader.close();
       } catch (IOException ioe) {
         LOG.warn("Error while closing the input stream", ioe);
       }
-      if (!completed) {
+      if (!completed.get()) {
         errThread.interrupt();
       }
       try {
@@ -264,21 +290,47 @@
     private String[] command;
     private StringBuffer output;
     
+    
     public ShellCommandExecutor(String[] execString) {
-      command = execString.clone();
+      this(execString, null);
     }
-
+    
     public ShellCommandExecutor(String[] execString, File dir) {
-      this(execString);
-      this.setWorkingDirectory(dir);
+      this(execString, dir, null);
     }
-
+   
     public ShellCommandExecutor(String[] execString, File dir, 
                                  Map&lt;String, String&gt; env) {
-      this(execString, dir);
-      this.setEnvironment(env);
+      this(execString, dir, env , 0L);
     }
-    
+
+    /**
+     * Create a new instance of the ShellCommandExecutor to execute a command.
+     * 
+     * @param execString The command to execute with arguments
+     * @param dir If not-null, specifies the directory which should be set
+     *            as the current working directory for the command.
+     *            If null, the current working directory is not modified.
+     * @param env If not-null, environment of the command will include the
+     *            key-value pairs specified in the map. If null, the current
+     *            environment is not modified.
+     * @param timeout Specifies the time in milliseconds, after which the
+     *                command will be killed and the status marked as timed out.
+     *                If 0, the command will not be timed out. 
+     */
+    public ShellCommandExecutor(String[] execString, File dir, 
+        Map&lt;String, String&gt; env, long timeout) {
+      command = execString.clone();
+      if (dir != null) {
+        setWorkingDirectory(dir);
+      }
+      if (env != null) {
+        setEnvironment(env);
+      }
+      timeOutInterval = timeout;
+    }
+        
+
     /** Execute the shell command. */
     public void execute() throws IOException {
       this.run();    
@@ -324,6 +376,24 @@
     }
   }
   
+  /**
+   * To check if the passed script to shell command executor timed out or
+   * not.
+   * 
+   * @return if the script timed out.
+   */
+  public boolean isTimedOut() {
+    return timedOut.get();
+  }
+  
+  /**
+   * Set if the command has timed out.
+   * 
+   */
+  private void setTimedOut() {
+    this.timedOut.set(true);
+  }
+  
   /** 
    * Static method to execute a shell command. 
    * Covers most of the simple cases without requiring the user to implement  
@@ -332,7 +402,7 @@
    * @return the output of the executed command.
    */
   public static String execCommand(String ... cmd) throws IOException {
-    return execCommand(null, cmd);
+    return execCommand(null, cmd, 0L);
   }
   
   /** 
@@ -341,15 +411,56 @@
    * the &lt;code&gt;Shell&lt;/code&gt; interface.
    * @param env the map of environment key=value
    * @param cmd shell command to execute.
+   * @param timeout time in milliseconds after which the script is marked as timed out
+   * @return the output of the executed command.
+   */
+  
+  public static String execCommand(Map&lt;String, String&gt; env, String[] cmd,
+      long timeout) throws IOException {
+    ShellCommandExecutor exec = new ShellCommandExecutor(cmd, null, env, 
+                                                          timeout);
+    exec.execute();
+    return exec.getOutput();
+  }
+
+  /** 
+   * Static method to execute a shell command. 
+   * Covers most of the simple cases without requiring the user to implement  
+   * the &lt;code&gt;Shell&lt;/code&gt; interface.
+   * @param env the map of environment key=value
+   * @param cmd shell command to execute.
    * @return the output of the executed command.
    */
   public static String execCommand(Map&lt;String,String&gt; env, String ... cmd) 
   throws IOException {
-    ShellCommandExecutor exec = new ShellCommandExecutor(cmd);
-    if (env != null) {
-      exec.setEnvironment(env);
+    return execCommand(env, cmd, 0L);
+  }
+  
+  /**
+   * Timer which is used to timeout scripts spawned off by shell.
+   */
+  private static class ShellTimeoutTimerTask extends TimerTask {
+
+    private Shell shell;
+
+    public ShellTimeoutTimerTask(Shell shell) {
+      this.shell = shell;
+    }
+
+    @Override
+    public void run() {
+      Process p = shell.getProcess();
+      try {
+        p.exitValue();
+      } catch (Exception e) {
+        //Process has not terminated.
+        //So check if it has completed 
+        //if not just destroy it.
+        if (p != null &amp;&amp; !shell.completed.get()) {
+          shell.setTimedOut();
+          p.destroy();
+        }
+      }
     }
-    exec.execute();
-    return exec.getOutput();
   }
 }

Modified: hadoop/common/trunk/src/test/core/org/apache/hadoop/util/TestShell.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/util/TestShell.java?rev=788600&amp;r1=788599&amp;r2=788600&amp;view=diff
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/util/TestShell.java (original)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/util/TestShell.java Fri Jun 26 06:18:04
2009
@@ -20,7 +20,10 @@
 import junit.framework.TestCase;
 
 import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileOutputStream;
 import java.io.IOException;
+import java.io.PrintWriter;
 
 public class TestShell extends TestCase {
 
@@ -71,6 +74,27 @@
     assertInString(command, " .. ");
     assertInString(command, "\"arg 2\"");
   }
+  
+  public void testShellCommandTimeout() throws Throwable {
+    String rootDir = new File(System.getProperty(
+        "test.build.data", "/tmp")).getAbsolutePath();
+    File shellFile = new File(rootDir, "timeout.sh");
+    String timeoutCommand = "sleep 4; echo \"hello\"";
+    PrintWriter writer = new PrintWriter(new FileOutputStream(shellFile));
+    writer.println(timeoutCommand);
+    writer.close();
+    shellFile.setExecutable(true);
+    Shell.ShellCommandExecutor shexc 
+    = new Shell.ShellCommandExecutor(new String[]{shellFile.getAbsolutePath()},
+                                      null, null, 100);
+    try {
+      shexc.execute();
+    } catch (Exception e) {
+      // An exception is thrown when the command times out.
+    }
+    shellFile.delete();
+    assertTrue("Script did not time out" , shexc.isTimedOut());
+  }
 
   private void testInterval(long interval) throws IOException {
     Command command = new Command(interval);
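
Editorial sketch, not part of r788600: the timeout mechanism in the Shell.java
diff above (a one-shot Timer whose task destroys the process unless it has
already completed) can be tried in isolation roughly as follows. The class
name TimeoutSketch and the method runWithTimeout are hypothetical, the sketch
assumes a Unix-like system with sleep and true on the PATH, and it does not
read the process output as Shell does.

```java
import java.io.IOException;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutSketch {

  /** Runs the command and returns true if the timer had to kill it. */
  public static boolean runWithTimeout(String[] cmd, long timeoutMs)
      throws IOException, InterruptedException {
    final AtomicBoolean timedOut = new AtomicBoolean(false);
    final AtomicBoolean completed = new AtomicBoolean(false);
    final Process process = new ProcessBuilder(cmd).start();

    // One-shot timer on a daemon thread, so it cannot keep the JVM alive.
    Timer timer = new Timer(true);
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        // Kill the process only if the caller has not seen it finish.
        if (!completed.get()) {
          timedOut.set(true);
          process.destroy();
        }
      }
    }, timeoutMs);

    try {
      process.waitFor();
      completed.set(true);
    } finally {
      timer.cancel();
    }
    return timedOut.get();
  }

  public static void main(String[] args) throws Exception {
    // A 4 second sleep with a 100 ms limit is expected to be killed.
    System.out.println(runWithTimeout(new String[] {"sleep", "4"}, 100));
    // A command that exits at once should finish before a 5 second limit.
    System.out.println(runWithTimeout(new String[] {"true"}, 5000));
  }
}
```

As in the committed code, a small window remains between the completed check
and the call to destroy, so the flag is best read as "the timer fired before
the caller observed completion".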




</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;Hambrug&quot; by edwardyoon</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626032335.18216.68849@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090626032335-18216-68849@eos-apache-org%3e</id>
<updated>2009-06-26T03:23:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by edwardyoon:
http://wiki.apache.org/hadoop/Hambrug

The comment on the change is:
To collaboration

New page:
== Hambrug ==
Let's discuss the graph computing framework named Hambrug.

 * Edward J. (edwardyoon AT apache.org)
 * Hyunsik Choi (hyunsik.choi AT gmail.com)

== Related Project ==

 * [http://incubator.apache.org/hama Hama], A distributed matrix computational package for
Hadoop.
 * [http://rdf-proj.blogspot.com/ Heart], A large-scale RDF data store and a distributed processing
engine.


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;FrontPage&quot; by edwardyoon</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626030531.14090.78207@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090626030531-14090-78207@eos-apache-org%3e</id>
<updated>2009-06-26T03:05:31Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by edwardyoon:
http://wiki.apache.org/hadoop/FrontPage

The comment on the change is:
Fix some typos.

------------------------------------------------------------------------------
   * [http://www.alphaworks.ibm.com/tech/mapreducetools IBM MapReduce Tools for Eclipse] (An
Eclipse plug-in that simplifies the creation and deployment of MapReduce programs)
   * Hadoop IRC channel is #hadoop at irc.freenode.net.
   * [http://www.tom-doehler.de/wordpress/index.php/2007/12/19/spring-and-hadoop/ Using Spring
and Hadoop] (Discussion of possibilities to use Hadoop and Dependency Injection with Spring)
-  * [http://wiki.apache.org/hama Hama], a Parallel Matrix Computational Package based on
Hadoop Map/Reduce
+  * [http://wiki.apache.org/hama Hama], a Distributed Matrix Computational Package based
on Hadoop Map/Reduce
   * [http://heart.korea.ac.kr Heart], a Planet-Scale RDF Data Store and a Distributed Processing
Engine
   * [http://lucene.apache.org/mahout Mahout], scalable Machine Learning algorithms using
Hadoop
   * [http://opensolaris.org/os/project/livehadoop/ Live Hadoop] A three-node, distributed
Hadoop cluster running on an !OpenSolaris live CD


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;Hive/AdminManual/Plugins&quot; by LarryOgrodnek</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090626003327.2457.21312@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090626003327-2457-21312@eos-apache-org%3e</id>
<updated>2009-06-26T00:33:27Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by LarryOgrodnek:
http://wiki.apache.org/hadoop/Hive/AdminManual/Plugins

The comment on the change is:
changed 'register' to 'create'

------------------------------------------------------------------------------
  
  Once hive is started up with your jars in the classpath, the final step is to register your
function:
  {{{
- register temporary function my_lower as 'com.example.hive.udf.Lower';
+ create temporary function my_lower as 'com.example.hive.udf.Lower';
  }}}
  Now you can start using it:
  {{{


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;HowToContribute&quot; by TomWhite</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090625211627.9271.14596@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090625211627-9271-14596@eos-apache-org%3e</id>
<updated>2009-06-25T21:16:27Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by TomWhite:
http://wiki.apache.org/hadoop/HowToContribute

The comment on the change is:
Added links to review queue Jira filters

------------------------------------------------------------------------------
  
  Finally, patches should be ''attached'' to an issue report in [http://issues.apache.org/jira/browse/HADOOP
Jira] via the '''Attach File''' link on the issue's Jira. Please add a comment that asks for
a code review following our [:CodeReviewChecklist: code review checklist]. Please note that
the attachment should be granted license to ASF for inclusion in ASF works (as per the [http://www.apache.org/licenses/LICENSE-2.0
Apache License] §5). 
  
- When you believe that your patch is ready to be committed, select the '''Submit Patch'''
link on the issue's Jira.  Submitted patches will be automatically tested against "trunk"
by [http://hudson.zones.apache.org/hudson/ Hudson], the project's continuous integration engine.
 Upon test completion, Hudson will add a success ("+1") message or failure ("-1") to your
issue report in Jira.  If your issue contains multiple patch versions, Hudson tests the last
patch uploaded.
+ When you believe that your patch is ready to be committed, select the '''Submit Patch'''
link on the issue's Jira.  Submitted patches will be automatically tested against "trunk"
by [http://hudson.zones.apache.org/hudson/view/Hadoop/ Hudson], the project's continuous integration
engine.  Upon test completion, Hudson will add a success ("+1") message or failure ("-1")
to your issue report in Jira.  If your issue contains multiple patch versions, Hudson tests
the last patch uploaded.
  
  Folks should run {{{ant clean test javadoc checkstyle}}} before selecting '''Submit Patch'''.
 Tests should all pass.  Javadoc should report '''no''' warnings or errors. Checkstyle's error
count should not exceed that listed at [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/lastSuccessfulBuild/artifact/trunk/build/test/checkstyle-errors.html
Checkstyle Errors]  Hudson's tests are meant to double-check things, and not be used as a
primary patch tester, which would create too much noise on the mailing list and in Jira. 
Submitting patches that fail Hudson testing is frowned on, (unless the failure is not actually
due to the patch).
  
@@ -179, +179 @@

  
  Should your patch receive a "-1" from the Hudson testing, select the '''Resume Progress'''
on the issue's Jira, upload a new patch with necessary fixes, and then select the '''Submit
Patch''' link again.
  
- Committers: for non-trivial changes, it is best to get another committer to review your
patches before commit.  Use '''Submit Patch''' link like other contributors, and then wait
for a "+1" from another committer before committing.  Please also try to frequently review
things in the patch queue.
+ Committers: for non-trivial changes, it is best to get another committer to review your
patches before commit.  Use '''Submit Patch''' link like other contributors, and then wait
for a "+1" from another committer before committing.  Please also try to frequently review
things in the patch queues:
+  * [https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&amp;requestId=12311124
Hadoop Common Review Queue]
+  * [https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&amp;requestId=12313301
Hadoop HDFS Review Queue]
+  * [https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&amp;requestId=12313302
Hadoop MapReduce Review Queue]
  
  == Jira Guidelines ==
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;FAQ&quot; by KonstantinShvachko</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090625003924.11977.40920@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090625003924-11977-40920@eos-apache-org%3e</id>
<updated>2009-06-25T00:39:24Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:
http://wiki.apache.org/hadoop/FAQ

------------------------------------------------------------------------------
   1. Select a subset of files that take up a good percentage of your disk space; copy them
to new locations in HDFS; remove the old copies of the files; rename the new copies to their
original names.
   2. A simpler way, with no interruption of service, is to turn up the replication of files,
wait for transfers to stabilize, and then turn the replication back down.
   3. Yet another way to re-balance blocks is to turn off the data-node, which is full, wait
until its blocks are replicated, and then bring it back again. The over-replicated blocks
will be randomly removed from different nodes, so you really get them rebalanced not just
removed from the current node.
-  4. Finally, you can use the bin/start-balancer.sh command to run a balancing process to
move blocks around the cluster automatically.
+  4. Finally, you can use the bin/start-balancer.sh command to run a balancing process to
move blocks around the cluster automatically. See 
+   * [http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Rebalancer  HDFS User
Guide: Rebalancer];
+   * [http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing HDFS Tutorial:
Rebalancing];
+   * [http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer HDFS Commands
Guide: balancer].
  
  [[BR]]
  [[Anchor(7)]]


</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r788181 - in /hadoop/common/branches/branch-0.20: CHANGES.txt src/docs/src/documentation/content/xdocs/hdfs_quota_admin_guide.xml src/hdfs/org/apache/hadoop/hdfs/tools/DFSAdmin.java</title>
<author><name>rangadi@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624205713.E2FAF23888A0@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090624205713-E2FAF23888A0@eris-apache-org%3e</id>
<updated>2009-06-24T20:57:13Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: rangadi
Date: Wed Jun 24 20:57:13 2009
New Revision: 788181

URL: http://svn.apache.org/viewvc?rev=788181&amp;view=rev
Log:
HDFS-438. Improve help message for space quota command. (Raghu Angadi)

Modified:
    hadoop/common/branches/branch-0.20/CHANGES.txt
    hadoop/common/branches/branch-0.20/src/docs/src/documentation/content/xdocs/hdfs_quota_admin_guide.xml
    hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/tools/DFSAdmin.java

Modified: hadoop/common/branches/branch-0.20/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/CHANGES.txt?rev=788181&amp;r1=788180&amp;r2=788181&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.20/CHANGES.txt Wed Jun 24 20:57:13 2009
@@ -26,6 +26,8 @@
     HADOOP-4372. Improves the way history filenames are obtained and manipulated.
     (Amar Kamat via ddas)
 
+    HDFS-438. Improve help message for space quota command. (Raghu Angadi)
+
   OPTIMIZATIONS
 
   BUG FIXES

Modified: hadoop/common/branches/branch-0.20/src/docs/src/documentation/content/xdocs/hdfs_quota_admin_guide.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/docs/src/documentation/content/xdocs/hdfs_quota_admin_guide.xml?rev=788181&amp;r1=788180&amp;r2=788181&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/docs/src/documentation/content/xdocs/hdfs_quota_admin_guide.xml
(original)
+++ hadoop/common/branches/branch-0.20/src/docs/src/documentation/content/xdocs/hdfs_quota_admin_guide.xml
Wed Jun 24 20:57:13 2009
@@ -68,7 +68,8 @@
 directory has no quota. &lt;/li&gt;
 
  &lt;li&gt; &lt;code&gt;dfsadmin -setSpaceQuota &amp;lt;N&gt; &amp;lt;directory&gt;...&amp;lt;directory&gt;&lt;/code&gt;
&lt;br /&gt; Set the space quota to be
-N bytes for each directory. N can also be specified with a binary prefix for convenience,
for e.g. 50g for 50 gigabytes and 
+N bytes for each directory. This is a hard limit on total size of all the files under the
directory tree.
+The space quota takes replication also into account, i.e. one GB of data with replication
of 3 consumes 3GB of quota. N can also be specified with a binary prefix for convenience,
for e.g. 50g for 50 gigabytes and 
 2t for 2 terabytes etc. Best effort for each directory, with faults reported if &lt;code&gt;N&lt;/code&gt;
is
 neither zero nor a positive integer, the directory does not exist or it is a file, or the
directory would immediately exceed
 the new quota. &lt;/li&gt;

Modified: hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/tools/DFSAdmin.java
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/tools/DFSAdmin.java?rev=788181&amp;r1=788180&amp;r2=788181&amp;view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/tools/DFSAdmin.java
(original)
+++ hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/tools/DFSAdmin.java
Wed Jun 24 20:57:13 2009
@@ -196,8 +196,10 @@
       "-"+NAME+" &lt;quota&gt; &lt;dirname&gt;...&lt;dirname&gt;";
     private static final String DESCRIPTION = USAGE + ": " +
       "Set the disk space quota &lt;quota&gt; for each directory &lt;dirName&gt;.\n" + 
-      "\t\tThe directory quota is a long integer that puts a hard limit\n" +
-      "\t\ton the number of names in the directory tree.\n" +
+      "\t\tThe space quota is a long integer that puts a hard limit\n" +
+      "\t\ton the total size of all the files under the directory tree.\n" +
+      "\t\tThe extra space required for replication is also counted. E.g.\n" +
+      "\t\ta 1GB file with replication of 3 consumes 3GB of the quota.\n\n" +
       "\t\tQuota can also be speciefied with a binary prefix for terabytes,\n" +
       "\t\tpetabytes etc (e.g. 50t is 50TB, 5m is 5MB, 3p is 3PB).\n" + 
       "\t\tBest effort for the directory, with faults reported if\n" +
@@ -492,13 +494,13 @@
       System.out.println(upgradeProgress);
     } else if ("metasave".equals(cmd)) {
       System.out.println(metaSave);
-    } else if (SetQuotaCommand.matches(cmd)) {
+    } else if (SetQuotaCommand.matches("-"+cmd)) {
       System.out.println(SetQuotaCommand.DESCRIPTION);
-    } else if (ClearQuotaCommand.matches(cmd)) {
+    } else if (ClearQuotaCommand.matches("-"+cmd)) {
       System.out.println(ClearQuotaCommand.DESCRIPTION);
-    } else if (SetSpaceQuotaCommand.matches(cmd)) {
+    } else if (SetSpaceQuotaCommand.matches("-"+cmd)) {
       System.out.println(SetSpaceQuotaCommand.DESCRIPTION);
-    } else if (ClearSpaceQuotaCommand.matches(cmd)) {
+    } else if (ClearSpaceQuotaCommand.matches("-"+cmd)) {
       System.out.println(ClearSpaceQuotaCommand.DESCRIPTION);
     } else if ("refreshServiceAcl".equals(cmd)) {
       System.out.println(refreshServiceAcl);
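
The quota arithmetic described in the new help text can be sketched in Python.
This is an illustration only, not DFSAdmin's own parser: the function names
and the exact set of suffixes handled here are assumptions.

```python
# Binary multipliers for the single-letter suffixes mentioned in the help text
BINARY_PREFIXES = {"k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40, "p": 2**50}


def parse_quota(text):
    """Convert a quota such as '50g' or '1024' to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in BINARY_PREFIXES:
        return int(text[:-1]) * BINARY_PREFIXES[text[-1]]
    return int(text)


def quota_consumed(file_bytes, replication=3):
    """Space charged against the quota: every replica counts."""
    return file_bytes * replication


def fits(file_bytes, replication, quota_text):
    """True if the replicated file fits within the given space quota."""
    return parse_quota(quota_text) >= quota_consumed(file_bytes, replication)


if __name__ == "__main__":
    one_gb = 2**30
    # A 1GB file with replication 3 needs 3GB of quota
    print(quota_consumed(one_gb, 3) == parse_quota("3g"))
```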




</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;HowToContribute&quot; by GaryMurry</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624172326.25229.78748@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624172326-25229-78748@eos-apache-org%3e</id>
<updated>2009-06-24T17:23:26Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by GaryMurry:
http://wiki.apache.org/hadoop/HowToContribute

The comment on the change is:
Reverted one URL change. It links to a static HTML page.

------------------------------------------------------------------------------
  
  First of all, you need the Hadoop source code.[[BR]]
  
- Get the source code on your local drive using [http://hadoop.apache.org/common/version_control.html
SVN].  Most development is done on the "trunk":
+ Get the source code on your local drive using [http://hadoop.apache.org/core/version_control.html
SVN].  Most development is done on the "trunk":
  
  {{{
  svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-common-trunk


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;SupportingProjects&quot; by FredrikMollerstrand</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624172044.24481.83930@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624172044-24481-83930@eos-apache-org%3e</id>
<updated>2009-06-24T17:20:44Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by FredrikMollerstrand:
http://wiki.apache.org/hadoop/SupportingProjects

The comment on the change is:
added zohmg

------------------------------------------------------------------------------
   * [http://belowdeck.kissintelligentsystems.com/ohm OHM] -- is a weakly relational ORM for
HBase which provides Object Mapping and Column indexing. It has its own compiler capable of
generating interface code for multiple languages. Currently C# is supported (via the Thrift API), with
 Java support currently in development. The compiler is easily extensible to add support for
other languages.
   * [http://datastore.googlecode.com datastore] -- Aims to be an implementation of the [http://code.google.com/appengine/docs/python/datastore/
google app-engine datastore] in Java using Hbase instead of bigtable
   * [http://datanucleus.org DataNucleus] -- is a Java JDO/JPA/REST implementation. It supports
HBase, and many other datastores.
+  * [http://github.com/zohmg/zohmg/tree/master Zohmg] -- Time series data store that uses
HBase as its backing store.
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;Hbase&quot; by stack</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624171004.20993.73547@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624171004-20993-73547@eos-apache-org%3e</id>
<updated>2009-06-24T17:10:04Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase

The comment on the change is:
Moved obsoleted pages to stale section

------------------------------------------------------------------------------
   * Support:
    * [http://hadoop.apache.org/hbase/mailing_lists.html HBase Mailing Lists] See also the
[http://hadoop.apache.org/core/mailing_lists.html Hadoop Mailing Lists]
    * [http://hadoop.apache.org/hbase/irc.html HBase IRC Channel]
-  * HBase [http://hadoop.apache.org/hbase/releases.html#News news], [:HBase/HBasePresentations:
presentations], [:HBase/Articles: articles], and the [http://jdcryans.blogspot.com/ Unofficial
Blog]
+  * HBase [http://hadoop.apache.org/hbase/releases.html#News news], [:HBase/HBasePresentations:
presentations], [:HBase/Articles: articles], and [https://twitter.com/HBase twitter].
   * [:Hbase/PoweredBy: PoweredBy], a list of sites and applications powered by HBase
   * SupportingProjects
  
  == Administrators / Setup Guides and config ==
   * [http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
Getting Started]
-  * [wiki:Hbase/10Minutes How to download and run hbase in about 10 Minutes].
   * HBase and Performance
    * [wiki:PerformanceTuning Performance Tuning]
    * [:Hbase/PerformanceEvaluation: Tools for evaluating HBase performance and scalability]
     * There are setup instructions and a JMeter Test Plan in [https://issues.apache.org/jira/browse/HADOOP-2625
HADOOP-2625]
-   * [:Hbase/HbaseRTDS: A performance evaluation of HBase]
   * [wiki:UsingLzoCompression Using LZO Compression]
+  * [wiki:Hbase/RollingRestart Rolling Restart] of HBase
   * Migrating between HBase versions
    * [:Hbase/HowToMigrate: Migration from one version of HBase to another]
    * [:Hbase/Plan-0.2/APIChanges: API changes]
@@ -43, +42 @@

  == User Developer Documentation ==
   * [http://hadoop.apache.org/hbase/docs/current/ HBase API Docs]
   * [:Hbase/Shell: HBase Shell] -- Based on Ruby's IRB
-  * [:Hbase/JRuby: JRuby interface to HBase] -- obsoleted by the new (J)IRB shell
+   * [:Hbase/JRuby: JRuby interface to HBase] -- obsoleted by the new (J)IRB shell
   * HBase non-java access
    * [:Hbase/Jython: Jython interface to HBase]
    * [:Hbase/Groovy: Groovy DSL for HBase]
    * [:Hbase/HbaseRest: REST gateway specification for HBase]
    * [:Hbase/ThriftApi: Thrift gateway specification for HBase]
    * [http://github.com/sishen/hbase-ruby Ruby client for Hbase's REST API]
-  * [:Hbase/MapReduce: Using HBase with Hadoop MapReduce]
+  * [:Hbase/MapReduce: Using HBase with Hadoop MapReduce] -- Obsoleted by [http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapred/package-summary.html
HBase MapReduce Package Summary]
   * [:Hbase/Cascading: Using HBase with Cascading]
  
  == Developer Documentation ==
-  * [:HBase/RoadMaps: Roadmaps]
+  * [:HBase/RoadMaps: Roadmaps] -- TODO: Update!
   * [:Hbase/HowToContribute: How to contribute]
   * [:Hbase/EclipseEnvironment: How to build HBase under Eclipse]
   * [:Hbase/HowToTest: How to test HBase]
@@ -74, +73 @@

   * [:Hbase/NewFileFormat: Discussion of new file format] -- hfile has become the new store
file format in hbase
   * [:Hbase/ZookeeperIntegration: HBase/Zookeeper integration documentation] -- integrated
   * [wiki:Hbase/UsingBloomFilters Using Bloom Filters] -- removed in 0.20.x, to be reinstated
in 0.21.x
+  * [http://jdcryans.blogspot.com/ Unofficial Blog]
+  * [wiki:Hbase/10Minutes How to download and run hbase in about 10 Minutes].
+  * [:Hbase/HbaseRTDS: A performance evaluation of HBase]
   
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;HowToContribute&quot; by GaryMurry</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624165929.17486.99533@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624165929-17486-99533@eos-apache-org%3e</id>
<updated>2009-06-24T16:59:29Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by GaryMurry:
http://wiki.apache.org/hadoop/HowToContribute

The comment on the change is:
Just changing URL references from core to common. No change to mailing list name

------------------------------------------------------------------------------
- = How to Contribute to Hadoop Core =
+ = How to Contribute to Hadoop Common =
  
- This page describes the mechanics of ''how'' to contribute software to Hadoop Core.  For
ideas about ''what'' you might contribute, please see the ProjectSuggestions page.
+ This page describes the mechanics of ''how'' to contribute software to Hadoop Common.  For
ideas about ''what'' you might contribute, please see the ProjectSuggestions page.
  
  === Getting the source code ===
  
  First of all, you need the Hadoop source code.[[BR]]
  
- Get the source code on your local drive using [http://hadoop.apache.org/core/version_control.html
SVN].  Most development is done on the "trunk":
+ Get the source code on your local drive using [http://hadoop.apache.org/common/version_control.html
SVN].  Most development is done on the "trunk":
  
  {{{
- svn checkout http://svn.apache.org/repos/asf/hadoop/core/trunk/ hadoop-core-trunk
+ svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-common-trunk
  }}}
  
- You may also want to develop against a specific release.  To do so, visit [http://svn.apache.org/repos/asf/hadoop/core/tags/]
and find the release that you are interested in developing against.  To checkout this release,
run:
+ You may also want to develop against a specific release.  To do so, visit [http://svn.apache.org/repos/asf/hadoop/common/tags/]
and find the release that you are interested in developing against.  To checkout this release,
run:
  
  {{{
- svn checkout http://svn.apache.org/repos/asf/hadoop/core/tags/release-X.Y.Z/ hadoop-core-X.Y.Z
+ svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-X.Y.Z/ hadoop-common-X.Y.Z
  }}}
  
  If you prefer to use Eclipse for development, there are instructions for setting up SVN
access from within Eclipse at EclipseEnvironment.
@@ -57, +57 @@

  Please make sure that all unit tests succeed before constructing your patch and that no
new javac compiler warnings are introduced by your patch.
  
  {{{
- &gt; cd hadoop-core-trunk
+ &gt; cd hadoop-common-trunk
  &gt; ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean test tar
  }}}
  After a while, if you see


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;PerformanceTuning&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624161631.2369.48597@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624161631-2369-48597@eos-apache-org%3e</id>
<updated>2009-06-24T16:16:31Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/PerformanceTuning

The comment on the change is:
Broaden to more than just HBase, mention the jvm reuse option

------------------------------------------------------------------------------
+ == NameNode Performance Tips ==
+ 
+  * Lots of RAM; you don't want the Namenode JVM to be swapping.
+ 
+ 
+ 
+ == MapReduce Performance ==
+ 
+ You can save a lot of time by enabling JVM re-use on MR jobs. In the JobTracker, or the
Job itself, set {{{mapred.job.reuse.jvm.num.tasks}}} to the number of times to reuse a JVM
''for the same map or reduce transform''  -or to -1 to reuse without limits. This reduces
JVM startup/teardown times. 
+ 
+ The more copies of a block there are, the more places there are to schedule work on the same
host as the block, eliminating the need to copy the block over the network. Set the {{{block.replication.factor}}}
on files to be more than the default (usually 3) if you want to make the data accessible in more
places. 
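
A minimal Python sketch of passing the JVM re-use setting described above on a
job's command line. It assumes the job's driver accepts Hadoop's generic
`-D property=value` options; the helper name is hypothetical and nothing is
executed.

```python
REUSE_KEY = "mapred.job.reuse.jvm.num.tasks"


def jvm_reuse_option(num_tasks):
    """Return the generic-option arguments that set the JVM re-use count.

    num_tasks is the number of tasks one JVM may run, or -1 for no limit.
    """
    if num_tasks == 0 or -1 > num_tasks:
        raise ValueError("use -1 (no limit) or a positive task count")
    return ["-D", "%s=%d" % (REUSE_KEY, num_tasks)]


if __name__ == "__main__":
    # Prints: -D mapred.job.reuse.jvm.num.tasks=-1
    print(" ".join(jvm_reuse_option(-1)))
```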
+ 
- == Performance tips ==
+ == HBase Performance tips ==
  
   * Use compression, see [UsingLzoCompression]
   * Ram, ram, ram.  Don't starve HBase.
   * More CPUs is important, as you will see in the next section
-  * Use a 64 bit platform, and a 64 bit JVM.
+  * Use a 64-bit platform, and a 64-bit JVM.
   * Your clients might need tuning: [http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html]
-  * Make sure that java implies -server on your machines, or else you will have to explicitly
enable it.
+  * Make sure that the command {{{java}}} implies {{{-server}}} on your machines, or else
you will have to explicitly enable it.
  
- == JVM and GC ==
+ == HBase JVM and GC ==
  
  HBase is memory intensive, and using the default GC you can see long pauses in all threads.
 With the addition of ZooKeeper this can cause false errors as ZooKeeper and the HBase master
thinks a regionserver has died.  
  
@@ -78, +90 @@

  export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m &lt;cms options from above&gt; &lt;gc
logging options from above&gt;"
  }}}
  
- 
- 


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;FrontPage&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624160733.29767.37270@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624160733-29767-37270@eos-apache-org%3e</id>
<updated>2009-06-24T16:07:33Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/FrontPage

The comment on the change is:
we already have a page

------------------------------------------------------------------------------
    * [:LargeClusterTips: Tips for managing a large cluster]
    * [:VirtualCluster: How to bring up a cluster of Virtual Machines]
    * [:DiskSetup: Disk Setup: some suggestions]
-   * [:Performance: Performance:] getting extra throughput
+   * [:PerformanceTuning: Performance:] getting extra throughput
    * [http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html Hadoop Windows/Eclipse Tutorial
]: Tutorial on how to setup and configure Hadoop development cluster for Windows and Eclipse.
  
   * Map/Reduce


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;FrontPage&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624154359.23160.17628@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624154359-23160-17628@eos-apache-org%3e</id>
<updated>2009-06-24T15:43:59Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/FrontPage

The comment on the change is:
fixing my links

------------------------------------------------------------------------------
    * [:LargeClusterTips: Tips for managing a large cluster]
    * [:VirtualCluster: How to bring up a cluster of Virtual Machines]
    * [:DiskSetup: Disk Setup: some suggestions]
-   * [wiki:Performance] -getting extra throughput
+   * [:Performance: Performance:] getting extra throughput
    * [http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html Hadoop Windows/Eclipse Tutorial
]: Tutorial on how to setup and configure Hadoop development cluster for Windows and Eclipse.
  
   * Map/Reduce


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;FrontPage&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624121851.1718.51485@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624121851-1718-51485@eos-apache-org%3e</id>
<updated>2009-06-24T12:18:51Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/FrontPage

The comment on the change is:
link correction

------------------------------------------------------------------------------
    * [:LargeClusterTips: Tips for managing a large cluster]
    * [:VirtualCluster: How to bring up a cluster of Virtual Machines]
    * [:DiskSetup: Disk Setup: some suggestions]
-   * [Performance] -getting extra throughput
+   * [wiki:Performance] -getting extra throughput
    * [http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html Hadoop Windows/Eclipse Tutorial
]: Tutorial on how to setup and configure Hadoop development cluster for Windows and Eclipse.
  
   * Map/Reduce


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;FrontPage&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624121721.1290.31399@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624121721-1290-31399@eos-apache-org%3e</id>
<updated>2009-06-24T12:17:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/FrontPage

The comment on the change is:
new performance page

------------------------------------------------------------------------------
    * [:LargeClusterTips: Tips for managing a large cluster]
    * [:VirtualCluster: How to bring up a cluster of Virtual Machines]
    * [:DiskSetup: Disk Setup: some suggestions]
+   * [Performance] -getting extra throughput
    * [http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html Hadoop Windows/Eclipse Tutorial
]: Tutorial on how to setup and configure Hadoop development cluster for Windows and Eclipse.
  
   * Map/Reduce


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;NameNode&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624101921.13046.15196@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624101921-13046-15196@eos-apache-org%3e</id>
<updated>2009-06-24T10:19:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/NameNode

The comment on the change is:
new page

------------------------------------------------------------------------------
  
  Client applications talk to the NameNode whenever they wish to locate a file, or when they
want to add/copy/move/delete a file. The NameNode responds to successful requests by returning
a list of relevant DataNode servers where the data lives.
  
- The NameNode is a Single Point of Failure for the HDFS Cluster.  HDFS is not currently a
High Availability system. When the NameNode goes down, the file system goes offline.  There
is an optional SecondaryNameNode that can be hosted on a separate machine.  It only creates
checkpoints of the namespace by merging the edits file into the fsimage file and does not
provide any real redundancy.
+ The NameNode is a [Single Point of Failure] for the HDFS Cluster.  HDFS is not currently
a High Availability system. When the NameNode goes down, the file system goes offline.  There
is an optional SecondaryNameNode that can be hosted on a separate machine.  It only creates
checkpoints of the namespace by merging the edits file into the fsimage file and does not
provide any real redundancy.
  
  It is essential to look after the NameNode. Here are some recommendations from production
use
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;LargeClusterTips&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624101816.12646.86538@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624101816-12646-86538@eos-apache-org%3e</id>
<updated>2009-06-24T10:18:16Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/LargeClusterTips

The comment on the change is:
formatting

------------------------------------------------------------------------------
  
  Things will go wrong. There is always a SPOF. Test your failure handling processes before
you go live. 
  
- * Simulate a corrupted edit log by killing the namenode process, truncating the (binary)
edit log, and bringing it up. See how the team handles it. 
+  * Simulate a corrupted edit log by killing the namenode process, truncating the (binary)
edit log, and bringing it up. See how the team handles it. 
- * Turn off one of the switches, pull out a network cable. See how the cluster handles it,
how it recovers. Then put the switch back on.
+  * Turn off one of the switches, pull out a network cable. See how the cluster handles it,
how it recovers. Then put the switch back on.
- * Turn an entire rack off without warning. See what happens when they go offline.
+  * Turn an entire rack off without warning. See what happens when they go offline.
- * Turn off DNS. 
+  * Turn off DNS. Or just the rDNS side of things.
- * Turn off the entire datacenter, switch it back on. Are there any race conditions?
+  * Turn off the entire datacenter, switch it back on. Are there any race conditions?
- * Write a job that tries to generate too much data and fills up the cluster. How is it handled?
+  * Write a job that tries to generate too much data and fills up the cluster. How is it handled?
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;VirtualCluster&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624101630.12099.17088@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624101630-12099-17088@eos-apache-org%3e</id>
<updated>2009-06-24T10:16:30Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/VirtualCluster

The comment on the change is:
more troublespots. 

------------------------------------------------------------------------------
   i. The public keys of all machines (both VMs and physical machines) are distributed to every "~/.ssh/authorized_keys"
file.
   i. conf/hadoop-site.xml file is similar for all the machines.
   i. /etc/hosts file must contain all the machines(VM,Physical machine) IP and Hostname.
-  i. The local hostname entry in /etc/hosts must not point to 127.0.0.1 or any other loopback
address (some laptop-friendly Unix distributions do this). It should be to the assigned IP
address.
+  i. The local hostname entry in /etc/hosts must not point to 127.0.0.1 or any other loopback
address (some laptop-friendly Linux distributions do this). It should be to the assigned IP
address.
   i. conf/slaves must contain the hostname of all slaves including VM's and physical machine.
   i. conf/masters must contain only master's hostname.
   i. both conf/masters and conf/slaves files must be similar in all the participating machines.
@@ -28, +28 @@

  Here are things that can cause trouble.
   1. Multiple virtual network adapters. It is simpler with one network adapter/node
   1. Machines changing hostname/IPAddress on a reboot. For a long-lived virtual cluster you
need stable machine names.
+  1. Machines whose hostname doesn't match the hostname the network assigns it. It thinks
it is "granton", the network thinks it is "dhcp-169-45", that being the name everything else
talks to it by.
+  1. Machines that think they have the same hostname. You get this if you clone VMs and don't
rename them.
   1. Pauses of an entire VM for 5-10s or longer. This happens when the virtual host is overloaded
and your VM has been swapped out. Host fewer VMs, or have them ask for less memory.
+  1. Weird clock drift, where the clock can even run backwards. Again, don't overload your machines.
+  1. All redundant virtual servers (e.g. Namenode and secondary NN) being hosted on the same
physical machine. At that point, you don't have redundancy or failover any more.
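
The /etc/hosts requirement above (the local hostname must not resolve to a
loopback address) can be checked with a short Python sketch. It uses only the
standard library; the function names are illustrative, and the check covers
only what the local resolver returns for this machine's own hostname.

```python
import ipaddress
import socket


def is_loopback(address):
    """True if the address is in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def check_local_hostname():
    """Resolve this machine's hostname and report whether it is loopback."""
    hostname = socket.gethostname()
    address = socket.gethostbyname(hostname)
    return hostname, address, is_loopback(address)


if __name__ == "__main__":
    try:
        hostname, address, bad = check_local_hostname()
    except socket.gaierror as err:
        print("hostname does not resolve at all: %s" % err)
    else:
        verdict = "LOOPBACK, fix /etc/hosts" if bad else "ok"
        print("%s resolves to %s: %s" % (hostname, address, verdict))
```

Running this on each VM before starting the daemons catches the 127.0.0.1
hostname entry early.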
  


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;LargeClusterTips&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624101303.11115.89739@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624101303-11115-89739@eos-apache-org%3e</id>
<updated>2009-06-24T10:13:03Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/LargeClusterTips

The comment on the change is:
more ideas, including things to test before you go live

------------------------------------------------------------------------------
  
   * Have a good sysadmin if you're not one yourself.
   * Take a look at a presentation done by Allen Wittenauer from Yahoo!: http://tinyurl.com/5foamm
-  * Have the LAN closed off to untrusted users. This simplifies security.
+  * Have the LAN closed off to untrusted users. Without this, your filesystem is effectively
open to everyone on the network.
+  * Once you are on the private LAN, turn off all firewalls on the machines, as they only create
connectivity problems.
   * Use LDAP or similar to manage user accounts.
   * Only put the slaves file on your namenode and secondary namenode to prevent confusion.
+ 
-  * Have identical hardware on all machines in the cluster, eliminating the need to have
different
-    configuration options (task slots, data directory locations, etc)
   * Use RPMs to install the Hadoop binaries. Self:Cloudera provide some RPMs for this, and
a web site to generate configuration RPM files.
   * Use kickstart or similar to bring up the machines. 
-  * Consider a system configuration management package to keep Hadoop's source and configuration
consistent across all nodes.  Some example packages are bcfg2, smartfrog, puppet, cfengine,
etc. 
   * If you are trying to configure the machines one by one, step away from the keyboard.
That is not the way to manage a cluster.
+  * Keep an eye out for disk SMART messages in the server logs. They warn of trouble.
+  * Keep an eye on disk capacity, especially on the namenode. You do not want the NN to run
out of storage as Bad Things happen.
+  * Keep the underlying software in sync: OS, Java version. 
+  * Run the rebalancer, throttled back appropriately for your bandwidth.
+ 
  
  See the Self:AmazonEC2 and AmazonS3 pages for tips on managing clusters built on EC2 and
S3.
  
  Other good documentation: [http://wiki.smartfrog.org/wiki/display/sf/Patterns+of+Hadoop+Deployment
Patterns of Hadoop Deployment]
  
+ == Hadoop Configuration ==
+ 
+  * Don't do it by copying XML files around by hand.
+  * Look at the Cloudera config tools. If you use them, keep the previous RPMs around.
+  * Consider a system configuration management package to keep Hadoop's source and configuration
consistent across all nodes.  Some example packages are bcfg2, SmartFrog, Puppet, cfengine,
etc. 
+  * Keep your site XML files under SCM, so you can roll back and diff changes.
+ 
+ 
+ == NameNode Health ==
+ 
+ The NameNode is a SPOF. When it goes offline, the cluster goes down. If it loses its data,
the filesystem is gone. Value it.
+  * Have a secondary name node! When the BackupNode replaces this, have a BackupNode!
+  * Never let its disks fill up.
+  * Consider RAID storage here. If not, set it to save its data to two independent drives,
ideally on separate controllers (just in case the controller decides to play up)
+  * Set the NN up to save one copy of all its data to a remote machine (NFS?), so even if
the NN goes down, you can bring up a new machine with the same hostname for everything else
to bind to.
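
The "never let its disks fill up" advice can be backed by a small periodic check. The following is a minimal sketch, not part of Hadoop: the class name DiskHeadroom, the method hasHeadroom and the 10% threshold are all assumptions made for illustration, and the directory to check is whatever path you pass on the command line.

```java
import java.io.File;

// Hypothetical helper (not a Hadoop class): reports whether a directory's
// volume still has a minimum fraction of free space.
final class DiskHeadroom {
  private DiskHeadroom() {}

  /** True when usableBytes / totalBytes is at least minFreeFraction. */
  static boolean hasHeadroom(long usableBytes, long totalBytes,
      double minFreeFraction) {
    if (0 >= totalBytes) {
      return false; // unknown or missing volume: treat as unhealthy
    }
    return (double) usableBytes / totalBytes >= minFreeFraction;
  }

  public static void main(String[] args) {
    // Assumption: the first argument is the directory holding the NN image
    // and edit log; defaults to the current directory.
    File dir = new File(args.length > 0 ? args[0] : ".");
    boolean ok = hasHeadroom(dir.getUsableSpace(), dir.getTotalSpace(), 0.10);
    System.out.println(dir + (ok ? " has headroom" : " is running low on space"));
  }
}
```

Run from cron or a monitoring agent, a check like this can raise an alert well before the volume is actually full.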
+ 
+ 
+ == Workers ==
+ 
+  * Have identical hardware on all workers in the cluster, eliminating the need to have different
configuration options (task slots, data directory locations, etc.)
+  * Have a common user account on every machine you run Hadoop on, with a common public key
in ~/.ssh/authorized_keys
+  * Track HDDs, their history and their failures. Disk failures are not always independent
in a large datacentre.
+  * Have simple hostname to rack or IP to rack mappings, so the rack detection scripts are
trivial.
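
As an illustration of how trivial a rack mapping can be, here is a minimal sketch. It assumes, purely for the example, that the third octet of a worker's IPv4 address identifies its rack; the class RackMapper and the "/rack-N" naming are hypothetical. Like a Hadoop topology script, the main method takes one or more addresses as arguments and prints one rack path per argument.

```java
// Hypothetical sketch: derive a rack path from an IPv4 address, assuming
// the third octet is the rack number (an illustrative convention only).
final class RackMapper {
  private RackMapper() {}

  static final String DEFAULT_RACK = "/default-rack";

  static String rackFor(String address) {
    String[] octets = address.trim().split("\\.");
    if (octets.length != 4) {
      return DEFAULT_RACK; // not a dotted quad: fall back
    }
    try {
      return "/rack-" + Integer.parseInt(octets[2]);
    } catch (NumberFormatException e) {
      return DEFAULT_RACK; // four-part hostname rather than an IP
    }
  }

  public static void main(String[] args) {
    StringBuilder out = new StringBuilder();
    for (String arg : args) {
      if (out.length() > 0) {
        out.append(' ');
      }
      out.append(rackFor(arg));
    }
    System.out.println(out);
  }
}
```

With an addressing plan like this there is no lookup table to maintain: adding a rack means picking the next subnet.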
+ 
+ === How to rebalance a full datanode ===
+ 
+ If a datanode is at or near 100% capacity, 
+  1. Decommission the node: this will copy everything off. 
+  2. Take it offline.
+  3. Delete the data, clean up the HDDs.
+  4. Add the node again. 
+ 
+ == Testing Failure ==
+ 
+ Things will go wrong. There is always a SPOF. Test your failure handling processes before
you go live. 
+ 
+ * Simulate a corrupted edit log by killing the namenode process, truncating the (binary)
edit log, and bringing it up. See how the team handles it. 
+ * Turn off one of the switches, pull out a network cable. See how the cluster handles it,
how it recovers. Then put the switch back on.
+ * Turn an entire rack off without warning. See what happens when they go offline.
+ * Turn off DNS. 
+ * Turn off the entire datacenter, switch it back on. Are there any race conditions?
+ * Write a job that tries to generate too much data and fills up the cluster. How is it handled?
+ 


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Trivial Update of &quot;FrontPage&quot; by SteveLoughran</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624095238.4624.98130@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624095238-4624-98130@eos-apache-org%3e</id>
<updated>2009-06-24T09:52:38Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/FrontPage

The comment on the change is:
add some text to the x-ref

------------------------------------------------------------------------------
   * QuickStart (for those who just want it to work ''now'')
   * [http://hadoop.apache.org/core/docs/current/commands_manual.html Command Line Options]
for hadoop shell script.
   * [:HadoopOverview: Hadoop Code Overview]
-  * [:TroubleShooting:] What to do when things go wrong
+  * [:TroubleShooting: Troubleshooting] What to do when things go wrong
  
   * Cluster setup
    * ["Running Hadoop On Ubuntu Linux (Single-Node Cluster)"] (tutorial on installing, configuring
and running Hadoop on a single machine)


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;ZooKeeper&quot; by BenjaminReed</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624073424.9028.13259@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624073424-9028-13259@eos-apache-org%3e</id>
<updated>2009-06-24T07:34:24Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by BenjaminReed:
http://wiki.apache.org/hadoop/ZooKeeper

------------------------------------------------------------------------------
   * [:ZooKeeper/HowToRelease: HowToRelease]
   * [:ZooKeeper/ProjectSuggestions: ProjectSuggestions]
  
+ == Related Projects ==
+ 
+  * BookKeeper
+ 


</pre>
</div>
</content>
</entry>
<entry>
<title>[Hadoop Wiki] Update of &quot;BookKeeper&quot; by BenjaminReed</title>
<author><name>Apache Wiki &lt;wikidiffs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624073312.8515.90878@eos.apache.org%3e"/>
<id>urn:uuid:%3c20090624073312-8515-90878@eos-apache-org%3e</id>
<updated>2009-06-24T07:33:12Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by BenjaminReed:
http://wiki.apache.org/hadoop/BookKeeper

New page:
= BookKeeper =

BookKeeper is a system to reliably log streams of records. It is designed to store write-ahead
logs, such as those found in databases or database-like applications. In fact, the Hadoop NameNode
inspired BookKeeper. The NameNode logs changes to the in-memory namespace data structures
to the local disk before they are applied in memory. However, logging the changes locally means
that if the NameNode fails the log will be inaccessible. We found that by using BookKeeper,
the NameNode can log to distributed storage devices in a way that yields higher availability
and performance. Although it was designed for the NameNode, BookKeeper can be used for any
application that needs strong durability guarantees with high performance and has a single
writer.

In BookKeeper, servers are "bookies", log streams are "ledgers", and each unit of a log (aka
record) is a "ledger entry". BookKeeper is designed to be reliable: bookies, the servers that
store ledgers, can be byzantine, which means that some subset of the bookies can fail, corrupt
data, or discard data, but as long as there are enough correctly behaving servers the service
as a whole behaves correctly. The metadata for BookKeeper is stored in ZooKeeper.

BookKeeper achieves high availability and strong durability guarantees by replicating ledger
entries across multiple bookies. The ledgers themselves are striped across the bookies for
high performance.

The BookKeeper data model is a flat namespace of ledgers identified by a long. The ledgers
are append only and writable by a single client. The basic operations of BookKeeper are: createLedger
to create a new ledger available for writing, openLedger to read from an existing ledger,
addEntry, readEntries, and closeLedger. Once a ledger is closed it becomes read-only.
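
To make the data model concrete, here is a toy in-memory sketch of a single ledger. It is not the BookKeeper client API: the class ToyLedger and its method names are hypothetical, and replication, striping and the single-writer check are left out. It only shows the append-only and read-only-after-close behaviour described above.

```java
import java.util.Arrays;

// Toy model of one ledger (illustration only, not BookKeeper code).
final class ToyLedger {
  private final long id;          // ledgers are identified by a long
  private byte[][] entries = new byte[4][];
  private int count = 0;
  private boolean closed = false;

  ToyLedger(long id) {
    this.id = id;
  }

  long getId() {
    return id;
  }

  /** Appends an entry and returns its position in the ledger. */
  long addEntry(byte[] data) {
    if (closed) {
      throw new IllegalStateException("ledger " + id + " is closed");
    }
    if (count == entries.length) {
      entries = Arrays.copyOf(entries, count * 2); // grow the backing array
    }
    entries[count] = data.clone();
    return count++;
  }

  /** Returns a copy of the entry at the given position. */
  byte[] read(long entryId) {
    if (0 > entryId || entryId >= count) {
      throw new IndexOutOfBoundsException("no entry " + entryId);
    }
    return entries[(int) entryId].clone();
  }

  /** After close() the ledger accepts reads only. */
  void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}
```

There is deliberately no way to modify or remove an entry: the only mutations are appending and closing.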

''Once the 3.2 release happens, we will include a link to the documentation here.''


</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r787913 [3/4] - in /hadoop/common/trunk: ./ src/java/org/apache/hadoop/io/file/ src/java/org/apache/hadoop/io/file/tfile/ src/test/ src/test/core/org/apache/hadoop/io/file/ src/test/core/org/apache/hadoop/io/file/tfile/</title>
<author><name>cdouglas@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624054828.90D1323888D8@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090624054828-90D1323888D8@eris-apache-org%3e</id>
<updated>2009-06-24T05:48:26Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Utils.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Utils.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Utils.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Utils.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.util.Comparator;
+import java.util.List;
+
+import org.apache.hadoop.io.Text;
+
+/**
+ * Supporting Utility classes used by TFile, and shared by users of TFile.
+ */
+public final class Utils {
+
+  /**
+   * Prevent the instantiation of Utils.
+   */
+  private Utils() {
+    // nothing
+  }
+
+  /**
+   * Encoding an integer into a variable-length encoding format. Synonymous to
+   * &lt;code&gt;Utils#writeVLong(out, n)&lt;/code&gt;.
+   * 
+   * @param out
+   *          output stream
+   * @param n
+   *          The integer to be encoded
+   * @throws IOException
+   * @see Utils#writeVLong(DataOutput, long)
+   */
+  public static void writeVInt(DataOutput out, int n) throws IOException {
+    writeVLong(out, n);
+  }
+
+  /**
+   * Encoding a Long integer into a variable-length encoding format.
+   * &lt;ul&gt;
+   * &lt;li&gt;if n in [-32, 128): encode in one byte with the actual value.
+   * Otherwise,
+   * &lt;li&gt;if n in [-20*2^8, 20*2^8): encode in two bytes: byte[0] = n/256 - 52;
+   * byte[1]=n&amp;0xff. Otherwise,
+   * &lt;li&gt;if n in [-16*2^16, 16*2^16): encode in three bytes: byte[0]=n/2^16 -
+   * 88; byte[1]=(n&gt;&gt;8)&amp;0xff; byte[2]=n&amp;0xff. Otherwise,
+   * &lt;li&gt;if n in [-8*2^24, 8*2^24): encode in four bytes: byte[0]=n/2^24 - 112;
+   * byte[1] = (n&gt;&gt;16)&amp;0xff; byte[2] = (n&gt;&gt;8)&amp;0xff; byte[3]=n&amp;0xff. Otherwise:
+   * &lt;li&gt;if n in [-2^31, 2^31): encode in five bytes: byte[0]=-125; byte[1] =
+   * (n&gt;&gt;24)&amp;0xff; byte[2]=(n&gt;&gt;16)&amp;0xff; byte[3]=(n&gt;&gt;8)&amp;0xff; byte[4]=n&amp;0xff;
+   * &lt;li&gt;if n in [-2^39, 2^39): encode in six bytes: byte[0]=-124; byte[1] =
+   * (n&gt;&gt;32)&amp;0xff; byte[2]=(n&gt;&gt;24)&amp;0xff; byte[3]=(n&gt;&gt;16)&amp;0xff;
+   * byte[4]=(n&gt;&gt;8)&amp;0xff; byte[5]=n&amp;0xff
+   * &lt;li&gt;if n in [-2^47, 2^47): encode in seven bytes: byte[0]=-123; byte[1] =
+   * (n&gt;&gt;40)&amp;0xff; byte[2]=(n&gt;&gt;32)&amp;0xff; byte[3]=(n&gt;&gt;24)&amp;0xff;
+   * byte[4]=(n&gt;&gt;16)&amp;0xff; byte[5]=(n&gt;&gt;8)&amp;0xff; byte[6]=n&amp;0xff;
+   * &lt;li&gt;if n in [-2^55, 2^55): encode in eight bytes: byte[0]=-122; byte[1] =
+   * (n&gt;&gt;48)&amp;0xff; byte[2] = (n&gt;&gt;40)&amp;0xff; byte[3]=(n&gt;&gt;32)&amp;0xff;
+   * byte[4]=(n&gt;&gt;24)&amp;0xff; byte[5]=(n&gt;&gt;16)&amp;0xff; byte[6]=(n&gt;&gt;8)&amp;0xff;
+   * byte[7]=n&amp;0xff;
+   * &lt;li&gt;if n in [-2^63, 2^63): encode in nine bytes: byte[0]=-121; byte[1] =
+   * (n&gt;&gt;56)&amp;0xff; byte[2] = (n&gt;&gt;48)&amp;0xff; byte[3] = (n&gt;&gt;40)&amp;0xff;
+   * byte[4]=(n&gt;&gt;32)&amp;0xff; byte[5]=(n&gt;&gt;24)&amp;0xff; byte[6]=(n&gt;&gt;16)&amp;0xff;
+   * byte[7]=(n&gt;&gt;8)&amp;0xff; byte[8]=n&amp;0xff;
+   * &lt;/ul&gt;
+   * 
+   * @param out
+   *          output stream
+   * @param n
+   *          the integer number
+   * @throws IOException
+   */
+  @SuppressWarnings("fallthrough")
+  public static void writeVLong(DataOutput out, long n) throws IOException {
+    if ((n &lt; 128) &amp;&amp; (n &gt;= -32)) {
+      out.writeByte((int) n);
+      return;
+    }
+
+    long un = (n &lt; 0) ? ~n : n;
+    // how many bytes do we need to represent the number with sign bit?
+    int len = (Long.SIZE - Long.numberOfLeadingZeros(un)) / 8 + 1;
+    int firstByte = (int) (n &gt;&gt; ((len - 1) * 8));
+    switch (len) {
+      case 1:
+        // fall it through to firstByte==-1, len=2.
+        firstByte &gt;&gt;= 8;
+      case 2:
+        if ((firstByte &lt; 20) &amp;&amp; (firstByte &gt;= -20)) {
+          out.writeByte(firstByte - 52);
+          out.writeByte((int) n);
+          return;
+        }
+        // fall it through to firstByte==0/-1, len=3.
+        firstByte &gt;&gt;= 8;
+      case 3:
+        if ((firstByte &lt; 16) &amp;&amp; (firstByte &gt;= -16)) {
+          out.writeByte(firstByte - 88);
+          out.writeShort((int) n);
+          return;
+        }
+        // fall it through to firstByte==0/-1, len=4.
+        firstByte &gt;&gt;= 8;
+      case 4:
+        if ((firstByte &lt; 8) &amp;&amp; (firstByte &gt;= -8)) {
+          out.writeByte(firstByte - 112);
+          out.writeShort(((int) n) &gt;&gt;&gt; 8);
+          out.writeByte((int) n);
+          return;
+        }
+        out.writeByte(len - 129);
+        out.writeInt((int) n);
+        return;
+      case 5:
+        out.writeByte(len - 129);
+        out.writeInt((int) (n &gt;&gt;&gt; 8));
+        out.writeByte((int) n);
+        return;
+      case 6:
+        out.writeByte(len - 129);
+        out.writeInt((int) (n &gt;&gt;&gt; 16));
+        out.writeShort((int) n);
+        return;
+      case 7:
+        out.writeByte(len - 129);
+        out.writeInt((int) (n &gt;&gt;&gt; 24));
+        out.writeShort((int) (n &gt;&gt;&gt; 8));
+        out.writeByte((int) n);
+        return;
+      case 8:
+        out.writeByte(len - 129);
+        out.writeLong(n);
+        return;
+      default:
+        throw new RuntimeException("Internal error");
+    }
+  }
+
+  /**
+   * Decoding the variable-length integer. Synonymous to
+   * &lt;code&gt;(int)Utils#readVLong(in)&lt;/code&gt;.
+   * 
+   * @param in
+   *          input stream
+   * @return the decoded integer
+   * @throws IOException
+   * 
+   * @see Utils#readVLong(DataInput)
+   */
+  public static int readVInt(DataInput in) throws IOException {
+    long ret = readVLong(in);
+    if ((ret &gt; Integer.MAX_VALUE) || (ret &lt; Integer.MIN_VALUE)) {
+      throw new RuntimeException(
+          "Number too large to be represented as Integer");
+    }
+    return (int) ret;
+  }
+
+  /**
+   * Decoding the variable-length integer. Suppose the value of the first byte
+   * is FB, and the following bytes are NB[*].
+   * &lt;ul&gt;
+   * &lt;li&gt;if (FB &gt;= -32), return (long)FB;
+   * &lt;li&gt;if (FB in [-72, -33]), return (FB+52)&lt;&lt;8 + NB[0]&amp;0xff;
+   * &lt;li&gt;if (FB in [-104, -73]), return (FB+88)&lt;&lt;16 + (NB[0]&amp;0xff)&lt;&lt;8 +
+   * NB[1]&amp;0xff;
+   * &lt;li&gt;if (FB in [-120, -105]), return (FB+112)&lt;&lt;24 + (NB[0]&amp;0xff)&lt;&lt;16 +
+   * (NB[1]&amp;0xff)&lt;&lt;8 + NB[2]&amp;0xff;
+   * &lt;li&gt;if (FB in [-128, -121]), interpret the next FB+129 bytes as a signed
+   * big-endian integer and return it.
+   * &lt;/ul&gt;
+   * 
+   * @param in
+   *          input stream
+   * @return the decoded long integer.
+   * @throws IOException
+   */
+
+  public static long readVLong(DataInput in) throws IOException {
+    int firstByte = in.readByte();
+    if (firstByte &gt;= -32) {
+      return firstByte;
+    }
+
+    switch ((firstByte + 128) / 8) {
+      case 11:
+      case 10:
+      case 9:
+      case 8:
+      case 7:
+        return ((firstByte + 52) &lt;&lt; 8) | in.readUnsignedByte();
+      case 6:
+      case 5:
+      case 4:
+      case 3:
+        return ((firstByte + 88) &lt;&lt; 16) | in.readUnsignedShort();
+      case 2:
+      case 1:
+        return ((firstByte + 112) &lt;&lt; 24) | (in.readUnsignedShort() &lt;&lt; 8)
+            | in.readUnsignedByte();
+      case 0:
+        int len = firstByte + 129;
+        switch (len) {
+          case 4:
+            return in.readInt();
+          case 5:
+            return ((long) in.readInt()) &lt;&lt; 8 | in.readUnsignedByte();
+          case 6:
+            return ((long) in.readInt()) &lt;&lt; 16 | in.readUnsignedShort();
+          case 7:
+            return ((long) in.readInt()) &lt;&lt; 24 | (in.readUnsignedShort() &lt;&lt; 8)
+                | in.readUnsignedByte();
+          case 8:
+            return in.readLong();
+          default:
+            throw new IOException("Corrupted VLong encoding");
+        }
+      default:
+        throw new RuntimeException("Internal error");
+    }
+  }
+
+  /**
+   * Write a String as a VInt n, followed by n Bytes as in Text format.
+   * 
+   * @param out
+   * @param s
+   * @throws IOException
+   */
+  public static void writeString(DataOutput out, String s) throws IOException {
+    if (s != null) {
+      Text text = new Text(s);
+      byte[] buffer = text.getBytes();
+      int len = text.getLength();
+      writeVInt(out, len);
+      out.write(buffer, 0, len);
+    } else {
+      writeVInt(out, -1);
+    }
+  }
+
+  /**
+   * Read a String as a VInt n, followed by n Bytes in Text format.
+   * 
+   * @param in
+   *          The input stream.
+   * @return The string
+   * @throws IOException
+   */
+  public static String readString(DataInput in) throws IOException {
+    int length = readVInt(in);
+    if (length == -1) return null;
+    byte[] buffer = new byte[length];
+    in.readFully(buffer);
+    return Text.decode(buffer);
+  }
+
+  /**
+   * A generic Version class. We suggest applications built on top of TFile use
+   * this class to maintain version information in their meta blocks.
+   * 
+   * A version number consists of a major version and a minor version. The
+   * suggested usage of major and minor version number is to increment major
+   * version number when the new storage format is not backward compatible, and
+   * increment the minor version otherwise.
+   */
+  public static final class Version implements Comparable&lt;Version&gt; {
+    private final short major;
+    private final short minor;
+
+    /**
+     * Construct the Version object by reading from the input stream.
+     * 
+     * @param in
+     *          input stream
+     * @throws IOException
+     */
+    public Version(DataInput in) throws IOException {
+      major = in.readShort();
+      minor = in.readShort();
+    }
+
+    /**
+     * Constructor.
+     * 
+     * @param major
+     *          major version.
+     * @param minor
+     *          minor version.
+     */
+    public Version(short major, short minor) {
+      this.major = major;
+      this.minor = minor;
+    }
+
+    /**
+     * Write the object to a DataOutput. The serialized format of the Version is
+     * major version followed by minor version, both as big-endian short
+     * integers.
+     * 
+     * @param out
+     *          The DataOutput object.
+     * @throws IOException
+     */
+    public void write(DataOutput out) throws IOException {
+      out.writeShort(major);
+      out.writeShort(minor);
+    }
+
+    /**
+     * Get the major version.
+     * 
+     * @return Major version.
+     */
+    public int getMajor() {
+      return major;
+    }
+
+    /**
+     * Get the minor version.
+     * 
+     * @return The minor version.
+     */
+    public int getMinor() {
+      return minor;
+    }
+
+    /**
+     * Get the size of the serialized Version object.
+     * 
+     * @return serialized size of the version object.
+     */
+    public static int size() {
+      return (Short.SIZE + Short.SIZE) / Byte.SIZE;
+    }
+
+    /**
+     * Return a string representation of the version.
+     */
+    public String toString() {
+      return new StringBuilder("v").append(major).append(".").append(minor)
+          .toString();
+    }
+
+    /**
+     * Test compatibility.
+     * 
+     * @param other
+     *          The Version object to test compatibility with.
+     * @return true if both versions have the same major version number; false
+     *         otherwise.
+     */
+    public boolean compatibleWith(Version other) {
+      return major == other.major;
+    }
+
+    /**
+     * Compare this version with another version.
+     */
+    @Override
+    public int compareTo(Version that) {
+      if (major != that.major) {
+        return major - that.major;
+      }
+      return minor - that.minor;
+    }
+
+    @Override
+    public boolean equals(Object other) {
+      if (this == other) return true;
+      if (!(other instanceof Version)) return false;
+      return compareTo((Version) other) == 0;
+    }
+
+    @Override
+    public int hashCode() {
+      // Parenthesize the shift: in Java, + binds tighter than &lt;&lt;.
+      return (major &lt;&lt; 16) + minor;
+    }
+  }
+
+  /**
+   * Lower bound binary search. Find the index to the first element in the list
+   * that compares greater than or equal to key.
+   * 
+   * @param &lt;T&gt;
+   *          Type of the input key.
+   * @param list
+   *          The list
+   * @param key
+   *          The input key.
+   * @param cmp
+   *          Comparator for the key.
+   * @return The index to the desired element if it exists; or list.size()
+   *         otherwise.
+   */
+  public static &lt;T&gt; int lowerBound(List&lt;? extends T&gt; list, T key,
+      Comparator&lt;? super T&gt; cmp) {
+    int low = 0;
+    int high = list.size();
+
+    while (low &lt; high) {
+      int mid = (low + high) &gt;&gt;&gt; 1;
+      T midVal = list.get(mid);
+      int ret = cmp.compare(midVal, key);
+      if (ret &lt; 0)
+        low = mid + 1;
+      else high = mid;
+    }
+    return low;
+  }
+
+  /**
+   * Upper bound binary search. Find the index to the first element in the list
+   * that compares greater than the input key.
+   * 
+   * @param &lt;T&gt;
+   *          Type of the input key.
+   * @param list
+   *          The list
+   * @param key
+   *          The input key.
+   * @param cmp
+   *          Comparator for the key.
+   * @return The index to the desired element if it exists; or list.size()
+   *         otherwise.
+   */
+  public static &lt;T&gt; int upperBound(List&lt;? extends T&gt; list, T key,
+      Comparator&lt;? super T&gt; cmp) {
+    int low = 0;
+    int high = list.size();
+
+    while (low &lt; high) {
+      int mid = (low + high) &gt;&gt;&gt; 1;
+      T midVal = list.get(mid);
+      int ret = cmp.compare(midVal, key);
+      if (ret &lt;= 0)
+        low = mid + 1;
+      else high = mid;
+    }
+    return low;
+  }
+
+  /**
+   * Lower bound binary search. Find the index to the first element in the list
+   * that compares greater than or equal to key.
+   * 
+   * @param &lt;T&gt;
+   *          Type of the input key.
+   * @param list
+   *          The list
+   * @param key
+   *          The input key.
+   * @return The index to the desired element if it exists; or list.size()
+   *         otherwise.
+   */
+  public static &lt;T&gt; int lowerBound(List&lt;? extends Comparable&lt;? super T&gt;&gt; list,
+      T key) {
+    int low = 0;
+    int high = list.size();
+
+    while (low &lt; high) {
+      int mid = (low + high) &gt;&gt;&gt; 1;
+      Comparable&lt;? super T&gt; midVal = list.get(mid);
+      int ret = midVal.compareTo(key);
+      if (ret &lt; 0)
+        low = mid + 1;
+      else high = mid;
+    }
+    return low;
+  }
+
+  /**
+   * Upper bound binary search. Find the index to the first element in the list
+   * that compares greater than the input key.
+   * 
+   * @param &lt;T&gt;
+   *          Type of the input key.
+   * @param list
+   *          The list
+   * @param key
+   *          The input key.
+   * @return The index to the desired element if it exists; or list.size()
+   *         otherwise.
+   */
+  public static &lt;T&gt; int upperBound(List&lt;? extends Comparable&lt;? super T&gt;&gt; list,
+      T key) {
+    int low = 0;
+    int high = list.size();
+
+    while (low &lt; high) {
+      int mid = (low + high) &gt;&gt;&gt; 1;
+      Comparable&lt;? super T&gt; midVal = list.get(mid);
+      int ret = midVal.compareTo(key);
+      if (ret &lt;= 0)
+        low = mid + 1;
+      else high = mid;
+    }
+    return low;
+  }
+}
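
The size ranges documented for writeVLong in Utils.java can be summarised as a function from a value to its encoded length. The sketch below is a standalone illustration, not part of the commit: the class VLongSize and its methods are hypothetical, and it only restates the documented ranges rather than calling Utils.

```java
// Hypothetical helper: number of bytes the documented writeVLong ranges
// imply for a given value. Restates the Javadoc; does not call Utils.
final class VLongSize {
  private VLongSize() {}

  /** True when n lies in the half-open range [-bound, bound). */
  private static boolean within(long n, long bound) {
    return !(-bound > n || n >= bound);
  }

  static int encodedLength(long n) {
    if (n >= -32) {
      if (128 > n) {
        return 1;                                   // [-32, 128)
      }
    }
    if (within(n, 20L * 256)) return 2;             // [-20*2^8, 20*2^8)
    if (within(n, 16L * 65536)) return 3;           // [-16*2^16, 16*2^16)
    if (within(n, 8L * 16777216)) return 4;         // [-8*2^24, 8*2^24)
    if (within(n, 0x8000_0000L)) return 5;          // [-2^31, 2^31)
    if (within(n, 0x80_0000_0000L)) return 6;       // [-2^39, 2^39)
    if (within(n, 0x8000_0000_0000L)) return 7;     // [-2^47, 2^47)
    if (within(n, 0x80_0000_0000_0000L)) return 8;  // [-2^55, 2^55)
    return 9;                                       // everything else
  }

  public static void main(String[] args) {
    long[] samples = {0L, 127L, 128L, -33L, 5120L, Long.MIN_VALUE};
    for (long n : samples) {
      System.out.println(n + " takes " + encodedLength(n) + " byte(s)");
    }
  }
}
```

Small values, which dominate lengths and counts in practice, cost a single byte, while any long remains representable in at most nine.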

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KVGenerator.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KVGenerator.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KVGenerator.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KVGenerator.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.io.file.tfile;
+
+import java.util.Random;
+
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.WritableComparator;
+import org.apache.hadoop.io.file.tfile.RandomDistribution.DiscreteRNG;
+
+/**
+ * Generate random &lt;key, value&gt; pairs.
+ */
+class KVGenerator {
+  private final Random random;
+  private final byte[][] dict;
+  private final boolean sorted;
+  private final DiscreteRNG keyLenRNG, valLenRNG;
+  private BytesWritable lastKey;
+  private static final int MIN_KEY_LEN = 4;
+  private final byte prefix[] = new byte[MIN_KEY_LEN];
+
+  public KVGenerator(Random random, boolean sorted, DiscreteRNG keyLenRNG,
+      DiscreteRNG valLenRNG, DiscreteRNG wordLenRNG, int dictSize) {
+    this.random = random;
+    dict = new byte[dictSize][];
+    this.sorted = sorted;
+    this.keyLenRNG = keyLenRNG;
+    this.valLenRNG = valLenRNG;
+    for (int i = 0; i &lt; dictSize; ++i) {
+      int wordLen = wordLenRNG.nextInt();
+      dict[i] = new byte[wordLen];
+      random.nextBytes(dict[i]);
+    }
+    lastKey = new BytesWritable();
+    fillKey(lastKey);
+  }
+  
+  private void fillKey(BytesWritable o) {
+    int len = keyLenRNG.nextInt();
+    if (len &lt; MIN_KEY_LEN) len = MIN_KEY_LEN;
+    o.setSize(len);
+    int n = MIN_KEY_LEN;
+    while (n &lt; len) {
+      byte[] word = dict[random.nextInt(dict.length)];
+      int l = Math.min(word.length, len - n);
+      System.arraycopy(word, 0, o.get(), n, l);
+      n += l;
+    }
+    if (sorted
+        &amp;&amp; WritableComparator.compareBytes(lastKey.get(), MIN_KEY_LEN, lastKey
+            .getSize()
+            - MIN_KEY_LEN, o.get(), MIN_KEY_LEN, o.getSize() - MIN_KEY_LEN) &gt; 0) {
+      incrementPrefix();
+    }
+
+    System.arraycopy(prefix, 0, o.get(), 0, MIN_KEY_LEN);
+    lastKey.set(o);
+  }
+
+  private void fillValue(BytesWritable o) {
+    int len = valLenRNG.nextInt();
+    o.setSize(len);
+    int n = 0;
+    while (n &lt; len) {
+      byte[] word = dict[random.nextInt(dict.length)];
+      int l = Math.min(word.length, len - n);
+      System.arraycopy(word, 0, o.get(), n, l);
+      n += l;
+    }
+  }
+  
+  private void incrementPrefix() {
+    for (int i = MIN_KEY_LEN - 1; i &gt;= 0; --i) {
+      ++prefix[i];
+      if (prefix[i] != 0) return;
+    }
+    
+    throw new RuntimeException("Prefix overflow");
+  }
+  
+  public void next(BytesWritable key, BytesWritable value, boolean dupKey) {
+    if (dupKey) {
+      key.set(lastKey);
+    }
+    else {
+      fillKey(key);
+    }
+    fillValue(value);
+  }
+}

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KeySampler.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KeySampler.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KeySampler.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KeySampler.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,56 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+import java.util.Random;
+
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.file.tfile.RandomDistribution.DiscreteRNG;
+
+class KeySampler {
+  Random random;
+  int min, max;
+  DiscreteRNG keyLenRNG;
+  private static final int MIN_KEY_LEN = 4;
+
+  public KeySampler(Random random, RawComparable first, RawComparable last,
+      DiscreteRNG keyLenRNG) throws IOException {
+    this.random = random;
+    min = keyPrefixToInt(first);
+    max = keyPrefixToInt(last);
+    this.keyLenRNG = keyLenRNG;
+  }
+
+  private int keyPrefixToInt(RawComparable key) throws IOException {
+    byte[] b = key.buffer();
+    int o = key.offset();
+    return (b[o] &amp; 0xff) &lt;&lt; 24 | (b[o + 1] &amp; 0xff) &lt;&lt; 16
+        | (b[o + 2] &amp; 0xff) &lt;&lt; 8 | (b[o + 3] &amp; 0xff);
+  }
+  
+  public void next(BytesWritable key) {
+    key.setSize(Math.max(MIN_KEY_LEN, keyLenRNG.nextInt()));
+    random.nextBytes(key.get());
+    int n = random.nextInt(max - min) + min;
+    byte[] b = key.get();
+    b[0] = (byte) (n &gt;&gt; 24);
+    b[1] = (byte) (n &gt;&gt; 16);
+    b[2] = (byte) (n &gt;&gt; 8);
+    b[3] = (byte) n;
+  }
+}
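
Illustration only, not part of this commit: the prefix handling in KeySampler (keyPrefixToInt and the four byte stores in next) amounts to reading and writing a big-endian 32-bit integer. A minimal sketch of the same round trip, assuming java.nio.ByteBuffer with its default big-endian byte order, might look like the following. The class name PrefixCodecSketch and its method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of the Hadoop sources.
final class PrefixCodecSketch {
  // Interpret the four bytes starting at offset as a big-endian int,
  // which is what keyPrefixToInt computes with shifts and masks.
  static int prefixToInt(byte[] b, int offset) {
    return ByteBuffer.wrap(b, offset, 4).getInt();
  }

  // Store n into the four bytes starting at offset, most significant
  // byte first, mirroring the byte stores at the end of next().
  static void intToPrefix(int n, byte[] b, int offset) {
    ByteBuffer.wrap(b, offset, 4).putInt(n);
  }

  public static void main(String[] args) {
    byte[] key = new byte[8];
    intToPrefix(0x12345678, key, 0);
    System.out.println(Integer.toHexString(prefixToInt(key, 0)));
  }
}
```

Because the prefix is stored most significant byte first, keys whose prefixes are non-negative integers keep their numeric order under bytewise comparison.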

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/NanoTimer.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/NanoTimer.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/NanoTimer.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/NanoTimer.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,193 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.io.file.tfile;
+
+/**
+ * A nano-second timer.
+ */
+public class NanoTimer {
+  private long last = -1;
+  private boolean started = false;
+  private long cumulate = 0;
+
+  /**
+   * Constructor
+   * 
+   * @param start
+   *          Start the timer upon construction.
+   */
+  public NanoTimer(boolean start) {
+    if (start) this.start();
+  }
+
+  /**
+   * Start the timer.
+   * 
+   * Note: No effect if timer is already started.
+   */
+  public void start() {
+    if (!this.started) {
+      this.last = System.nanoTime();
+      this.started = true;
+    }
+  }
+
+  /**
+   * Stop the timer.
+   * 
+   * Note: No effect if timer is already stopped.
+   */
+  public void stop() {
+    if (this.started) {
+      this.started = false;
+      this.cumulate += System.nanoTime() - this.last;
+    }
+  }
+
+  /**
+   * Read the timer.
+   * 
+   * @return the time accumulated up to the last stop(), in nanoseconds. Note:
+   *         If the timer has never been started, -1 is returned.
+   */
+  public long read() {
+    if (!readable()) return -1;
+
+    return this.cumulate;
+  }
+
+  /**
+   * Reset the timer.
+   */
+  public void reset() {
+    this.last = -1;
+    this.started = false;
+    this.cumulate = 0;
+  }
+
+  /**
+   * Check whether the timer is started.
+   * 
+   * @return true if timer is started.
+   */
+  public boolean isStarted() {
+    return this.started;
+  }
+
+  /**
+   * Format the elapsed time to a human understandable string.
+   * 
+   * Note: If timer is never started, "ERR" will be returned.
+   */
+  public String toString() {
+    if (!readable()) {
+      return "ERR";
+    }
+
+    return NanoTimer.nanoTimeToString(this.cumulate);
+  }
+
+  /**
+   * A utility method to format a time duration in nanoseconds into a
+   * human-readable string.
+   * 
+   * @param t
+   *          Time duration in nano seconds.
+   * @return String representation.
+   */
+  public static String nanoTimeToString(long t) {
+    if (t &lt; 0) return "ERR";
+
+    if (t == 0) return "0";
+
+    if (t &lt; 1000) {
+      return t + "ns";
+    }
+
+    double us = (double) t / 1000;
+    if (us &lt; 1000) {
+      return String.format("%.2fus", us);
+    }
+
+    double ms = us / 1000;
+    if (ms &lt; 1000) {
+      return String.format("%.2fms", ms);
+    }
+
+    double ss = ms / 1000;
+    if (ss &lt; 1000) {
+      return String.format("%.2fs", ss);
+    }
+
+    long mm = (long) ss / 60;
+    ss -= mm * 60;
+    long hh = mm / 60;
+    mm -= hh * 60;
+    long dd = hh / 24;
+    hh -= dd * 24;
+
+    if (dd &gt; 0) {
+      return String.format("%dd%dh", dd, hh);
+    }
+
+    if (hh &gt; 0) {
+      return String.format("%dh%dm", hh, mm);
+    }
+
+    if (mm &gt; 0) {
+      return String.format("%dm%.1fs", mm, ss);
+    }
+
+    return String.format("%.2fs", ss);
+
+    /**
+     * StringBuilder sb = new StringBuilder(); String sep = "";
+     * 
+     * if (dd &gt; 0) { String unit = (dd &gt; 1) ? "days" : "day";
+     * sb.append(String.format("%s%d%s", sep, dd, unit)); sep = " "; }
+     * 
+     * if (hh &gt; 0) { String unit = (hh &gt; 1) ? "hrs" : "hr";
+     * sb.append(String.format("%s%d%s", sep, hh, unit)); sep = " "; }
+     * 
+     * if (mm &gt; 0) { String unit = (mm &gt; 1) ? "mins" : "min";
+     * sb.append(String.format("%s%d%s", sep, mm, unit)); sep = " "; }
+     * 
+     * if (ss &gt; 0) { String unit = (ss &gt; 1) ? "secs" : "sec";
+     * sb.append(String.format("%s%.3f%s", sep, ss, unit)); sep = " "; }
+     * 
+     * return sb.toString();
+     */
+  }
+
+  private boolean readable() {
+    return this.last != -1;
+  }
+
+  /**
+   * Simple tester.
+   * 
+   * @param args
+   */
+  public static void main(String[] args) {
+    long i = 7;
+
+    for (int x = 0; x &lt; 20; ++x, i *= 7) {
+      System.out.println(NanoTimer.nanoTimeToString(i));
+    }
+  }
+}
+

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/RandomDistribution.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/RandomDistribution.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/RandomDistribution.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/RandomDistribution.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,266 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.io.file.tfile;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Random;
+
+/**
+ * A class that generates random numbers that follow some distribution.
+ */
+public class RandomDistribution {
+  /**
+   * Interface for discrete (integer) random distributions.
+   */
+  public static interface DiscreteRNG {
+    /**
+     * Get the next random number
+     * 
+     * @return the next random number.
+     */
+    public int nextInt();
+  }
+
+  /**
+   * P(i)=1/(max-min)
+   */
+  public static final class Flat implements DiscreteRNG {
+    private final Random random;
+    private final int min;
+    private final int max;
+
+    /**
+     * Generate random integers from min (inclusive) to max (exclusive)
+     * following a uniform distribution.
+     * 
+     * @param random
+     *          The basic random number generator.
+     * @param min
+     *          Minimum integer
+     * @param max
+     *          maximum integer (exclusive).
+     * 
+     */
+    public Flat(Random random, int min, int max) {
+      if (min &gt;= max) {
+        throw new IllegalArgumentException("Invalid range");
+      }
+      this.random = random;
+      this.min = min;
+      this.max = max;
+    }
+    
+    /**
+     * @see DiscreteRNG#nextInt()
+     */
+    @Override
+    public int nextInt() {
+      return random.nextInt(max - min) + min;
+    }
+  }
+
+  /**
+   * Zipf distribution. The ratio of the probabilities of integer i and j is
+   * defined as follows:
+   * 
+   * P(i)/P(j)=((j-min+1)/(i-min+1))^sigma.
+   */
+  public static final class Zipf implements DiscreteRNG {
+    private static final double DEFAULT_EPSILON = 0.001;
+    private final Random random;
+    private final ArrayList&lt;Integer&gt; k;
+    private final ArrayList&lt;Double&gt; v;
+
+    /**
+     * Constructor
+     * 
+     * @param r
+     *          The random number generator.
+     * @param min
+     *          minimum integer (inclusive)
+     * @param max
+     *          maximum integer (exclusive)
+     * @param sigma
+     *          parameter sigma. (sigma &gt; 1.0)
+     */
+    public Zipf(Random r, int min, int max, double sigma) {
+      this(r, min, max, sigma, DEFAULT_EPSILON);
+    }
+
+    /**
+     * Constructor.
+     * 
+     * @param r
+     *          The random number generator.
+     * @param min
+     *          minimum integer (inclusive)
+     * @param max
+     *          maximum integer (exclusive)
+     * @param sigma
+     *          parameter sigma. (sigma &gt; 1.0)
+     * @param epsilon
+     *          Allowable error percentage (0 &lt; epsilon &lt; 0.5).
+     */
+    public Zipf(Random r, int min, int max, double sigma, double epsilon) {
+      if ((max &lt;= min) || (sigma &lt;= 1) || (epsilon &lt;= 0)
+          || (epsilon &gt;= 0.5)) {
+        throw new IllegalArgumentException("Invalid arguments");
+      }
+      random = r;
+      k = new ArrayList&lt;Integer&gt;();
+      v = new ArrayList&lt;Double&gt;();
+
+      double sum = 0;
+      int last = -1;
+      for (int i = min; i &lt; max; ++i) {
+        sum += Math.exp(-sigma * Math.log(i - min + 1));
+        if ((last == -1) || i * (1 - epsilon) &gt; last) {
+          k.add(i);
+          v.add(sum);
+          last = i;
+        }
+      }
+
+      if (last != max - 1) {
+        k.add(max - 1);
+        v.add(sum);
+      }
+
+      v.set(v.size() - 1, 1.0);
+
+      for (int i = v.size() - 2; i &gt;= 0; --i) {
+        v.set(i, v.get(i) / sum);
+      }
+    }
+
+    /**
+     * @see DiscreteRNG#nextInt()
+     */
+    @Override
+    public int nextInt() {
+      double d = random.nextDouble();
+      int idx = Collections.binarySearch(v, d);
+
+      if (idx &gt; 0) {
+        ++idx;
+      }
+      else {
+        idx = -(idx + 1);
+      }
+
+      if (idx &gt;= v.size()) {
+        idx = v.size() - 1;
+      }
+
+      if (idx == 0) {
+        return k.get(0);
+      }
+
+      int ceiling = k.get(idx);
+      int lower = k.get(idx - 1);
+
+      return ceiling - random.nextInt(ceiling - lower);
+    }
+  }
+
+  /**
+   * Binomial distribution.
+   * 
+   * P(k)=select(n, k)*p^k*(1-p)^(n-k) (k = 0, 1, ..., n)
+   * 
+   * P(k)=select(max-min-1, k-min)*p^(k-min)*(1-p)^(max-k-1)
+   */
+  public static final class Binomial implements DiscreteRNG {
+    private final Random random;
+    private final int min;
+    private final int n;
+    private final double[] v;
+
+    private static double select(int n, int k) {
+      double ret = 1.0;
+      for (int i = k + 1; i &lt;= n; ++i) {
+        ret *= (double) i / (i - k);
+      }
+      return ret;
+    }
+    
+    private static double power(double p, int k) {
+      return Math.exp(k * Math.log(p));
+    }
+
+    /**
+     * Generate random integers from min (inclusive) to max (exclusive)
+     * following Binomial distribution.
+     * 
+     * @param random
+     *          The basic random number generator.
+     * @param min
+     *          Minimum integer
+     * @param max
+     *          maximum integer (exclusive).
+     * @param p
+     *          parameter.
+     * 
+     */
+    public Binomial(Random random, int min, int max, double p) {
+      if (min &gt;= max) {
+        throw new IllegalArgumentException("Invalid range");
+      }
+      this.random = random;
+      this.min = min;
+      this.n = max - min - 1;
+      if (n &gt; 0) {
+        v = new double[n + 1];
+        double sum = 0.0;
+        for (int i = 0; i &lt;= n; ++i) {
+          sum += select(n, i) * power(p, i) * power(1 - p, n - i);
+          v[i] = sum;
+        }
+        for (int i = 0; i &lt;= n; ++i) {
+          v[i] /= sum;
+        }
+      }
+      else {
+        v = null;
+      }
+    }
+
+    /**
+     * @see DiscreteRNG#nextInt()
+     */
+    @Override
+    public int nextInt() {
+      if (v == null) {
+        return min;
+      }
+      double d = random.nextDouble();
+      int idx = Arrays.binarySearch(v, d);
+      if (idx &gt; 0) {
+        ++idx;
+      } else {
+        idx = -(idx + 1);
+      }
+
+      if (idx &gt;= v.length) {
+        idx = v.length - 1;
+      }
+      return idx + min;
+    }
+  }
+}
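
Illustration only, not part of this commit: both Zipf and Binomial above draw a value by building a normalized cumulative table and binary-searching it with a uniform random double. A minimal sketch of that inverse-CDF lookup follows. The class name CdfSamplerSketch and its methods are hypothetical, the weights are assumed to be strictly positive, and, unlike the code above, an exact hit at index 0 is also moved up by one.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the Hadoop sources.
final class CdfSamplerSketch {
  // Turn strictly positive weights into a normalized cumulative table
  // whose last entry is 1.0.
  static double[] cumulative(double[] weights) {
    double[] cdf = new double[weights.length];
    double sum = 0.0;
    int i = 0;
    for (double w : weights) {
      sum += w;
      cdf[i++] = sum;
    }
    for (int j = 0; j != cdf.length; ++j) {
      cdf[j] /= sum;
    }
    return cdf;
  }

  // Map d to the first index whose cumulative value exceeds d,
  // clamped to the last index.
  static int indexFor(double[] cdf, double d) {
    int pos = Arrays.binarySearch(cdf, d);
    // A miss is reported as -(insertionPoint) - 1; an exact hit moves one up.
    int idx = (Integer.signum(pos) == -1) ? -(pos + 1) : pos + 1;
    return Math.min(idx, cdf.length - 1);
  }
}
```

Feeding indexFor with random.nextDouble() selects index i with probability proportional to weights[i]; adding min to the result would shift it into the range used by the classes above.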

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFile.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFile.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFile.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFile.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,431 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.Arrays;
+
+import junit.framework.TestCase;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.file.tfile.TFile.Reader;
+import org.apache.hadoop.io.file.tfile.TFile.Writer;
+import org.apache.hadoop.io.file.tfile.TFile.Reader.Scanner;
+
+/**
+ * test tfile features.
+ * 
+ */
+public class TestTFile extends TestCase {
+  private static String ROOT =
+      System.getProperty("test.build.data", "/tmp/tfile-test");
+  private FileSystem fs;
+  private Configuration conf;
+  private final int minBlockSize = 512;
+  private final int largeVal = 3 * 1024 * 1024;
+  private static String localFormatter = "%010d";
+
+  @Override
+  public void setUp() throws IOException {
+    conf = new Configuration();
+    fs = FileSystem.get(conf);
+  }
+
+  @Override
+  public void tearDown() throws IOException {
+    // do nothing
+  }
+
+  // read a key from the scanner
+  public byte[] readKey(Scanner scanner) throws IOException {
+    int keylen = scanner.entry().getKeyLength();
+    byte[] read = new byte[keylen];
+    scanner.entry().getKey(read);
+    return read;
+  }
+
+  // read a value from the scanner
+  public byte[] readValue(Scanner scanner) throws IOException {
+    int valueLen = scanner.entry().getValueLength();
+    byte[] read = new byte[valueLen];
+    scanner.entry().getValue(read);
+    return read;
+  }
+
+  // read a long value from the scanner
+  public byte[] readLongValue(Scanner scanner, int len) throws IOException {
+    DataInputStream din = scanner.entry().getValueStream();
+    byte[] b = new byte[len];
+    din.readFully(b);
+    din.close();
+    return b;
+  }
+
+  // write some records into the tfile
+  // write them twice
+  private int writeSomeRecords(Writer writer, int start, int n)
+      throws IOException {
+    String value = "value";
+    for (int i = start; i &lt; (start + n); i++) {
+      String key = String.format(localFormatter, i);
+      writer.append(key.getBytes(), (value + key).getBytes());
+      writer.append(key.getBytes(), (value + key).getBytes());
+    }
+    return (start + n);
+  }
+
+  // read the records and check
+  private int readAndCheckbytes(Scanner scanner, int start, int n)
+      throws IOException {
+    String value = "value";
+    for (int i = start; i &lt; (start + n); i++) {
+      byte[] key = readKey(scanner);
+      byte[] val = readValue(scanner);
+      String keyStr = String.format(localFormatter, i);
+      String valStr = value + keyStr;
+      assertTrue("bytes for keys do not match " + keyStr + " "
+          + new String(key), Arrays.equals(keyStr.getBytes(), key));
+      assertTrue("bytes for vals do not match " + valStr + " "
+          + new String(val), Arrays.equals(
+          valStr.getBytes(), val));
+      assertTrue(scanner.advance());
+      key = readKey(scanner);
+      val = readValue(scanner);
+      assertTrue("bytes for keys do not match", Arrays.equals(
+          keyStr.getBytes(), key));
+      assertTrue("bytes for vals do not match", Arrays.equals(
+          valStr.getBytes(), val));
+      assertTrue(scanner.advance());
+    }
+    return (start + n);
+  }
+
+  // write some large records
+  // write them twice
+  private int writeLargeRecords(Writer writer, int start, int n)
+      throws IOException {
+    byte[] value = new byte[largeVal];
+    for (int i = start; i &lt; (start + n); i++) {
+      String key = String.format(localFormatter, i);
+      writer.append(key.getBytes(), value);
+      writer.append(key.getBytes(), value);
+    }
+    return (start + n);
+  }
+
+  // read large records
+  // read them twice since they are duplicated
+  private int readLargeRecords(Scanner scanner, int start, int n)
+      throws IOException {
+    for (int i = start; i &lt; (start + n); i++) {
+      byte[] key = readKey(scanner);
+      String keyStr = String.format(localFormatter, i);
+      assertTrue("bytes for keys do not match", Arrays.equals(
+          keyStr.getBytes(), key));
+      scanner.advance();
+      key = readKey(scanner);
+      assertTrue("bytes for keys do not match", Arrays.equals(
+          keyStr.getBytes(), key));
+      scanner.advance();
+    }
+    return (start + n);
+  }
+
+  // write empty keys and values
+  private void writeEmptyRecords(Writer writer, int n) throws IOException {
+    byte[] key = new byte[0];
+    byte[] value = new byte[0];
+    for (int i = 0; i &lt; n; i++) {
+      writer.append(key, value);
+    }
+  }
+
+  // read empty keys and values
+  private void readEmptyRecords(Scanner scanner, int n) throws IOException {
+    byte[] key = new byte[0];
+    byte[] value = new byte[0];
+    byte[] readKey = null;
+    byte[] readValue = null;
+    for (int i = 0; i &lt; n; i++) {
+      readKey = readKey(scanner);
+      readValue = readValue(scanner);
+      assertTrue("failed to match keys", Arrays.equals(readKey, key));
+      assertTrue("failed to match values", Arrays.equals(readValue, value));
+      assertTrue("failed to advance cursor", scanner.advance());
+    }
+  }
+
+  private int writePrepWithKnownLength(Writer writer, int start, int n)
+      throws IOException {
+    // get the length of the key
+    String key = String.format(localFormatter, start);
+    int keyLen = key.getBytes().length;
+    String value = "value" + key;
+    int valueLen = value.getBytes().length;
+    for (int i = start; i &lt; (start + n); i++) {
+      DataOutputStream out = writer.prepareAppendKey(keyLen);
+      String localKey = String.format(localFormatter, i);
+      out.write(localKey.getBytes());
+      out.close();
+      out = writer.prepareAppendValue(valueLen);
+      String localValue = "value" + localKey;
+      out.write(localValue.getBytes());
+      out.close();
+    }
+    return (start + n);
+  }
+
+  private int readPrepWithKnownLength(Scanner scanner, int start, int n)
+      throws IOException {
+    for (int i = start; i &lt; (start + n); i++) {
+      String key = String.format(localFormatter, i);
+      byte[] read = readKey(scanner);
+      assertTrue("keys not equal", Arrays.equals(key.getBytes(), read));
+      String value = "value" + key;
+      read = readValue(scanner);
+      assertTrue("values not equal", Arrays.equals(value.getBytes(), read));
+      scanner.advance();
+    }
+    return (start + n);
+  }
+
+  private int writePrepWithUnkownLength(Writer writer, int start, int n)
+      throws IOException {
+    for (int i = start; i &lt; (start + n); i++) {
+      DataOutputStream out = writer.prepareAppendKey(-1);
+      String localKey = String.format(localFormatter, i);
+      out.write(localKey.getBytes());
+      out.close();
+      String value = "value" + localKey;
+      out = writer.prepareAppendValue(-1);
+      out.write(value.getBytes());
+      out.close();
+    }
+    return (start + n);
+  }
+
+  private int readPrepWithUnknownLength(Scanner scanner, int start, int n)
+      throws IOException {
+    for (int i = start; i &lt; start; i++) {
+      String key = String.format(localFormatter, i);
+      byte[] read = readKey(scanner);
+      assertTrue("keys not equal", Arrays.equals(key.getBytes(), read));
+      try {
+        read = readValue(scanner);
+        assertTrue(false);
+      }
+      catch (IOException ie) {
+        // should have thrown exception
+      }
+      String value = "value" + key;
+      read = readLongValue(scanner, value.getBytes().length);
+      assertTrue("values not equal", Arrays.equals(read, value.getBytes()));
+      scanner.advance();
+    }
+    return (start + n);
+  }
+
+  private byte[] getSomeKey(int rowId) {
+    return String.format(localFormatter, rowId).getBytes();
+  }
+
+  private void writeRecords(Writer writer) throws IOException {
+    writeEmptyRecords(writer, 10);
+    int ret = writeSomeRecords(writer, 0, 100);
+    ret = writeLargeRecords(writer, ret, 1);
+    ret = writePrepWithKnownLength(writer, ret, 40);
+    ret = writePrepWithUnkownLength(writer, ret, 50);
+    writer.close();
+  }
+
+  private void readAllRecords(Scanner scanner) throws IOException {
+    readEmptyRecords(scanner, 10);
+    int ret = readAndCheckbytes(scanner, 0, 100);
+    ret = readLargeRecords(scanner, ret, 1);
+    ret = readPrepWithKnownLength(scanner, ret, 40);
+    ret = readPrepWithUnknownLength(scanner, ret, 50);
+  }
+
+  private FSDataOutputStream createFSOutput(Path name) throws IOException {
+    if (fs.exists(name)) fs.delete(name, true);
+    FSDataOutputStream fout = fs.create(name);
+    return fout;
+  }
+
+  /**
+   * Basic read/write test using the given compression codec.
+   */
+  void basicWithSomeCodec(String codec) throws IOException {
+    Path ncTFile = new Path(ROOT, "basic.tfile");
+    FSDataOutputStream fout = createFSOutput(ncTFile);
+    Writer writer = new Writer(fout, minBlockSize, codec, "memcmp", conf);
+    writeRecords(writer);
+    fout.close();
+    FSDataInputStream fin = fs.open(ncTFile);
+    Reader reader =
+        new Reader(fs.open(ncTFile), fs.getFileStatus(ncTFile).getLen(), conf);
+
+    Scanner scanner = reader.createScanner();
+    readAllRecords(scanner);
+    scanner.seekTo(getSomeKey(50));
+    assertTrue("location lookup failed", scanner.seekTo(getSomeKey(50)));
+    // read the key and see if it matches
+    byte[] readKey = readKey(scanner);
+    assertTrue("seeked key does not match", Arrays.equals(getSomeKey(50),
+        readKey));
+
+    scanner.seekTo(new byte[0]);
+    byte[] val1 = readValue(scanner);
+    scanner.seekTo(new byte[0]);
+    byte[] val2 = readValue(scanner);
+    assertTrue(Arrays.equals(val1, val2));
+    
+    // check for lowerBound
+    scanner.lowerBound(getSomeKey(50));
+    assertTrue("location lookup failed", scanner.currentLocation
+        .compareTo(reader.end()) &lt; 0);
+    readKey = readKey(scanner);
+    assertTrue("seeked key does not match", Arrays.equals(readKey,
+        getSomeKey(50)));
+
+    // check for upper bound
+    scanner.upperBound(getSomeKey(50));
+    assertTrue("location lookup failed", scanner.currentLocation
+        .compareTo(reader.end()) &lt; 0);
+    readKey = readKey(scanner);
+    assertTrue("seeked key does not match", Arrays.equals(readKey,
+        getSomeKey(51)));
+
+    scanner.close();
+    // test a scanner restricted to a key range
+    scanner = reader.createScanner(getSomeKey(10), getSomeKey(60));
+    readAndCheckbytes(scanner, 10, 50);
+    assertFalse(scanner.advance());
+    scanner.close();
+    reader.close();
+    fin.close();
+    fs.delete(ncTFile, true);
+  }
+
+  // unsorted with some codec
+  void unsortedWithSomeCodec(String codec) throws IOException {
+    Path uTfile = new Path(ROOT, "unsorted.tfile");
+    FSDataOutputStream fout = createFSOutput(uTfile);
+    Writer writer = new Writer(fout, minBlockSize, codec, null, conf);
+    writeRecords(writer);
+    writer.close();
+    fout.close();
+    FSDataInputStream fin = fs.open(uTfile);
+    Reader reader =
+        new Reader(fs.open(uTfile), fs.getFileStatus(uTfile).getLen(), conf);
+
+    Scanner scanner = reader.createScanner();
+    readAllRecords(scanner);
+    scanner.close();
+    reader.close();
+    fin.close();
+    fs.delete(uTfile, true);
+  }
+
+  public void testTFileFeatures() throws IOException {
+    basicWithSomeCodec("none");
+    basicWithSomeCodec("gz");
+  }
+
+  // test unsorted t files.
+  public void testUnsortedTFileFeatures() throws IOException {
+    unsortedWithSomeCodec("none");
+    unsortedWithSomeCodec("gz");
+  }
+
+  private void writeNumMetablocks(Writer writer, String compression, int n)
+      throws IOException {
+    for (int i = 0; i &lt; n; i++) {
+      DataOutputStream dout =
+          writer.prepareMetaBlock("TfileMeta" + i, compression);
+      byte[] b = ("something to test" + i).getBytes();
+      dout.write(b);
+      dout.close();
+    }
+  }
+
+  private void someTestingWithMetaBlock(Writer writer, String compression)
+      throws IOException {
+    DataOutputStream dout = null;
+    writeNumMetablocks(writer, compression, 10);
+    try {
+      dout = writer.prepareMetaBlock("TfileMeta1", compression);
+      assertTrue(false);
+    }
+    catch (MetaBlockAlreadyExists me) {
+      // expected: a meta block with this name already exists
+    }
+    dout = writer.prepareMetaBlock("TFileMeta100", compression);
+    dout.close();
+  }
+
+  private void readNumMetablocks(Reader reader, int n) throws IOException {
+    int len = ("something to test" + 0).getBytes().length;
+    for (int i = 0; i &lt; n; i++) {
+      DataInputStream din = reader.getMetaBlock("TfileMeta" + i);
+      byte b[] = new byte[len];
+      din.readFully(b);
+      assertTrue("failed to match metadata", Arrays.equals(
+          ("something to test" + i).getBytes(), b));
+      din.close();
+    }
+  }
+
+  private void someReadingWithMetaBlock(Reader reader) throws IOException {
+    DataInputStream din = null;
+    readNumMetablocks(reader, 10);
+    try {
+      din = reader.getMetaBlock("NO ONE");
+      assertTrue(false);
+    }
+    catch (MetaBlockDoesNotExist me) {
+      // should catch
+    }
+    din = reader.getMetaBlock("TFileMeta100");
+    int read = din.read();
+    assertTrue("check for status", (read == -1));
+    din.close();
+  }
+
+  // test meta blocks for tfiles
+  public void testMetaBlocks() throws IOException {
+    Path mFile = new Path(ROOT, "meta.tfile");
+    FSDataOutputStream fout = createFSOutput(mFile);
+    Writer writer = new Writer(fout, minBlockSize, "none", null, conf);
+    someTestingWithMetaBlock(writer, "none");
+    writer.close();
+    fout.close();
+    FSDataInputStream fin = fs.open(mFile);
+    Reader reader = new Reader(fin, fs.getFileStatus(mFile).getLen(), conf);
+    someReadingWithMetaBlock(reader);
+    fs.delete(mFile, true);
+    reader.close();
+    fin.close();
+  }
+}
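
Illustration only, not part of this commit: the sorted-file tests above build keys with the "%010d" formatter and open the writer with the "memcmp" comparator. Zero-padding to a fixed width makes the lexicographic order of the formatted keys agree with the numeric order of non-negative row ids, which is presumably why the format was chosen. A minimal sketch follows. The class name PaddedKeySketch is hypothetical, Locale.ROOT is an addition of this sketch to pin the digits to ASCII, and String.compareTo stands in for a bytewise comparison, which gives the same result for ASCII digits.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Hadoop sources.
final class PaddedKeySketch {
  // Format a row id as a ten-digit, zero-padded decimal string.
  static String key(int rowId) {
    return String.format(Locale.ROOT, "%010d", rowId);
  }

  // Sign (-1, 0 or 1) of the lexicographic comparison of two formatted keys.
  static int compareKeys(int a, int b) {
    return Integer.signum(key(a).compareTo(key(b)));
  }

  public static void main(String[] args) {
    System.out.println(key(9) + " vs " + key(10) + ": " + compareKeys(9, 10));
  }
}
```

Without the padding, the strings "9" and "10" would compare in the opposite order, because the character 9 sorts after the character 1.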

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileByteArrays.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileByteArrays.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileByteArrays.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileByteArrays.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,790 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.EOFException;
+import java.io.IOException;
+import java.util.Random;
+
+import junit.framework.Assert;
+import junit.framework.TestCase;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.WritableUtils;
+import org.apache.hadoop.io.file.tfile.TFile.Reader;
+import org.apache.hadoop.io.file.tfile.TFile.Writer;
+import org.apache.hadoop.io.file.tfile.TFile.Reader.Location;
+import org.apache.hadoop.io.file.tfile.TFile.Reader.Scanner;
+
+/**
+ * 
+ * Byte arrays test case class using GZ compression codec, base class of none
+ * and LZO compression classes.
+ * 
+ */
+public class TestTFileByteArrays extends TestCase {
+  private static String ROOT =
+      System.getProperty("test.build.data", "/tmp/tfile-test");
+  private final static int BLOCK_SIZE = 512;
+  private final static int BUF_SIZE = 64;
+  private final static int K = 1024;
+  protected boolean skip = false;
+
+  private static final String KEY = "key";
+  private static final String VALUE = "value";
+
+  private FileSystem fs;
+  private Configuration conf;
+  private Path path;
+  private FSDataOutputStream out;
+  private Writer writer;
+
+  private String compression = Compression.Algorithm.GZ.getName();
+  private String comparator = "memcmp";
+  private String outputFile = "TFileTestByteArrays";
+  /*
+   * pre-sampled numbers of records in one block, based on the generated key
+   * and value strings
+   */
+  // private int records1stBlock = 4314;
+  // private int records2ndBlock = 4108;
+  private int records1stBlock = 4480;
+  private int records2ndBlock = 4263;
+
+  public void init(String compression, String comparator, String outputFile,
+      int numRecords1stBlock, int numRecords2ndBlock) {
+    this.compression = compression;
+    this.comparator = comparator;
+    this.outputFile = outputFile;
+    this.records1stBlock = numRecords1stBlock;
+    this.records2ndBlock = numRecords2ndBlock;
+  }
+
+  @Override
+  public void setUp() throws IOException {
+    conf = new Configuration();
+    path = new Path(ROOT, outputFile);
+    fs = path.getFileSystem(conf);
+    out = fs.create(path);
+    writer = new Writer(out, BLOCK_SIZE, compression, comparator, conf);
+  }
+
+  @Override
+  public void tearDown() throws IOException {
+    if (!skip)
+      fs.delete(path, true);
+  }
+
+  public void testNoDataEntry() throws IOException {
+    if (skip) 
+      return;
+    closeOutput();
+
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Assert.assertTrue(reader.isSorted());
+    Scanner scanner = reader.createScanner();
+    Assert.assertTrue(scanner.atEnd());
+    scanner.close();
+    reader.close();
+  }
+
+  public void testOneDataEntry() throws IOException {
+    if (skip)
+      return;
+    writeRecords(1);
+    readRecords(1);
+
+    checkBlockIndex(1, 0, 0);
+    readValueBeforeKey(1, 0);
+    readKeyWithoutValue(1, 0);
+    readValueWithoutKey(1, 0);
+    readKeyManyTimes(1, 0);
+  }
+
+  public void testTwoDataEntries() throws IOException {
+    if (skip)
+      return;
+    writeRecords(2);
+    readRecords(2);
+  }
+
+  /**
+   * Fill up exactly one block.
+   * 
+   * @throws IOException
+   */
+  public void testOneBlock() throws IOException {
+    if (skip)
+      return;
+    // just under one block
+    writeRecords(records1stBlock);
+    readRecords(records1stBlock);
+    // last key should be in the first block (block 0)
+    checkBlockIndex(records1stBlock, records1stBlock - 1, 0);
+  }
+
+  /**
+   * One block plus one record.
+   * 
+   * @throws IOException
+   */
+  public void testOneBlockPlusOneEntry() throws IOException {
+    if (skip)
+      return;
+    writeRecords(records1stBlock + 1);
+    readRecords(records1stBlock + 1);
+    checkBlockIndex(records1stBlock + 1, records1stBlock - 1, 0);
+    checkBlockIndex(records1stBlock + 1, records1stBlock, 1);
+  }
+
+  public void testTwoBlocks() throws IOException {
+    if (skip)
+      return;
+    writeRecords(records1stBlock + 5);
+    readRecords(records1stBlock + 5);
+    checkBlockIndex(records1stBlock + 5, records1stBlock + 4, 1);
+  }
+
+  public void testThreeBlocks() throws IOException {
+    if (skip) 
+      return;
+    writeRecords(2 * records1stBlock + 5);
+    readRecords(2 * records1stBlock + 5);
+
+    checkBlockIndex(2 * records1stBlock + 5, 2 * records1stBlock + 4, 2);
+    // 1st key in file
+    readValueBeforeKey(2 * records1stBlock + 5, 0);
+    readKeyWithoutValue(2 * records1stBlock + 5, 0);
+    readValueWithoutKey(2 * records1stBlock + 5, 0);
+    readKeyManyTimes(2 * records1stBlock + 5, 0);
+    // last key in file
+    readValueBeforeKey(2 * records1stBlock + 5, 2 * records1stBlock + 4);
+    readKeyWithoutValue(2 * records1stBlock + 5, 2 * records1stBlock + 4);
+    readValueWithoutKey(2 * records1stBlock + 5, 2 * records1stBlock + 4);
+    readKeyManyTimes(2 * records1stBlock + 5, 2 * records1stBlock + 4);
+
+    // 1st key in mid block, verify block indexes then read
+    checkBlockIndex(2 * records1stBlock + 5, records1stBlock - 1, 0);
+    checkBlockIndex(2 * records1stBlock + 5, records1stBlock, 1);
+    readValueBeforeKey(2 * records1stBlock + 5, records1stBlock);
+    readKeyWithoutValue(2 * records1stBlock + 5, records1stBlock);
+    readValueWithoutKey(2 * records1stBlock + 5, records1stBlock);
+    readKeyManyTimes(2 * records1stBlock + 5, records1stBlock);
+
+    // last key in mid block, verify block indexes then read
+    checkBlockIndex(2 * records1stBlock + 5, records1stBlock + records2ndBlock
+        - 1, 1);
+    checkBlockIndex(2 * records1stBlock + 5, records1stBlock + records2ndBlock,
+        2);
+    readValueBeforeKey(2 * records1stBlock + 5, records1stBlock
+        + records2ndBlock - 1);
+    readKeyWithoutValue(2 * records1stBlock + 5, records1stBlock
+        + records2ndBlock - 1);
+    readValueWithoutKey(2 * records1stBlock + 5, records1stBlock
+        + records2ndBlock - 1);
+    readKeyManyTimes(2 * records1stBlock + 5, records1stBlock + records2ndBlock
+        - 1);
+
+    // mid in mid block
+    readValueBeforeKey(2 * records1stBlock + 5, records1stBlock + 10);
+    readKeyWithoutValue(2 * records1stBlock + 5, records1stBlock + 10);
+    readValueWithoutKey(2 * records1stBlock + 5, records1stBlock + 10);
+    readKeyManyTimes(2 * records1stBlock + 5, records1stBlock + 10);
+  }
+
+  Location locate(Scanner scanner, byte[] key) throws IOException {
+    if (scanner.seekTo(key)) {
+      return scanner.currentLocation;
+    }
+    return scanner.endLocation;
+  }
+  
+  public void testLocate() throws IOException {
+    if (skip)
+      return;
+    writeRecords(3 * records1stBlock);
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner = reader.createScanner();
+    Location loc2 =
+        locate(scanner, composeSortedKey(KEY, 3 * records1stBlock, 2)
+            .getBytes());
+    Location locLastIn1stBlock =
+        locate(scanner, composeSortedKey(KEY, 3 * records1stBlock,
+            records1stBlock - 1).getBytes());
+    Location locFirstIn2ndBlock =
+        locate(scanner, composeSortedKey(KEY, 3 * records1stBlock,
+            records1stBlock).getBytes());
+    Location locX = locate(scanner, "keyX".getBytes());
+    Assert.assertEquals(scanner.endLocation, locX);
+    scanner.close();
+    reader.close();
+  }
+
+  public void testFailureWriterNotClosed() throws IOException {
+    if (skip)
+      return;
+    Reader reader = null;
+    try {
+      reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+      Assert.fail("Cannot read before closing the writer.");
+    }
+    catch (IOException e) {
+      // noop, expecting exceptions
+    }
+    finally {
+      if (reader != null) {
+        reader.close();
+      }
+    }
+  }
+
+  public void testFailureWriteMetaBlocksWithSameName() throws IOException {
+    if (skip)
+      return;
+    writer.append("keyX".getBytes(), "valueX".getBytes());
+
+    // create a new metablock
+    DataOutputStream outMeta =
+        writer.prepareMetaBlock("testX", Compression.Algorithm.GZ.getName());
+    outMeta.write(123);
+    outMeta.write("foo".getBytes());
+    outMeta.close();
+    // add the same metablock
+    try {
+      DataOutputStream outMeta2 =
+          writer.prepareMetaBlock("testX", Compression.Algorithm.GZ.getName());
+      Assert.fail("Cannot create metablocks with the same name.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+    closeOutput();
+  }
+
+  public void testFailureGetNonExistentMetaBlock() throws IOException {
+    if (skip)
+      return;
+    writer.append("keyX".getBytes(), "valueX".getBytes());
+
+    // create a new metablock
+    DataOutputStream outMeta =
+        writer.prepareMetaBlock("testX", Compression.Algorithm.GZ.getName());
+    outMeta.write(123);
+    outMeta.write("foo".getBytes());
+    outMeta.close();
+    closeOutput();
+
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    DataInputStream mb = reader.getMetaBlock("testX");
+    Assert.assertNotNull(mb);
+    mb.close();
+    try {
+      DataInputStream mbBad = reader.getMetaBlock("testY");
+      Assert.assertNull(mbBad);
+      Assert.fail("Error on handling non-existent metablocks.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+    reader.close();
+  }
+
+  public void testFailureWriteRecordAfterMetaBlock() throws IOException {
+    if (skip)
+      return;
+    // write a key/value first
+    writer.append("keyX".getBytes(), "valueX".getBytes());
+    // create a new metablock
+    DataOutputStream outMeta =
+        writer.prepareMetaBlock("testX", Compression.Algorithm.GZ.getName());
+    outMeta.write(123);
+    outMeta.write("dummy".getBytes());
+    outMeta.close();
+    // add more key/value
+    try {
+      writer.append("keyY".getBytes(), "valueY".getBytes());
+      Assert.fail("Cannot add key/value after starting to add meta blocks.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+    closeOutput();
+  }
+
+  public void testFailureReadValueManyTimes() throws IOException {
+    if (skip)
+      return;
+    writeRecords(5);
+
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner = reader.createScanner();
+
+    byte[] vbuf = new byte[BUF_SIZE];
+    int vlen = scanner.entry().getValueLength();
+    scanner.entry().getValue(vbuf);
+    Assert.assertEquals(new String(vbuf, 0, vlen), VALUE + 0);
+    try {
+      scanner.entry().getValue(vbuf);
+      Assert.fail("Cannot get the value multiple times.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+
+    scanner.close();
+    reader.close();
+  }
+
+  public void testFailureBadCompressionCodec() throws IOException {
+    if (skip)
+      return;
+    closeOutput();
+    out = fs.create(path);
+    try {
+      writer = new Writer(out, BLOCK_SIZE, "BAD", comparator, conf);
+      Assert.fail("Error on handling invalid compression codecs.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+      // e.printStackTrace();
+    }
+  }
+
+  public void testFailureOpenEmptyFile() throws IOException {
+    if (skip)
+      return;
+    closeOutput();
+    // create an absolutely empty file
+    path = new Path(fs.getWorkingDirectory(), outputFile);
+    out = fs.create(path);
+    out.close();
+    try {
+      Reader reader =
+          new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+      Assert.fail("Error on handling empty files.");
+    }
+    catch (EOFException e) {
+      // noop, expecting exceptions
+    }
+  }
+
+  public void testFailureOpenRandomFile() throws IOException {
+    if (skip)
+      return;
+    closeOutput();
+    // create a random file
+    path = new Path(fs.getWorkingDirectory(), outputFile);
+    out = fs.create(path);
+    Random rand = new Random();
+    byte[] buf = new byte[K];
+    // fill with &gt; 1MB data
+    for (int nx = 0; nx &lt; K + 2; nx++) {
+      rand.nextBytes(buf);
+      out.write(buf);
+    }
+    out.close();
+    try {
+      Reader reader =
+          new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+      Assert.fail("Error on handling random files.");
+    }
+    catch (IOException e) {
+      // noop, expecting exceptions
+    }
+  }
+
+  public void testFailureKeyLongerThan64K() throws IOException {
+    if (skip)
+      return;
+    byte[] buf = new byte[64 * K + 1];
+    Random rand = new Random();
+    rand.nextBytes(buf);
+    try {
+      writer.append(buf, "valueX".getBytes());
+    }
+    catch (IndexOutOfBoundsException e) {
+      // noop, expecting exceptions
+    }
+    closeOutput();
+  }
+
+  public void testFailureOutOfOrderKeys() throws IOException {
+    if (skip)
+      return;
+    try {
+      writer.append("keyM".getBytes(), "valueM".getBytes());
+      writer.append("keyA".getBytes(), "valueA".getBytes());
+      Assert.fail("Error on handling out of order keys.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+      // e.printStackTrace();
+    }
+
+    closeOutput();
+  }
+
+  public void testFailureNegativeOffset() throws IOException {
+    if (skip)
+      return;
+    try {
+      writer.append("keyX".getBytes(), -1, 4, "valueX".getBytes(), 0, 6);
+      Assert.fail("Error on handling negative offset.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+    closeOutput();
+  }
+
+  public void testFailureNegativeOffset_2() throws IOException {
+    if (skip)
+      return;
+    closeOutput();
+
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner = reader.createScanner();
+    try {
+      scanner.lowerBound("keyX".getBytes(), -1, 4);
+      Assert.fail("Error on handling negative offset.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+    finally {
+      scanner.close();
+      reader.close();
+    }
+    closeOutput();
+  }
+
+  public void testFailureNegativeLength() throws IOException {
+    if (skip)
+      return;
+    try {
+      writer.append("keyX".getBytes(), 0, -1, "valueX".getBytes(), 0, 6);
+      Assert.fail("Error on handling negative length.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+    closeOutput();
+  }
+
+  public void testFailureNegativeLength_2() throws IOException {
+    if (skip)
+      return;
+    closeOutput();
+
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner = reader.createScanner();
+    try {
+      scanner.lowerBound("keyX".getBytes(), 0, -1);
+      Assert.fail("Error on handling negative length.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+    finally {
+      scanner.close();
+      reader.close();
+    }
+    closeOutput();
+  }
+
+  public void testFailureNegativeLength_3() throws IOException {
+    if (skip)
+      return;
+    writeRecords(3);
+
+    Reader reader =
+        new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner = reader.createScanner();
+    try {
+      // test negative array offset
+      try {
+        scanner.seekTo("keyY".getBytes(), -1, 4);
+        Assert.fail("Failed to handle negative offset.");
+      } catch (Exception e) {
+        // noop, expecting exceptions
+      }
+
+      // test negative array length
+      try {
+        scanner.seekTo("keyY".getBytes(), 0, -2);
+        Assert.fail("Failed to handle negative key length.");
+      } catch (Exception e) {
+        // noop, expecting exceptions
+      }
+    } finally {
+      scanner.close();
+      reader.close();
+    }
+  }
+
+  public void testFailureCompressionNotWorking() throws IOException {
+    if (skip)
+      return;
+    long rawDataSize = writeRecords(10 * records1stBlock, false);
+    if (!compression.equalsIgnoreCase(Compression.Algorithm.NONE.getName())) {
+      Assert.assertTrue(out.getPos() &lt; rawDataSize);
+    }
+    closeOutput();
+  }
+
+  public void testFailureFileWriteNotAt0Position() throws IOException {
+    if (skip)
+      return;
+    closeOutput();
+    out = fs.create(path);
+    out.write(123);
+
+    try {
+      writer = new Writer(out, BLOCK_SIZE, compression, comparator, conf);
+      Assert.fail("Failed to catch file write not at position 0.");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+    }
+    closeOutput();
+  }
+
+  private long writeRecords(int count) throws IOException {
+    return writeRecords(count, true);
+  }
+
+  private long writeRecords(int count, boolean close) throws IOException {
+    long rawDataSize = writeRecords(writer, count);
+    if (close) {
+      closeOutput();
+    }
+    return rawDataSize;
+  }
+
+  static long writeRecords(Writer writer, int count) throws IOException {
+    long rawDataSize = 0;
+    int nx;
+    for (nx = 0; nx &lt; count; nx++) {
+      byte[] key = composeSortedKey(KEY, count, nx).getBytes();
+      byte[] value = (VALUE + nx).getBytes();
+      writer.append(key, value);
+      rawDataSize +=
+          WritableUtils.getVIntSize(key.length) + key.length
+              + WritableUtils.getVIntSize(value.length) + value.length;
+    }
+    return rawDataSize;
+  }
+
+  /**
+   * Insert some leading 0's in front of the value, to make the keys sorted.
+   * 
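+   * @param prefix string prepended to every key
+   * @param total total number of keys (not used by the current format)
+   * @param value index of the record, zero-padded to 10 digits
+   * @return the composed key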
+   */
+  static String composeSortedKey(String prefix, int total, int value) {
+    return String.format("%s%010d", prefix, value);
+  }
+
+  /**
+   * Calculate the number of decimal digits in a non-negative integer.
+   * 
+   * @param value non-negative integer to measure
+   * @return number of base-10 digits in value (1 for 0)
+   */
+  private static int numberDigits(int value) {
+    int digits = 1;
+    while ((value = value / 10) &gt; 0) {
+      digits++;
+    }
+    return digits;
+  }
+
+  private void readRecords(int count) throws IOException {
+    readRecords(fs, path, count, conf);
+  }
+
+  static void readRecords(FileSystem fs, Path path, int count,
+      Configuration conf) throws IOException {
+    Reader reader =
+        new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner = reader.createScanner();
+
+    try {
+      for (int nx = 0; nx &lt; count; nx++, scanner.advance()) {
+        Assert.assertFalse(scanner.atEnd());
+        // Assert.assertTrue(scanner.next());
+
+        byte[] kbuf = new byte[BUF_SIZE];
+        int klen = scanner.entry().getKeyLength();
+        scanner.entry().getKey(kbuf);
+        Assert.assertEquals(new String(kbuf, 0, klen), composeSortedKey(KEY,
+            count, nx));
+
+        byte[] vbuf = new byte[BUF_SIZE];
+        int vlen = scanner.entry().getValueLength();
+        scanner.entry().getValue(vbuf);
+        Assert.assertEquals(new String(vbuf, 0, vlen), VALUE + nx);
+      }
+
+      Assert.assertTrue(scanner.atEnd());
+      Assert.assertFalse(scanner.advance());
+    }
+    finally {
+      scanner.close();
+      reader.close();
+    }
+  }
+
+  private void checkBlockIndex(int count, int recordIndex,
+      int blockIndexExpected) throws IOException {
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner = reader.createScanner();
+    scanner.seekTo(composeSortedKey(KEY, count, recordIndex).getBytes());
+    Assert.assertEquals(blockIndexExpected, scanner.currentLocation
+        .getBlockIndex());
+    scanner.close();
+    reader.close();
+  }
+
+  private void readValueBeforeKey(int count, int recordIndex)
+      throws IOException {
+    Reader reader =
+        new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner =
+        reader.createScanner(composeSortedKey(KEY, count, recordIndex)
+            .getBytes(), null);
+
+    try {
+      byte[] vbuf = new byte[BUF_SIZE];
+      int vlen = scanner.entry().getValueLength();
+      scanner.entry().getValue(vbuf);
+      Assert.assertEquals(new String(vbuf, 0, vlen), VALUE + recordIndex);
+
+      byte[] kbuf = new byte[BUF_SIZE];
+      int klen = scanner.entry().getKeyLength();
+      scanner.entry().getKey(kbuf);
+      Assert.assertEquals(new String(kbuf, 0, klen), composeSortedKey(KEY,
+          count, recordIndex));
+    }
+    finally {
+      scanner.close();
+      reader.close();
+    }
+  }
+
+  private void readKeyWithoutValue(int count, int recordIndex)
+      throws IOException {
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+    Scanner scanner =
+        reader.createScanner(composeSortedKey(KEY, count, recordIndex)
+            .getBytes(), null);
+
+    try {
+      // read the indexed key
+      byte[] kbuf1 = new byte[BUF_SIZE];
+      int klen1 = scanner.entry().getKeyLength();
+      scanner.entry().getKey(kbuf1);
+      Assert.assertEquals(new String(kbuf1, 0, klen1), composeSortedKey(KEY,
+          count, recordIndex));
+
+      if (scanner.advance() &amp;&amp; !scanner.atEnd()) {
+        // read the next key following the indexed
+        byte[] kbuf2 = new byte[BUF_SIZE];
+        int klen2 = scanner.entry().getKeyLength();
+        scanner.entry().getKey(kbuf2);
+        Assert.assertEquals(new String(kbuf2, 0, klen2), composeSortedKey(KEY,
+            count, recordIndex + 1));
+      }
+    }
+    finally {
+      scanner.close();
+      reader.close();
+    }
+  }
+
+  private void readValueWithoutKey(int count, int recordIndex)
+      throws IOException {
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+
+    Scanner scanner =
+        reader.createScanner(composeSortedKey(KEY, count, recordIndex)
+            .getBytes(), null);
+
+    byte[] vbuf1 = new byte[BUF_SIZE];
+    int vlen1 = scanner.entry().getValueLength();
+    scanner.entry().getValue(vbuf1);
+    Assert.assertEquals(new String(vbuf1, 0, vlen1), VALUE + recordIndex);
+
+    if (scanner.advance() &amp;&amp; !scanner.atEnd()) {
+      byte[] vbuf2 = new byte[BUF_SIZE];
+      int vlen2 = scanner.entry().getValueLength();
+      scanner.entry().getValue(vbuf2);
+      Assert.assertEquals(new String(vbuf2, 0, vlen2), VALUE
+          + (recordIndex + 1));
+    }
+
+    scanner.close();
+    reader.close();
+  }
+
+  private void readKeyManyTimes(int count, int recordIndex) throws IOException {
+    Reader reader = new Reader(fs.open(path), fs.getFileStatus(path).getLen(), conf);
+
+    Scanner scanner =
+        reader.createScanner(composeSortedKey(KEY, count, recordIndex)
+            .getBytes(), null);
+
+    // read the indexed key
+    byte[] kbuf1 = new byte[BUF_SIZE];
+    int klen1 = scanner.entry().getKeyLength();
+    scanner.entry().getKey(kbuf1);
+    Assert.assertEquals(new String(kbuf1, 0, klen1), composeSortedKey(KEY,
+        count, recordIndex));
+
+    klen1 = scanner.entry().getKeyLength();
+    scanner.entry().getKey(kbuf1);
+    Assert.assertEquals(new String(kbuf1, 0, klen1), composeSortedKey(KEY,
+        count, recordIndex));
+
+    klen1 = scanner.entry().getKeyLength();
+    scanner.entry().getKey(kbuf1);
+    Assert.assertEquals(new String(kbuf1, 0, klen1), composeSortedKey(KEY,
+        count, recordIndex));
+
+    scanner.close();
+    reader.close();
+  }
+
+  private void closeOutput() throws IOException {
+    if (writer != null) {
+      writer.close();
+      writer = null;
+    }
+    if (out != null) {
+      out.close();
+      out = null;
+    }
+  }
+}

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileComparators.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileComparators.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileComparators.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileComparators.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+
+import junit.framework.Assert;
+import junit.framework.TestCase;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.file.tfile.TFile.Writer;
+
+/**
+ * 
+ * Test cases verifying that the TFile Writer rejects invalid comparator
+ * specifications.
+ * 
+ */
+public class TestTFileComparators extends TestCase {
+  private static String ROOT =
+      System.getProperty("test.build.data", "/tmp/tfile-test");
+
+  private final static int BLOCK_SIZE = 512;
+  private FileSystem fs;
+  private Configuration conf;
+  private Path path;
+  private FSDataOutputStream out;
+  private Writer writer;
+
+  private String compression = Compression.Algorithm.GZ.getName();
+  private String outputFile = "TFileTestComparators";
+  /*
+   * pre-sampled numbers of records in one block, based on the generated key
+   * and value strings
+   */
+  // private int records1stBlock = 4314;
+  // private int records2ndBlock = 4108;
+  private int records1stBlock = 4480;
+  private int records2ndBlock = 4263;
+
+  @Override
+  public void setUp() throws IOException {
+    conf = new Configuration();
+    path = new Path(ROOT, outputFile);
+    fs = path.getFileSystem(conf);
+    out = fs.create(path);
+  }
+
+  @Override
+  public void tearDown() throws IOException {
+    fs.delete(path, true);
+  }
+
+  // bad comparator format
+  public void testFailureBadComparatorNames() throws IOException {
+    try {
+      writer = new Writer(out, BLOCK_SIZE, compression, "badcmp", conf);
+      Assert.fail("Failed to catch unsupported comparator names");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+      e.printStackTrace();
+    }
+  }
+
+  // jclass that doesn't exist
+  public void testFailureBadJClassNames() throws IOException {
+    try {
+      writer =
+          new Writer(out, BLOCK_SIZE, compression,
+              "jclass: some.non.existence.clazz", conf);
+      Assert.fail("Failed to catch unsupported comparator names");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+      e.printStackTrace();
+    }
+  }
+
+  // class exists but not a RawComparator
+  public void testFailureBadJClasses() throws IOException {
+    try {
+      writer =
+          new Writer(out, BLOCK_SIZE, compression,
+              "jclass:org.apache.hadoop.io.file.tfile.Chunk", conf);
+      Assert.fail("Failed to catch unsupported comparator names");
+    }
+    catch (Exception e) {
+      // noop, expecting exceptions
+      e.printStackTrace();
+    }
+  }
+
+  private void closeOutput() throws IOException {
+    if (writer != null) {
+      writer.close();
+      writer = null;
+    }
+    if (out != null) {
+      out.close();
+      out = null;
+    }
+  }
+}

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileJClassComparatorByteArrays.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileJClassComparatorByteArrays.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileJClassComparatorByteArrays.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileJClassComparatorByteArrays.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,58 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+
+import org.apache.hadoop.io.RawComparator;
+import org.apache.hadoop.io.WritableComparator;
+
+/**
+ * 
+ * Byte arrays test case class using the GZ compression codec and a jclass
+ * comparator; reuses the test cases of TestTFileByteArrays.
+ * 
+ */
+
+public class TestTFileJClassComparatorByteArrays extends TestTFileByteArrays {
+  /**
+   * Test a jclass comparator, using the same test cases as in the ByteArrays.
+   */
+  @Override
+  public void setUp() throws IOException {
+    init(Compression.Algorithm.GZ.getName(),
+        "jclass: org.apache.hadoop.io.file.tfile.MyComparator",
+        "TFileTestJClassComparator", 4480, 4263);
+    super.setUp();
+  }
+}
+
+class MyComparator implements RawComparator&lt;byte[]&gt; {
+
+  @Override
+  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
+    return WritableComparator.compareBytes(b1, s1, l1, b2, s2, l2);
+  }
+
+  @Override
+  public int compare(byte[] o1, byte[] o2) {
+    return WritableComparator.compareBytes(o1, 0, o1.length, o2, 0, o2.length);
+  }
+  
+}
+

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsByteArrays.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsByteArrays.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsByteArrays.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsByteArrays.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,42 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+
+import org.apache.hadoop.io.file.tfile.Compression.Algorithm;
+
+public class TestTFileLzoCodecsByteArrays extends TestTFileByteArrays {
+  /**
+   * Test LZO compression codec, using the same test cases as in the ByteArrays.
+   */
+  @Override
+  public void setUp() throws IOException {
+    skip = !(Algorithm.LZO.isSupported());
+    if (skip) {
+      System.out.println("Skipped");
+    }
+
+    // TODO: sample the generated key/value records, and put the numbers below
+    init(Compression.Algorithm.LZO.getName(), "memcmp", "TFileTestCodecsLzo",
+        2605, 2558);
+    if (!skip)
+      super.setUp();
+  }
+}

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsStreams.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsStreams.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsStreams.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsStreams.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+
+import org.apache.hadoop.io.file.tfile.Compression.Algorithm;
+
+public class TestTFileLzoCodecsStreams extends TestTFileStreams {
+  /**
+   * Test LZO compression codec, using the same test cases as in the Streams.
+   */
+  @Override
+  public void setUp() throws IOException {
+    skip = !(Algorithm.LZO.isSupported());
+    if (skip) {
+      System.out.println("Skipped");
+    }
+    init(Compression.Algorithm.LZO.getName(), "memcmp", "TFileTestCodecsLzo");
+    if (!skip) 
+      super.setUp();
+  }
+}

Added: hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileNoneCodecsByteArrays.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileNoneCodecsByteArrays.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileNoneCodecsByteArrays.java (added)
+++ hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileNoneCodecsByteArrays.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,32 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+
+public class TestTFileNoneCodecsByteArrays extends TestTFileByteArrays {
+  /**
+   * Test non-compression codec, using the same test cases as in the ByteArrays.
+   */
+  @Override
+  public void setUp() throws IOException {
+    init(Compression.Algorithm.NONE.getName(), "memcmp", "TFileTestCodecsNone",
+        24, 24);
+    super.setUp();
+  }
+}




</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r787913 [1/4] - in /hadoop/common/trunk: ./ src/java/org/apache/hadoop/io/file/ src/java/org/apache/hadoop/io/file/tfile/ src/test/ src/test/core/org/apache/hadoop/io/file/ src/test/core/org/apache/hadoop/io/file/tfile/</title>
<author><name>cdouglas@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624054828.7C5722388872@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090624054828-7C5722388872@eris-apache-org%3e</id>
<updated>2009-06-24T05:48:26Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Author: cdouglas
Date: Wed Jun 24 05:48:25 2009
New Revision: 787913

URL: http://svn.apache.org/viewvc?rev=787913&amp;view=rev
Log:
HADOOP-3315. Add a new, binary file format, TFile. Contributed by Hong Tang.

Added:
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BCFile.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedByteArrayOutputStream.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedRangeFileInputStream.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/ByteArray.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Chunk.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/CompareUtils.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Compression.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockAlreadyExists.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockDoesNotExist.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/RawComparable.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/SimpleBufferedOutputStream.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFile.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFileDumper.java
    hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Utils.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KVGenerator.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/KeySampler.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/NanoTimer.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/RandomDistribution.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFile.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileByteArrays.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileComparators.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileJClassComparatorByteArrays.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsByteArrays.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileLzoCodecsStreams.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileNoneCodecsByteArrays.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileNoneCodecsJClassComparatorByteArrays.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileNoneCodecsStreams.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileSeek.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileSeqFileComparison.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileSplit.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileStreams.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestTFileUnsortedByteArrays.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/TestVLong.java
    hadoop/common/trunk/src/test/core/org/apache/hadoop/io/file/tfile/Timer.java
Modified:
    hadoop/common/trunk/CHANGES.txt
    hadoop/common/trunk/build.xml
    hadoop/common/trunk/src/test/findbugsExcludeFile.xml

Modified: hadoop/common/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/CHANGES.txt?rev=787913&amp;r1=787912&amp;r2=787913&amp;view=diff
==============================================================================
--- hadoop/common/trunk/CHANGES.txt (original)
+++ hadoop/common/trunk/CHANGES.txt Wed Jun 24 05:48:25 2009
@@ -155,6 +155,8 @@
     HADOOP-5897. Add name-node metrics to capture java heap usage.
     (Suresh Srinivas via shv)
 
+    HADOOP-3315. Add a new, binary file format, TFile. (Hong Tang via cdouglas)
+
   IMPROVEMENTS
 
     HADOOP-4565. Added CombineFileInputFormat to use data locality information

Modified: hadoop/common/trunk/build.xml
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/build.xml?rev=787913&amp;r1=787912&amp;r2=787913&amp;view=diff
==============================================================================
--- hadoop/common/trunk/build.xml (original)
+++ hadoop/common/trunk/build.xml Wed Jun 24 05:48:25 2009
@@ -507,6 +507,8 @@
       &lt;sysproperty key="java.library.path"
        value="${build.native}/lib:${lib.dir}/native/${build.platform}"/&gt;
       &lt;sysproperty key="install.c++.examples" value="${install.c++.examples}"/&gt;
+      &lt;sysproperty key="io.compression.codec.lzo.class"
+		  value="${io.compression.codec.lzo.class}"/&gt;
       &lt;!-- set compile.c++ in the child jvm only if it is set --&gt;
       &lt;syspropertyset dynamic="no"&gt;
          &lt;propertyref name="compile.c++"/&gt;

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BCFile.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BCFile.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BCFile.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BCFile.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,979 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.Closeable;
+import java.io.DataInput;
+import java.io.DataInputStream;
+import java.io.DataOutput;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.file.tfile.CompareUtils.Scalar;
+import org.apache.hadoop.io.file.tfile.CompareUtils.ScalarComparator;
+import org.apache.hadoop.io.file.tfile.CompareUtils.ScalarLong;
+import org.apache.hadoop.io.file.tfile.Compression.Algorithm;
+import org.apache.hadoop.io.file.tfile.Utils.Version;
+
+/**
+ * Block Compressed file, the underlying physical storage layer for TFile.
+ * BCFile provides the basic block level compression for the data block and meta
+ * blocks. It is separated from TFile as it may be used for other
+ * block-compressed file implementations.
+ */
+final class BCFile {
+  // the current version of the BCFile impl; increment it (major or minor)
+  // when enough changes have been made
+  static final Version API_VERSION = new Version((short) 1, (short) 0);
+  static final Log LOG = LogFactory.getLog(BCFile.class);
+
+  /**
+   * Prevent the instantiation of BCFile objects.
+   */
+  private BCFile() {
+    // nothing
+  }
+
+  /**
+   * BCFile writer, the entry point for creating a new BCFile.
+   */
+  static public class Writer implements Closeable {
+    private final FSDataOutputStream out;
+    private final Configuration conf;
+    // the single meta block containing index of compressed data blocks
+    final DataIndex dataIndex;
+    // index for meta blocks
+    final MetaIndex metaIndex;
+    boolean blkInProgress = false;
+    private boolean metaBlkSeen = false;
+    private boolean closed = false;
+    long errorCount = 0;
+    // reusable buffers.
+    private BytesWritable fsOutputBuffer;
+
+    /**
+     * Call-back interface to register a block after a block is closed.
+     */
+    private static interface BlockRegister {
+      /**
+       * Register a block that is fully closed.
+       * 
+       * @param raw
+       *          The size of block in terms of uncompressed bytes.
+       * @param offsetStart
+       *          The start offset of the block.
+       * @param offsetEnd
+       *          One byte after the end of the block. Compressed block size is
+       *          offsetEnd - offsetStart.
+       */
+      public void register(long raw, long offsetStart, long offsetEnd);
+    }
+
+    /**
+     * Intermediate class that maintains the state of a Writable Compression
+     * Block.
+     */
+    private static final class WBlockState {
+      private final Algorithm compressAlgo;
+      private Compressor compressor; // !null only if using native
+      // Hadoop compression
+      private final FSDataOutputStream fsOut;
+      private final long posStart;
+      private final SimpleBufferedOutputStream fsBufferedOutput;
+      private OutputStream out;
+
+      /**
+       * @param compressionAlgo
+       *          The compression algorithm to be used for compression.
+       * @throws IOException
+       */
+      public WBlockState(Algorithm compressionAlgo, FSDataOutputStream fsOut,
+          BytesWritable fsOutputBuffer, Configuration conf) throws IOException {
+        this.compressAlgo = compressionAlgo;
+        this.fsOut = fsOut;
+        this.posStart = fsOut.getPos();
+
+        fsOutputBuffer.setCapacity(TFile.getFSOutputBufferSize(conf));
+
+        this.fsBufferedOutput =
+            new SimpleBufferedOutputStream(this.fsOut, fsOutputBuffer.get());
+        this.compressor = compressAlgo.getCompressor();
+
+        try {
+          this.out =
+              compressionAlgo.createCompressionStream(fsBufferedOutput,
+                  compressor, 0);
+        } catch (IOException e) {
+          compressAlgo.returnCompressor(compressor);
+          throw e;
+        }
+      }
+
+      /**
+       * Get the output stream for BlockAppender's consumption.
+       * 
+       * @return the output stream suitable for writing block data.
+       */
+      OutputStream getOutputStream() {
+        return out;
+      }
+
+      /**
+       * Get the current position in file.
+       * 
+       * @return The current byte offset in underlying file.
+       * @throws IOException
+       */
+      long getCurrentPos() throws IOException {
+        return fsOut.getPos() + fsBufferedOutput.size();
+      }
+
+      long getStartPos() {
+        return posStart;
+      }
+
+      /**
+       * Current size of compressed data.
+       * 
+       * @return
+       * @throws IOException
+       */
+      long getCompressedSize() throws IOException {
+        long ret = getCurrentPos() - posStart;
+        return ret;
+      }
+
+      /**
+       * Finishing up the current block.
+       */
+      public void finish() throws IOException {
+        try {
+          if (out != null) {
+            out.flush();
+            out = null;
+          }
+        } finally {
+          compressAlgo.returnCompressor(compressor);
+          compressor = null;
+        }
+      }
+    }
+
+    /**
+     * Access point to stuff data into a block.
+     * 
+     * TODO: Change DataOutputStream to something else that tracks the size as
+     * long instead of int. Currently, we will wrap around if the row block size
+     * is greater than 4GB.
+     */
+    public class BlockAppender extends DataOutputStream {
+      private final BlockRegister blockRegister;
+      private final WBlockState wBlkState;
+      @SuppressWarnings("hiding")
+      private boolean closed = false;
+
+      /**
+       * Constructor
+       * 
+       * @param register
+       *          the block register, which is called when the block is closed.
+       * @param wbs
+       *          The writable compression block state.
+       */
+      BlockAppender(BlockRegister register, WBlockState wbs) {
+        super(wbs.getOutputStream());
+        this.blockRegister = register;
+        this.wBlkState = wbs;
+      }
+
+      /**
+       * Get the raw size of the block.
+       * 
+       * @return the number of uncompressed bytes written through the
+       *         BlockAppender so far.
+       * @throws IOException
+       */
+      public long getRawSize() throws IOException {
+        /**
+         * Expecting the size() of a block not exceeding 4GB. Assuming the
+         * size() will wrap to negative integer if it exceeds 2GB.
+         */
+        return size() &amp; 0x00000000ffffffffL;
+      }
+
+      /**
+       * Get the compressed size of the block in progress.
+       * 
+       * @return the number of compressed bytes written to the underlying FS
+       *         file. The size may be smaller than what is actually needed to
+       *         compress all the data written, due to internal buffering
+       *         inside the compressor.
+       * @throws IOException
+       */
+      public long getCompressedSize() throws IOException {
+        return wBlkState.getCompressedSize();
+      }
+
+      @Override
+      public void flush() {
+        // The down stream is a special kind of stream that finishes a
+        // compression block upon flush. So we disable flush() here.
+      }
+
+      /**
+       * Signaling the end of write to the block. The block register will be
+       * called for registering the finished block.
+       */
+      @Override
+      public void close() throws IOException {
+        if (closed == true) {
+          return;
+        }
+        try {
+          ++errorCount;
+          wBlkState.finish();
+          blockRegister.register(getRawSize(), wBlkState.getStartPos(),
+              wBlkState.getCurrentPos());
+          --errorCount;
+        } finally {
+          closed = true;
+          blkInProgress = false;
+        }
+      }
+    }
+
+    /**
+     * Constructor
+     * 
+     * @param fout
+     *          FS output stream.
+     * @param compressionName
+     *          Name of the compression algorithm, which will be used for all
+     *          data blocks.
+     * @throws IOException
+     * @see Compression#getSupportedAlgorithms
+     */
+    public Writer(FSDataOutputStream fout, String compressionName,
+        Configuration conf) throws IOException {
+      if (fout.getPos() != 0) {
+        throw new IOException("Output file not at zero offset.");
+      }
+
+      this.out = fout;
+      this.conf = conf;
+      dataIndex = new DataIndex(compressionName);
+      metaIndex = new MetaIndex();
+      fsOutputBuffer = new BytesWritable();
+      Magic.write(fout);
+    }
+
+    /**
+     * Close the BCFile Writer. Attempting to use the Writer after calling
+     * &lt;code&gt;close&lt;/code&gt; is not allowed and may lead to undefined results.
+     */
+    public void close() throws IOException {
+      if (closed == true) {
+        return;
+      }
+
+      try {
+        if (errorCount == 0) {
+          if (blkInProgress == true) {
+            throw new IllegalStateException(
+                "Close() called with active block appender.");
+          }
+
+          // add metaBCFileIndex to metaIndex as the last meta block
+          BlockAppender appender =
+              prepareMetaBlock(DataIndex.BLOCK_NAME,
+                  getDefaultCompressionAlgorithm());
+          try {
+            dataIndex.write(appender);
+          } finally {
+            appender.close();
+          }
+
+          long offsetIndexMeta = out.getPos();
+          metaIndex.write(out);
+
+          // Meta Index and the trailing section are written out directly.
+          out.writeLong(offsetIndexMeta);
+
+          API_VERSION.write(out);
+          Magic.write(out);
+          out.flush();
+        }
+      } finally {
+        closed = true;
+      }
+    }
+
+    private Algorithm getDefaultCompressionAlgorithm() {
+      return dataIndex.getDefaultCompressionAlgorithm();
+    }
+
+    private BlockAppender prepareMetaBlock(String name, Algorithm compressAlgo)
+        throws IOException, MetaBlockAlreadyExists {
+      if (blkInProgress == true) {
+        throw new IllegalStateException(
+            "Cannot create Meta Block until previous block is closed.");
+      }
+
+      if (metaIndex.getMetaByName(name) != null) {
+        throw new MetaBlockAlreadyExists("name=" + name);
+      }
+
+      MetaBlockRegister mbr = new MetaBlockRegister(name, compressAlgo);
+      WBlockState wbs =
+          new WBlockState(compressAlgo, out, fsOutputBuffer, conf);
+      BlockAppender ba = new BlockAppender(mbr, wbs);
+      blkInProgress = true;
+      metaBlkSeen = true;
+      return ba;
+    }
+
+    /**
+     * Create a Meta Block and obtain an output stream for adding data into the
+     * block. There can only be one BlockAppender stream active at any time.
+     * Regular Blocks may not be created after the first Meta Blocks. The caller
+     * must call BlockAppender.close() to conclude the block creation.
+     * 
+     * @param name
+     *          The name of the Meta Block. The name must not conflict with
+     *          existing Meta Blocks.
+     * @param compressionName
+     *          The name of the compression algorithm to be used.
+     * @return The BlockAppender stream
+     * @throws IOException
+     * @throws MetaBlockAlreadyExists
+     *           If the meta block with the name already exists.
+     */
+    public BlockAppender prepareMetaBlock(String name, String compressionName)
+        throws IOException, MetaBlockAlreadyExists {
+      return prepareMetaBlock(name, Compression
+          .getCompressionAlgorithmByName(compressionName));
+    }
+
+    /**
+     * Create a Meta Block and obtain an output stream for adding data into the
+     * block. The Meta Block will be compressed with the same compression
+     * algorithm as data blocks. There can only be one BlockAppender stream
+     * active at any time. Regular Blocks may not be created after the first
+     * Meta Blocks. The caller must call BlockAppender.close() to conclude the
+     * block creation.
+     * 
+     * @param name
+     *          The name of the Meta Block. The name must not conflict with
+     *          existing Meta Blocks.
+     * @return The BlockAppender stream
+     * @throws MetaBlockAlreadyExists
+     *           If the meta block with the name already exists.
+     * @throws IOException
+     */
+    public BlockAppender prepareMetaBlock(String name) throws IOException,
+        MetaBlockAlreadyExists {
+      return prepareMetaBlock(name, getDefaultCompressionAlgorithm());
+    }
+
+    /**
+     * Create a Data Block and obtain an output stream for adding data into the
+     * block. There can only be one BlockAppender stream active at any time.
+     * Data Blocks may not be created after the first Meta Blocks. The caller
+     * must call BlockAppender.close() to conclude the block creation.
+     * 
+     * @return The BlockAppender stream
+     * @throws IOException
+     */
+    public BlockAppender prepareDataBlock() throws IOException {
+      if (blkInProgress == true) {
+        throw new IllegalStateException(
+            "Cannot create Data Block until previous block is closed.");
+      }
+
+      if (metaBlkSeen == true) {
+        throw new IllegalStateException(
+            "Cannot create Data Block after Meta Blocks.");
+      }
+
+      DataBlockRegister dbr = new DataBlockRegister();
+
+      WBlockState wbs =
+          new WBlockState(getDefaultCompressionAlgorithm(), out,
+              fsOutputBuffer, conf);
+      BlockAppender ba = new BlockAppender(dbr, wbs);
+      blkInProgress = true;
+      return ba;
+    }
+
+    /**
+     * Callback to make sure a meta block is added to the internal list when its
+     * stream is closed.
+     */
+    private class MetaBlockRegister implements BlockRegister {
+      private final String name;
+      private final Algorithm compressAlgo;
+
+      MetaBlockRegister(String name, Algorithm compressAlgo) {
+        this.name = name;
+        this.compressAlgo = compressAlgo;
+      }
+
+      public void register(long raw, long begin, long end) {
+        metaIndex.addEntry(new MetaIndexEntry(name, compressAlgo,
+            new BlockRegion(begin, end - begin, raw)));
+      }
+    }
+
+    /**
+     * Callback to make sure a data block is added to the internal list when
+     * it's being closed.
+     * 
+     */
+    private class DataBlockRegister implements BlockRegister {
+      DataBlockRegister() {
+        // do nothing
+      }
+
+      public void register(long raw, long begin, long end) {
+        dataIndex.addBlockRegion(new BlockRegion(begin, end - begin, raw));
+      }
+    }
+  }
+
+  /**
+   * BCFile Reader, interface to read the file's data and meta blocks.
+   */
+  static public class Reader implements Closeable {
+    private final FSDataInputStream in;
+    private final Configuration conf;
+    final DataIndex dataIndex;
+    // Index for meta blocks
+    final MetaIndex metaIndex;
+    final Version version;
+
+    /**
+     * Intermediate class that maintains the state of a Readable Compression
+     * Block.
+     */
+    static private final class RBlockState {
+      private final Algorithm compressAlgo;
+      private Decompressor decompressor;
+      private final BlockRegion region;
+      private final InputStream in;
+
+      public RBlockState(Algorithm compressionAlgo, FSDataInputStream fsin,
+          BlockRegion region, Configuration conf) throws IOException {
+        this.compressAlgo = compressionAlgo;
+        this.region = region;
+        this.decompressor = compressionAlgo.getDecompressor();
+
+        try {
+          this.in =
+              compressAlgo
+                  .createDecompressionStream(new BoundedRangeFileInputStream(
+                      fsin, this.region.getOffset(), this.region
+                          .getCompressedSize()), decompressor, TFile
+                      .getFSInputBufferSize(conf));
+        } catch (IOException e) {
+          compressAlgo.returnDecompressor(decompressor);
+          throw e;
+        }
+      }
+
+      /**
+       * Get the input stream for BlockReader's consumption.
+       * 
+       * @return the input stream suitable for reading block data.
+       */
+      public InputStream getInputStream() {
+        return in;
+      }
+
+      public String getCompressionName() {
+        return compressAlgo.getName();
+      }
+
+      public BlockRegion getBlockRegion() {
+        return region;
+      }
+
+      public void finish() throws IOException {
+        try {
+          in.close();
+        } finally {
+          compressAlgo.returnDecompressor(decompressor);
+          decompressor = null;
+        }
+      }
+    }
+
+    /**
+     * Access point to read a block.
+     */
+    public static class BlockReader extends DataInputStream {
+      private final RBlockState rBlkState;
+      private boolean closed = false;
+
+      BlockReader(RBlockState rbs) {
+        super(rbs.getInputStream());
+        rBlkState = rbs;
+      }
+
+      /**
+       * Finishing reading the block. Release all resources.
+       */
+      @Override
+      public void close() throws IOException {
+        if (closed == true) {
+          return;
+        }
+        try {
+          // Do not set rBlkState to null. People may access stats after calling
+          // close().
+          rBlkState.finish();
+        } finally {
+          closed = true;
+        }
+      }
+
+      /**
+       * Get the name of the compression algorithm used to compress the block.
+       * 
+       * @return name of the compression algorithm.
+       */
+      public String getCompressionName() {
+        return rBlkState.getCompressionName();
+      }
+
+      /**
+       * Get the uncompressed size of the block.
+       * 
+       * @return uncompressed size of the block.
+       */
+      public long getRawSize() {
+        return rBlkState.getBlockRegion().getRawSize();
+      }
+
+      /**
+       * Get the compressed size of the block.
+       * 
+       * @return compressed size of the block.
+       */
+      public long getCompressedSize() {
+        return rBlkState.getBlockRegion().getCompressedSize();
+      }
+
+      /**
+       * Get the starting position of the block in the file.
+       * 
+       * @return the starting position of the block in the file.
+       */
+      public long getStartPos() {
+        return rBlkState.getBlockRegion().getOffset();
+      }
+    }
+
+    /**
+     * Constructor
+     * 
+     * @param fin
+     *          FS input stream.
+     * @param fileLength
+     *          Length of the corresponding file
+     * @throws IOException
+     */
+    public Reader(FSDataInputStream fin, long fileLength, Configuration conf)
+        throws IOException {
+      this.in = fin;
+      this.conf = conf;
+
+      // move the cursor to the beginning of the tail, containing: offset to the
+      // meta block index, version and magic
+      fin.seek(fileLength - Magic.size() - Version.size() - Long.SIZE
+          / Byte.SIZE);
+      long offsetIndexMeta = fin.readLong();
+      version = new Version(fin);
+      Magic.readAndVerify(fin);
+
+      if (!version.compatibleWith(BCFile.API_VERSION)) {
+        throw new RuntimeException("Incompatible BCFile version.");
+      }
+
+      // read meta index
+      fin.seek(offsetIndexMeta);
+      metaIndex = new MetaIndex(fin);
+
+      // read data:BCFile.index, the data block index
+      BlockReader blockR = getMetaBlock(DataIndex.BLOCK_NAME);
+      try {
+        dataIndex = new DataIndex(blockR);
+      } finally {
+        blockR.close();
+      }
+    }
+
+    /**
+     * Get the name of the default compression algorithm.
+     * 
+     * @return the name of the default compression algorithm.
+     */
+    public String getDefaultCompressionName() {
+      return dataIndex.getDefaultCompressionAlgorithm().getName();
+    }
+
+    /**
+     * Get version of BCFile file being read.
+     * 
+     * @return version of BCFile file being read.
+     */
+    public Version getBCFileVersion() {
+      return version;
+    }
+
+    /**
+     * Get version of BCFile API.
+     * 
+     * @return version of BCFile API.
+     */
+    public Version getAPIVersion() {
+      return API_VERSION;
+    }
+
+    /**
+     * Finishing reading the BCFile. Release all resources.
+     */
+    public void close() {
+      // nothing to be done now
+    }
+
+    /**
+     * Get the number of data blocks.
+     * 
+     * @return the number of data blocks.
+     */
+    public int getBlockCount() {
+      return dataIndex.getBlockRegionList().size();
+    }
+
+    /**
+     * Stream access to a Meta Block.
+     * 
+     * @param name
+     *          meta block name
+     * @return BlockReader input stream for reading the meta block.
+     * @throws IOException
+     * @throws MetaBlockDoesNotExist
+     *           The Meta Block with the given name does not exist.
+     */
+    public BlockReader getMetaBlock(String name) throws IOException,
+        MetaBlockDoesNotExist {
+      MetaIndexEntry imeBCIndex = metaIndex.getMetaByName(name);
+      if (imeBCIndex == null) {
+        throw new MetaBlockDoesNotExist("name=" + name);
+      }
+
+      BlockRegion region = imeBCIndex.getRegion();
+      return createReader(imeBCIndex.getCompressionAlgorithm(), region);
+    }
+
+    /**
+     * Stream access to a Data Block.
+     * 
+     * @param blockIndex
+     *          0-based data block index.
+     * @return BlockReader input stream for reading the data block.
+     * @throws IOException
+     */
+    public BlockReader getDataBlock(int blockIndex) throws IOException {
+      if (blockIndex &lt; 0 || blockIndex &gt;= getBlockCount()) {
+        throw new IndexOutOfBoundsException(String.format(
+            "blockIndex=%d, numBlocks=%d", blockIndex, getBlockCount()));
+      }
+
+      BlockRegion region = dataIndex.getBlockRegionList().get(blockIndex);
+      return createReader(dataIndex.getDefaultCompressionAlgorithm(), region);
+    }
+
+    private BlockReader createReader(Algorithm compressAlgo, BlockRegion region)
+        throws IOException {
+      RBlockState rbs = new RBlockState(compressAlgo, in, region, conf);
+      return new BlockReader(rbs);
+    }
+
+    /**
+     * Find the smallest Block index whose starting offset is greater than or
+     * equal to the specified offset.
+     * 
+     * @param offset
+     *          User-specified offset.
+     * @return the index to the data Block if such block exists; or -1
+     *         otherwise.
+     */
+    public int getBlockIndexNear(long offset) {
+      ArrayList&lt;BlockRegion&gt; list = dataIndex.getBlockRegionList();
+      int idx =
+          Utils
+              .lowerBound(list, new ScalarLong(offset), new ScalarComparator());
+
+      if (idx == list.size()) {
+        return -1;
+      }
+
+      return idx;
+    }
+  }
+
+  /**
+   * Index for all Meta blocks.
+   */
+  static class MetaIndex {
+    // use a tree map, for getting a meta block entry by name
+    final Map&lt;String, MetaIndexEntry&gt; index;
+
+    // for write
+    public MetaIndex() {
+      index = new TreeMap&lt;String, MetaIndexEntry&gt;();
+    }
+
+    // for read, construct the map from the file
+    public MetaIndex(DataInput in) throws IOException {
+      int count = Utils.readVInt(in);
+      index = new TreeMap&lt;String, MetaIndexEntry&gt;();
+
+      for (int nx = 0; nx &lt; count; nx++) {
+        MetaIndexEntry indexEntry = new MetaIndexEntry(in);
+        index.put(indexEntry.getMetaName(), indexEntry);
+      }
+    }
+
+    public void addEntry(MetaIndexEntry indexEntry) {
+      index.put(indexEntry.getMetaName(), indexEntry);
+    }
+
+    public MetaIndexEntry getMetaByName(String name) {
+      return index.get(name);
+    }
+
+    public void write(DataOutput out) throws IOException {
+      Utils.writeVInt(out, index.size());
+
+      for (MetaIndexEntry indexEntry : index.values()) {
+        indexEntry.write(out);
+      }
+    }
+  }
+
+  /**
+   * An entry describing a meta block in the MetaIndex.
+   */
+  static final class MetaIndexEntry {
+    private final String metaName;
+    private final Algorithm compressionAlgorithm;
+    private final static String defaultPrefix = "data:";
+
+    private final BlockRegion region;
+
+    public MetaIndexEntry(DataInput in) throws IOException {
+      String fullMetaName = Utils.readString(in);
+      if (fullMetaName.startsWith(defaultPrefix)) {
+        metaName =
+            fullMetaName.substring(defaultPrefix.length(), fullMetaName
+                .length());
+      } else {
+        throw new IOException("Corrupted Meta region Index");
+      }
+
+      compressionAlgorithm =
+          Compression.getCompressionAlgorithmByName(Utils.readString(in));
+      region = new BlockRegion(in);
+    }
+
+    public MetaIndexEntry(String metaName, Algorithm compressionAlgorithm,
+        BlockRegion region) {
+      this.metaName = metaName;
+      this.compressionAlgorithm = compressionAlgorithm;
+      this.region = region;
+    }
+
+    public String getMetaName() {
+      return metaName;
+    }
+
+    public Algorithm getCompressionAlgorithm() {
+      return compressionAlgorithm;
+    }
+
+    public BlockRegion getRegion() {
+      return region;
+    }
+
+    public void write(DataOutput out) throws IOException {
+      Utils.writeString(out, defaultPrefix + metaName);
+      Utils.writeString(out, compressionAlgorithm.getName());
+
+      region.write(out);
+    }
+  }
+
+  /**
+   * Index of all compressed data blocks.
+   */
+  static class DataIndex {
+    final static String BLOCK_NAME = "BCFile.index";
+
+    private final Algorithm defaultCompressionAlgorithm;
+
+    // for data blocks, each entry specifies a block's offset, compressed size
+    // and raw size
+    private final ArrayList&lt;BlockRegion&gt; listRegions;
+
+    // for read, deserialized from a file
+    public DataIndex(DataInput in) throws IOException {
+      defaultCompressionAlgorithm =
+          Compression.getCompressionAlgorithmByName(Utils.readString(in));
+
+      int n = Utils.readVInt(in);
+      listRegions = new ArrayList&lt;BlockRegion&gt;(n);
+
+      for (int i = 0; i &lt; n; i++) {
+        BlockRegion region = new BlockRegion(in);
+        listRegions.add(region);
+      }
+    }
+
+    // for write
+    public DataIndex(String defaultCompressionAlgorithmName) {
+      this.defaultCompressionAlgorithm =
+          Compression
+              .getCompressionAlgorithmByName(defaultCompressionAlgorithmName);
+      listRegions = new ArrayList&lt;BlockRegion&gt;();
+    }
+
+    public Algorithm getDefaultCompressionAlgorithm() {
+      return defaultCompressionAlgorithm;
+    }
+
+    public ArrayList&lt;BlockRegion&gt; getBlockRegionList() {
+      return listRegions;
+    }
+
+    public void addBlockRegion(BlockRegion region) {
+      listRegions.add(region);
+    }
+
+    public void write(DataOutput out) throws IOException {
+      Utils.writeString(out, defaultCompressionAlgorithm.getName());
+
+      Utils.writeVInt(out, listRegions.size());
+
+      for (BlockRegion region : listRegions) {
+        region.write(out);
+      }
+    }
+  }
+
+  /**
+   * Magic number uniquely identifying a BCFile in the header/footer.
+   */
+  static final class Magic {
+    private final static byte[] AB_MAGIC_BCFILE =
+        {
+            // ... total of 16 bytes
+            (byte) 0xd1, (byte) 0x11, (byte) 0xd3, (byte) 0x68, (byte) 0x91,
+            (byte) 0xb5, (byte) 0xd7, (byte) 0xb6, (byte) 0x39, (byte) 0xdf,
+            (byte) 0x41, (byte) 0x40, (byte) 0x92, (byte) 0xba, (byte) 0xe1,
+            (byte) 0x50 };
+
+    public static void readAndVerify(DataInput in) throws IOException {
+      byte[] abMagic = new byte[size()];
+      in.readFully(abMagic);
+
+      // check against AB_MAGIC_BCFILE, if not matching, throw an
+      // Exception
+      if (!Arrays.equals(abMagic, AB_MAGIC_BCFILE)) {
+        throw new IOException("Not a valid BCFile.");
+      }
+    }
+
+    public static void write(DataOutput out) throws IOException {
+      out.write(AB_MAGIC_BCFILE);
+    }
+
+    public static int size() {
+      return AB_MAGIC_BCFILE.length;
+    }
+  }
+
+  /**
+   * Block region.
+   */
+  static final class BlockRegion implements Scalar {
+    private final long offset;
+    private final long compressedSize;
+    private final long rawSize;
+
+    public BlockRegion(DataInput in) throws IOException {
+      offset = Utils.readVLong(in);
+      compressedSize = Utils.readVLong(in);
+      rawSize = Utils.readVLong(in);
+    }
+
+    public BlockRegion(long offset, long compressedSize, long rawSize) {
+      this.offset = offset;
+      this.compressedSize = compressedSize;
+      this.rawSize = rawSize;
+    }
+
+    public void write(DataOutput out) throws IOException {
+      Utils.writeVLong(out, offset);
+      Utils.writeVLong(out, compressedSize);
+      Utils.writeVLong(out, rawSize);
+    }
+
+    public long getOffset() {
+      return offset;
+    }
+
+    public long getCompressedSize() {
+      return compressedSize;
+    }
+
+    public long getRawSize() {
+      return rawSize;
+    }
+
+    @Override
+    public long magnitude() {
+      return offset;
+    }
+  }
+}
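
Sketch (not part of this commit): getBlockIndexNear above returns the smallest
block index whose starting offset is greater than or equal to the requested
offset, or -1 when no such block exists. The lookup can be illustrated in
isolation roughly as follows. The class BlockIndexSketch and its plain long[]
of offsets are hypothetical stand-ins for the BlockRegion list, and
java.util.Arrays.binarySearch is swapped in for Utils.lowerBound, which is only
equivalent under the assumption that block offsets are sorted and distinct.

```java
import java.util.Arrays;

/** Hypothetical illustration of the lookup done by getBlockIndexNear. */
final class BlockIndexSketch {
  private BlockIndexSketch() {
    // no instances
  }

  /**
   * Returns the index of the first block starting at or after the offset,
   * or -1 if every block starts before it. Assumes sortedOffsets is sorted
   * in ascending order and contains no duplicates.
   */
  static int blockIndexNear(long[] sortedOffsets, long offset) {
    int idx = Arrays.binarySearch(sortedOffsets, offset);
    if (Integer.signum(idx) == -1) {
      // Not found: binarySearch returned -(insertionPoint) - 1.
      idx = -idx - 1;
    }
    return idx == sortedOffsets.length ? -1 : idx;
  }
}
```

With block offsets {0, 100, 250}, offset 101 maps to block 2 and offset 251
maps to -1, since no block starts at or after it.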

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedByteArrayOutputStream.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedByteArrayOutputStream.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedByteArrayOutputStream.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedByteArrayOutputStream.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A byte array backed output stream with a limit. The limit must not exceed
+ * the buffer capacity. The object can be reused through the &lt;code&gt;reset&lt;/code&gt;
+ * API, with a different limit chosen in each round.
+ */
+class BoundedByteArrayOutputStream extends OutputStream {
+  private final byte[] buffer;
+  private int limit;
+  private int count;
+
+  public BoundedByteArrayOutputStream(int capacity) {
+    this(capacity, capacity);
+  }
+
+  public BoundedByteArrayOutputStream(int capacity, int limit) {
+    if ((capacity &lt; limit) || (capacity | limit) &lt; 0) {
+      throw new IllegalArgumentException("Invalid capacity/limit");
+    }
+    this.buffer = new byte[capacity];
+    this.limit = limit;
+    this.count = 0;
+  }
+
+  @Override
+  public void write(int b) throws IOException {
+    if (count &gt;= limit) {
+      throw new EOFException("Reached the limit of the buffer.");
+    }
+    buffer[count++] = (byte) b;
+  }
+
+  @Override
+  public void write(byte b[], int off, int len) throws IOException {
+    if ((off &lt; 0) || (off &gt; b.length) || (len &lt; 0) || ((off + len) &gt; b.length)
+        || ((off + len) &lt; 0)) {
+      throw new IndexOutOfBoundsException();
+    } else if (len == 0) {
+      return;
+    }
+
+    if (count + len &gt; limit) {
+      throw new EOFException("Reached the limit of the buffer.");
+    }
+
+    System.arraycopy(b, off, buffer, count, len);
+    count += len;
+  }
+
+  public void reset(int newlim) {
+    if (newlim &gt; buffer.length) {
+      throw new IndexOutOfBoundsException("Limit exceeds buffer size");
+    }
+    this.limit = newlim;
+    this.count = 0;
+  }
+
+  public void reset() {
+    this.limit = buffer.length;
+    this.count = 0;
+  }
+
+  public int getLimit() {
+    return limit;
+  }
+
+  public byte[] getBuffer() {
+    return buffer;
+  }
+
+  public int size() {
+    return count;
+  }
+}

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedRangeFileInputStream.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedRangeFileInputStream.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedRangeFileInputStream.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/BoundedRangeFileInputStream.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,141 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+
+/**
+ * BoundedRangeFileInputStream abstracts a contiguous region of a Hadoop
+ * FSDataInputStream as a regular input stream. One can create multiple
+ * BoundedRangeFileInputStream on top of the same FSDataInputStream and they
+ * would not interfere with each other.
+ */
+class BoundedRangeFileInputStream extends InputStream {
+
+  private FSDataInputStream in;
+  private long pos;
+  private long end;
+  private long mark;
+  private final byte[] oneByte = new byte[1];
+
+  /**
+   * Constructor
+   * 
+   * @param in
+   *          The FSDataInputStream we connect to.
+   * @param offset
+   *          Beginning offset of the region.
+   * @param length
+   *          Length of the region.
+   * 
+   *          The actual length of the region may be smaller if (offset +
+   *          length) goes beyond the end of the FS input stream.
+   */
+  public BoundedRangeFileInputStream(FSDataInputStream in, long offset,
+      long length) {
+    if (offset &lt; 0 || length &lt; 0) {
+      throw new IndexOutOfBoundsException("Invalid offset/length: " + offset
+          + "/" + length);
+    }
+
+    this.in = in;
+    this.pos = offset;
+    this.end = offset + length;
+    this.mark = -1;
+  }
+
+  @Override
+  public int available() throws IOException {
+    int avail = in.available();
+    if (pos + avail &gt; end) {
+      avail = (int) (end - pos);
+    }
+
+    return avail;
+  }
+
+  @Override
+  public int read() throws IOException {
+    int ret = read(oneByte);
+    if (ret == 1) return oneByte[0] &amp; 0xff;
+    return -1;
+  }
+
+  @Override
+  public int read(byte[] b) throws IOException {
+    return read(b, 0, b.length);
+  }
+
+  @Override
+  public int read(byte[] b, int off, int len) throws IOException {
+    if ((off | len | (off + len) | (b.length - (off + len))) &lt; 0) {
+      throw new IndexOutOfBoundsException();
+    }
+
+    int n = (int) Math.min(Integer.MAX_VALUE, Math.min(len, (end - pos)));
+    if (n == 0) return -1;
+    int ret = 0;
+    synchronized (in) {
+      in.seek(pos);
+      ret = in.read(b, off, n);
+    }
+    if (ret &lt; 0) {
+      end = pos;
+      return -1;
+    }
+    pos += ret;
+    return ret;
+  }
+
+  @Override
+  /*
+   * We may skip beyond the end of the file.
+   */
+  public long skip(long n) throws IOException {
+    long len = Math.min(n, end - pos);
+    pos += len;
+    return len;
+  }
+
+  @Override
+  public void mark(int readlimit) {
+    mark = pos;
+  }
+
+  @Override
+  public void reset() throws IOException {
+    if (mark &lt; 0) throw new IOException("Resetting to invalid mark");
+    pos = mark;
+  }
+
+  @Override
+  public boolean markSupported() {
+    return true;
+  }
+
+  @Override
+  public void close() {
+    // Invalidate the state of the stream.
+    in = null;
+    pos = end;
+    mark = -1;
+  }
+}
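
Sketch (not part of this commit): the class comment above says that several
bounded views over one FSDataInputStream do not interfere with each other.
The reason is that each view keeps its own position and re-seeks the shared
stream under a lock before every read. A minimal stand-alone illustration,
assuming a hypothetical in-memory SharedSource in place of FSDataInputStream
and a hypothetical RegionView in place of BoundedRangeFileInputStream:

```java
/** Hypothetical illustration of independent views over one shared source. */
final class RegionViewSketch {
  private RegionViewSketch() {
    // no instances
  }

  /** Stand-in for a seekable stream that has a single shared cursor. */
  static final class SharedSource {
    private final byte[] data;
    private int cursor;

    SharedSource(byte[] data) {
      this.data = data;
    }

    void seek(long position) {
      cursor = (int) position;
    }

    /** Returns the number of bytes copied, or -1 when nothing is left. */
    int read(byte[] b, int off, int len) {
      int n = Math.max(0, Math.min(len, data.length - cursor));
      if (n == 0) {
        return -1;
      }
      System.arraycopy(data, cursor, b, off, n);
      cursor += n;
      return n;
    }
  }

  /** A window [offset, offset + length) that tracks its own position. */
  static final class RegionView {
    private final SharedSource in;
    private final long end;
    private long pos;

    RegionView(SharedSource in, long offset, long length) {
      this.in = in;
      this.pos = offset;
      this.end = offset + length;
    }

    int read(byte[] b, int off, int len) {
      // Clamp the request so it never crosses the end of the region.
      int n = (int) Math.min(len, end - pos);
      if (n == 0) {
        return -1;
      }
      int ret;
      synchronized (in) {
        // Reposition the shared cursor on every read, so reads issued by
        // other views in between cannot disturb this one.
        in.seek(pos);
        ret = in.read(b, off, n);
      }
      if (ret == -1) {
        return -1;
      }
      pos += ret;
      return ret;
    }
  }
}
```

Interleaving reads on two views of the same source yields each region's bytes
in order, because neither view relies on where the shared cursor was left.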

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/ByteArray.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/ByteArray.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/ByteArray.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/ByteArray.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,92 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import org.apache.hadoop.io.BytesWritable;
+
+/**
+ * Adaptor class to wrap byte-array backed objects (including java byte array)
+ * as RawComparable objects.
+ */
+public final class ByteArray implements RawComparable {
+  private final byte[] buffer;
+  private final int offset;
+  private final int len;
+
+  /**
+   * Construct a ByteArray from a {@link BytesWritable}.
+   * 
+   * @param other
+   */
+  public ByteArray(BytesWritable other) {
+    this(other.get(), 0, other.getSize());
+  }
+
+  /**
+   * Wrap a whole byte array as a RawComparable.
+   * 
+   * @param buffer
+   *          the byte array buffer.
+   */
+  public ByteArray(byte[] buffer) {
+    this(buffer, 0, buffer.length);
+  }
+
+  /**
+   * Wrap a partial byte array as a RawComparable.
+   * 
+   * @param buffer
+   *          the byte array buffer.
+   * @param offset
+   *          the starting offset
+   * @param len
+   *          the length of the consecutive bytes to be wrapped.
+   */
+  public ByteArray(byte[] buffer, int offset, int len) {
+    if ((offset | len | (buffer.length - offset - len)) &lt; 0) {
+      throw new IndexOutOfBoundsException();
+    }
+    this.buffer = buffer;
+    this.offset = offset;
+    this.len = len;
+  }
+
+  /**
+   * @return the underlying buffer.
+   */
+  @Override
+  public byte[] buffer() {
+    return buffer;
+  }
+
+  /**
+   * @return the offset in the buffer.
+   */
+  @Override
+  public int offset() {
+    return offset;
+  }
+
+  /**
+   * @return the size of the byte array.
+   */
+  @Override
+  public int size() {
+    return len;
+  }
+}

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Chunk.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Chunk.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Chunk.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Chunk.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,429 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+/**
+ * Several related classes to support chunk-encoded sub-streams on top of a
+ * regular stream.
+ */
+final class Chunk {
+
+  /**
+   * Prevent the instantiation of class.
+   */
+  private Chunk() {
+    // nothing
+  }
+
+  /**
+   * Decoding a chain of chunks encoded through ChunkEncoder or
+   * SingleChunkEncoder.
+   */
+  static public class ChunkDecoder extends InputStream {
+    private DataInputStream in = null;
+    private boolean lastChunk;
+    private int remain = 0;
+    private boolean closed;
+
+    public ChunkDecoder() {
+      lastChunk = true;
+      closed = true;
+    }
+
+    public void reset(DataInputStream downStream) {
+      // no need to wind forward the old input.
+      in = downStream;
+      lastChunk = false;
+      remain = 0;
+      closed = false;
+    }
+
+    /**
+     * Constructor
+     * 
+     * @param in
+     *          The source input stream which contains chunk-encoded data
+     *          stream.
+     */
+    public ChunkDecoder(DataInputStream in) {
+      this.in = in;
+      lastChunk = false;
+      closed = false;
+    }
+
+    /**
+     * Have we reached the last chunk?
+     * 
+     * @return true if we have reached the last chunk.
+     * @throws java.io.IOException
+     */
+    public boolean isLastChunk() throws IOException {
+      checkEOF();
+      return lastChunk;
+    }
+
+    /**
+     * How many bytes remain in the current chunk?
+     * 
+     * @return remaining bytes left in the current chunk.
+     * @throws java.io.IOException
+     */
+    public int getRemain() throws IOException {
+      checkEOF();
+      return remain;
+    }
+
+    /**
+     * Read the length of the next chunk.
+     * 
+     * @throws java.io.IOException
+     *           when no more data is available.
+     */
+    private void readLength() throws IOException {
+      remain = Utils.readVInt(in);
+      if (remain &gt;= 0) {
+        lastChunk = true;
+      } else {
+        remain = -remain;
+      }
+    }
+
+    /**
+     * Check whether we have reached the end of the stream.
+     * 
+     * @return false if the chunk encoded stream has more data to read (in which
+     *         case available() will be greater than 0); true otherwise.
+     * @throws java.io.IOException
+     *           on I/O errors.
+     */
+    private boolean checkEOF() throws IOException {
+      if (isClosed()) return true;
+      while (true) {
+        if (remain &gt; 0) return false;
+        if (lastChunk) return true;
+        readLength();
+      }
+    }
+
+    @Override
+    /*
+     * This method never blocks the caller. Returning 0 does not mean we have
+     * reached the end of the stream.
+     */
+    public int available() {
+      return remain;
+    }
+
+    @Override
+    public int read() throws IOException {
+      if (checkEOF()) return -1;
+      int ret = in.read();
+      if (ret &lt; 0) throw new IOException("Corrupted chunk encoding stream");
+      --remain;
+      return ret;
+    }
+
+    @Override
+    public int read(byte[] b) throws IOException {
+      return read(b, 0, b.length);
+    }
+
+    @Override
+    public int read(byte[] b, int off, int len) throws IOException {
+      if ((off | len | (off + len) | (b.length - (off + len))) &lt; 0) {
+        throw new IndexOutOfBoundsException();
+      }
+
+      if (!checkEOF()) {
+        int n = Math.min(remain, len);
+        int ret = in.read(b, off, n);
+        if (ret &lt; 0) throw new IOException("Corrupted chunk encoding stream");
+        remain -= ret;
+        return ret;
+      }
+      return -1;
+    }
+
+    @Override
+    public long skip(long n) throws IOException {
+      if (!checkEOF()) {
+        long ret = in.skip(Math.min(remain, n));
+        remain -= ret;
+        return ret;
+      }
+      return 0;
+    }
+
+    @Override
+    public boolean markSupported() {
+      return false;
+    }
+
+    public boolean isClosed() {
+      return closed;
+    }
+
+    @Override
+    public void close() throws IOException {
+      if (closed == false) {
+        try {
+          while (!checkEOF()) {
+            skip(Integer.MAX_VALUE);
+          }
+        } finally {
+          closed = true;
+        }
+      }
+    }
+  }
+
+  /**
+   * Chunk Encoder. Encodes the output data into a chain of chunks in the
+   * following sequence: -len1, byte[len1], -len2, byte[len2], ... len_n,
+   * byte[len_n]. Where len1, len2, ..., len_n are the lengths of the data
+   * chunks. Non-terminal chunks have their lengths negated. Non-terminal chunks
+   * cannot have length 0. All lengths are in the range of 0 to
+   * Integer.MAX_VALUE and are encoded in Utils.VInt format.
+   */
+  static public class ChunkEncoder extends OutputStream {
+    /**
+     * The data output stream it connects to.
+     */
+    private DataOutputStream out;
+
+    /**
+     * The internal buffer that is only used when we do not know the advertised
+     * size.
+     */
+    private byte buf[];
+
+    /**
+     * The number of valid bytes in the buffer. This value is always in the
+     * range &lt;tt&gt;0&lt;/tt&gt; through &lt;tt&gt;buf.length&lt;/tt&gt;; elements &lt;tt&gt;buf[0]&lt;/tt&gt;
+     * through &lt;tt&gt;buf[count-1]&lt;/tt&gt; contain valid byte data.
+     */
+    private int count;
+
+    /**
+     * Constructor.
+     * 
+     * @param out
+     *          the underlying output stream.
+     * @param buf
+     *          user-supplied buffer. The buffer would be used exclusively by
+     *          the ChunkEncoder during its life cycle.
+     */
+    public ChunkEncoder(DataOutputStream out, byte[] buf) {
+      this.out = out;
+      this.buf = buf;
+      this.count = 0;
+    }
+
+    /**
+     * Write out a chunk.
+     * 
+     * @param chunk
+     *          The chunk buffer.
+     * @param offset
+     *          Offset to chunk buffer for the beginning of chunk.
+     * @param len
+     * @param last
+     *          Is this the last call to flushBuffer?
+     */
+    private void writeChunk(byte[] chunk, int offset, int len, boolean last)
+        throws IOException {
+      if (last) { // always write out the length for the last chunk.
+        Utils.writeVInt(out, len);
+        if (len &gt; 0) {
+          out.write(chunk, offset, len);
+        }
+      } else {
+        if (len &gt; 0) {
+          Utils.writeVInt(out, -len);
+          out.write(chunk, offset, len);
+        }
+      }
+    }
+
+    /**
+     * Write out a chunk that is a concatenation of the internal buffer plus
+     * user supplied data. This will never be the last block.
+     * 
+     * @param data
+     *          User supplied data buffer.
+     * @param offset
+     *          Offset to user data buffer.
+     * @param len
+     *          User data buffer size.
+     */
+    private void writeBufData(byte[] data, int offset, int len)
+        throws IOException {
+      if (count + len &gt; 0) {
+        Utils.writeVInt(out, -(count + len));
+        out.write(buf, 0, count);
+        count = 0;
+        out.write(data, offset, len);
+      }
+    }
+
+    /**
+     * Flush the internal buffer.
+     * 
+     * The buffered bytes are written as a non-terminal chunk.
+     * 
+     * @throws java.io.IOException
+     */
+    private void flushBuffer() throws IOException {
+      if (count &gt; 0) {
+        writeChunk(buf, 0, count, false);
+        count = 0;
+      }
+    }
+
+    @Override
+    public void write(int b) throws IOException {
+      if (count &gt;= buf.length) {
+        flushBuffer();
+      }
+      buf[count++] = (byte) b;
+    }
+
+    @Override
+    public void write(byte b[]) throws IOException {
+      write(b, 0, b.length);
+    }
+
+    @Override
+    public void write(byte b[], int off, int len) throws IOException {
+      if ((len + count) &gt;= buf.length) {
+        /*
+         * If the input data do not fit in buffer, flush the output buffer and
+         * then write the data directly. In this way buffered streams will
+         * cascade harmlessly.
+         */
+        writeBufData(b, off, len);
+        return;
+      }
+
+      System.arraycopy(b, off, buf, count, len);
+      count += len;
+    }
+
+    @Override
+    public void flush() throws IOException {
+      flushBuffer();
+      out.flush();
+    }
+
+    @Override
+    public void close() throws IOException {
+      if (buf != null) {
+        try {
+          writeChunk(buf, 0, count, true);
+        } finally {
+          buf = null;
+          out = null;
+        }
+      }
+    }
+  }
+
+  /**
+   * Encode the whole stream as a single chunk. The size of the chunk must be
+   * known up front.
+   */
+  static public class SingleChunkEncoder extends OutputStream {
+    /**
+     * The data output stream it connects to.
+     */
+    private final DataOutputStream out;
+
+    /**
+     * The remaining bytes to be written.
+     */
+    private int remain;
+    private boolean closed = false;
+
+    /**
+     * Constructor.
+     * 
+     * @param out
+     *          the underlying output stream.
+     * @param size
+     *          The total # of bytes to be written as a single chunk.
+     * @throws java.io.IOException
+     *           if an I/O error occurs.
+     */
+    public SingleChunkEncoder(DataOutputStream out, int size)
+        throws IOException {
+      this.out = out;
+      this.remain = size;
+      Utils.writeVInt(out, size);
+    }
+
+    @Override
+    public void write(int b) throws IOException {
+      if (remain &gt; 0) {
+        out.write(b);
+        --remain;
+      } else {
+        throw new IOException("Writing more bytes than advertised size.");
+      }
+    }
+
+    @Override
+    public void write(byte b[]) throws IOException {
+      write(b, 0, b.length);
+    }
+
+    @Override
+    public void write(byte b[], int off, int len) throws IOException {
+      if (remain &gt;= len) {
+        out.write(b, off, len);
+        remain -= len;
+      } else {
+        throw new IOException("Writing more bytes than advertised size.");
+      }
+    }
+
+    @Override
+    public void flush() throws IOException {
+      out.flush();
+    }
+
+    @Override
+    public void close() throws IOException {
+      if (closed == true) {
+        return;
+      }
+
+      try {
+        if (remain &gt; 0) {
+          throw new IOException("Writing fewer bytes than advertised size.");
+        }
+      } finally {
+        closed = true;
+      }
+    }
+  }
+}
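
Sketch (not part of this commit): the ChunkEncoder comment above describes the
wire layout as -len1, byte[len1], -len2, byte[len2], ... len_n, byte[len_n],
where only the terminal chunk carries a non-negative length. A minimal
round-trip illustration follows. The class ChunkChainSketch is hypothetical,
and it assumes fixed 4-byte lengths written with DataOutputStream.writeInt
instead of the Utils VInt encoding used by the real classes, so the bytes it
produces are not compatible with ChunkDecoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of the negated-length chunk chain layout. */
final class ChunkChainSketch {
  private ChunkChainSketch() {
    // no instances
  }

  /** Encodes the pieces as one chain; only the final chunk is terminal. */
  static byte[] encode(byte[][] pieces) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      byte[] pending = new byte[0];
      for (byte[] piece : pieces) {
        // Non-terminal chunks get a negated length and must not be empty.
        if (pending.length != 0) {
          out.writeInt(-pending.length);
          out.write(pending);
        }
        pending = piece;
      }
      // The terminal chunk is always written, even when it is empty.
      out.writeInt(pending.length);
      out.write(pending);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Decodes a chain produced by encode back into the original payload. */
  static byte[] decode(byte[] encoded) {
    try {
      DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(encoded));
      ByteArrayOutputStream payload = new ByteArrayOutputStream();
      boolean last = false;
      while (!last) {
        int len = in.readInt();
        if (Integer.signum(len) == -1) {
          len = -len; // negated length: more chunks follow
        } else {
          last = true; // non-negative length: terminal chunk
        }
        byte[] chunk = new byte[len];
        in.readFully(chunk);
        payload.write(chunk);
      }
      return payload.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Negating the length of every chunk but the last lets a reader detect the end of
the logical stream without a separate terminator or an up-front total size.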

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/CompareUtils.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/CompareUtils.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/CompareUtils.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/CompareUtils.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.io.file.tfile;
+
+import java.util.Comparator;
+
+import org.apache.hadoop.io.RawComparator;
+import org.apache.hadoop.io.WritableComparator;
+
+class CompareUtils {
+  /**
+   * Prevent instantiation of this class.
+   */
+  private CompareUtils() {
+    // nothing
+  }
+
+  /**
+   * A comparator to compare anything that implements {@link RawComparable}
+   * using a customized comparator.
+   */
+  public static final class BytesComparator implements
+      Comparator&lt;RawComparable&gt; {
+    private RawComparator&lt;Object&gt; cmp;
+
+    public BytesComparator(RawComparator&lt;Object&gt; cmp) {
+      this.cmp = cmp;
+    }
+
+    @Override
+    public int compare(RawComparable o1, RawComparable o2) {
+      return compare(o1.buffer(), o1.offset(), o1.size(), o2.buffer(), o2
+          .offset(), o2.size());
+    }
+
+    public int compare(byte[] a, int off1, int len1, byte[] b, int off2,
+        int len2) {
+      return cmp.compare(a, off1, len1, b, off2, len2);
+    }
+  }
+
+  /**
+   * Interface for all objects that have a single integer magnitude.
+   */
+  static interface Scalar {
+    long magnitude();
+  }
+
+  static final class ScalarLong implements Scalar {
+    private long magnitude;
+
+    public ScalarLong(long m) {
+      magnitude = m;
+    }
+
+    public long magnitude() {
+      return magnitude;
+    }
+  }
+
+  public static final class ScalarComparator implements Comparator&lt;Scalar&gt; {
+    @Override
+    public int compare(Scalar o1, Scalar o2) {
+      long m1 = o1.magnitude(), m2 = o2.magnitude();
+      if (m1 &lt; m2) return -1; // compare directly; m1 - m2 may overflow
+      if (m1 &gt; m2) return 1;
+      return 0;
+    }
+  }
+
+  public static final class MemcmpRawComparator implements
+      RawComparator&lt;Object&gt; {
+    @Override
+    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
+      return WritableComparator.compareBytes(b1, s1, l1, b2, s2, l2);
+    }
+
+    @Override
+    public int compare(Object o1, Object o2) {
+      throw new RuntimeException("Object comparison not supported");
+    }
+  }
+}
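
Editorial note on ScalarComparator: a comparator over long magnitudes has to stay correct at the extremes of the long range. The sketch below is illustrative only and is not part of this commit. The names ScalarCompareSketch, FixedScalar, compareBySubtraction and compareDirectly are hypothetical, and Long.compare assumes Java 7 or later. It contrasts taking the sign of a difference, which can wrap around, with comparing the two magnitudes directly.

```java
public class ScalarCompareSketch {
  /** Hypothetical stand-in for the package-private CompareUtils.Scalar. */
  interface Scalar {
    long magnitude();
  }

  /** Minimal Scalar holding a fixed value, in the spirit of ScalarLong. */
  static final class FixedScalar implements Scalar {
    private final long value;

    FixedScalar(long value) {
      this.value = value;
    }

    @Override
    public long magnitude() {
      return value;
    }
  }

  /** Derives the order from the sign of a difference; wraps on overflow. */
  static int compareBySubtraction(Scalar a, Scalar b) {
    return Long.signum(a.magnitude() - b.magnitude());
  }

  /** Compares the two magnitudes directly, so no overflow is possible. */
  static int compareDirectly(Scalar a, Scalar b) {
    return Long.compare(a.magnitude(), b.magnitude());
  }

  public static void main(String[] args) {
    Scalar lowest = new FixedScalar(Long.MIN_VALUE);
    Scalar one = new FixedScalar(1L);
    // Long.MIN_VALUE - 1 wraps around to Long.MAX_VALUE, so the sign flips.
    System.out.println("by subtraction: " + compareBySubtraction(lowest, one));
    System.out.println("direct:         " + compareDirectly(lowest, one));
  }
}
```

With these inputs the subtraction variant reports the smaller value as the larger one, while the direct comparison orders them correctly. For magnitudes that are never negative, such as offsets or sizes, the two variants agree.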

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Compression.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Compression.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Compression.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/Compression.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,361 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.BufferedInputStream;
+import java.io.BufferedOutputStream;
+import java.io.FilterOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.ArrayList;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.CodecPool;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.CompressionOutputStream;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.compress.DefaultCodec;
+import org.apache.hadoop.util.ReflectionUtils;
+
+/**
+ * Compression-related utilities for TFile.
+ */
+final class Compression {
+  static final Log LOG = LogFactory.getLog(Compression.class);
+
+  /**
+   * Prevent instantiation of this class.
+   */
+  private Compression() {
+    // nothing
+  }
+
+  static class FinishOnFlushCompressionStream extends FilterOutputStream {
+    public FinishOnFlushCompressionStream(CompressionOutputStream cout) {
+      super(cout);
+    }
+
+    @Override
+    public void write(byte b[], int off, int len) throws IOException {
+      out.write(b, off, len);
+    }
+
+    @Override
+    public void flush() throws IOException {
+      CompressionOutputStream cout = (CompressionOutputStream) out;
+      cout.finish();
+      cout.flush();
+      cout.resetState();
+    }
+  }
+
+  /**
+   * Compression algorithms.
+   */
+  static enum Algorithm {
+    LZO(TFile.COMPRESSION_LZO) {
+      private transient boolean checked = false;
+      private static final String defaultClazz =
+          "org.apache.hadoop.io.compress.LzoCodec";
+      private transient CompressionCodec codec = null;
+
+      @Override
+      public synchronized boolean isSupported() {
+        if (!checked) {
+          checked = true;
+          String extClazz =
+              (conf.get(CONF_LZO_CLASS) != null ? conf.get(CONF_LZO_CLASS)
+                  : System.getProperty(CONF_LZO_CLASS));
+          String clazz = (extClazz != null) ? extClazz : defaultClazz;
+          try {
+            LOG.info("Trying to load Lzo codec class: " + clazz);
+            codec =
+                (CompressionCodec) ReflectionUtils.newInstance(Class
+                    .forName(clazz), conf);
+          } catch (ClassNotFoundException e) {
+            // that is okay
+          }
+        }
+        return codec != null;
+      }
+
+      @Override
+      CompressionCodec getCodec() throws IOException {
+        if (!isSupported()) {
+          throw new IOException(
+              "LZO codec class not specified. Did you forget to set property "
+                  + CONF_LZO_CLASS + "?");
+        }
+
+        return codec;
+      }
+
+      @Override
+      public synchronized InputStream createDecompressionStream(
+          InputStream downStream, Decompressor decompressor,
+          int downStreamBufferSize) throws IOException {
+        if (!isSupported()) {
+          throw new IOException(
+              "LZO codec class not specified. Did you forget to set property "
+                  + CONF_LZO_CLASS + "?");
+        }
+        InputStream bis1 = null;
+        if (downStreamBufferSize &gt; 0) {
+          bis1 = new BufferedInputStream(downStream, downStreamBufferSize);
+        } else {
+          bis1 = downStream;
+        }
+        conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
+        CompressionInputStream cis =
+            codec.createInputStream(bis1, decompressor);
+        BufferedInputStream bis2 = new BufferedInputStream(cis, DATA_IBUF_SIZE);
+        return bis2;
+      }
+
+      @Override
+      public synchronized OutputStream createCompressionStream(
+          OutputStream downStream, Compressor compressor,
+          int downStreamBufferSize) throws IOException {
+        if (!isSupported()) {
+          throw new IOException(
+              "LZO codec class not specified. Did you forget to set property "
+                  + CONF_LZO_CLASS + "?");
+        }
+        OutputStream bos1 = null;
+        if (downStreamBufferSize &gt; 0) {
+          bos1 = new BufferedOutputStream(downStream, downStreamBufferSize);
+        } else {
+          bos1 = downStream;
+        }
+        conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
+        CompressionOutputStream cos =
+            codec.createOutputStream(bos1, compressor);
+        BufferedOutputStream bos2 =
+            new BufferedOutputStream(new FinishOnFlushCompressionStream(cos),
+                DATA_OBUF_SIZE);
+        return bos2;
+      }
+    },
+
+    GZ(TFile.COMPRESSION_GZ) {
+      private transient DefaultCodec codec;
+
+      @Override
+      CompressionCodec getCodec() {
+        if (codec == null) {
+          codec = new DefaultCodec();
+          codec.setConf(conf);
+        }
+
+        return codec;
+      }
+
+      @Override
+      public synchronized InputStream createDecompressionStream(
+          InputStream downStream, Decompressor decompressor,
+          int downStreamBufferSize) throws IOException {
+        // Set the internal buffer size to read from down stream.
+        if (downStreamBufferSize &gt; 0) {
+          codec.getConf().setInt("io.file.buffer.size", downStreamBufferSize);
+        }
+        CompressionInputStream cis =
+            codec.createInputStream(downStream, decompressor);
+        BufferedInputStream bis2 = new BufferedInputStream(cis, DATA_IBUF_SIZE);
+        return bis2;
+      }
+
+      @Override
+      public synchronized OutputStream createCompressionStream(
+          OutputStream downStream, Compressor compressor,
+          int downStreamBufferSize) throws IOException {
+        OutputStream bos1 = null;
+        if (downStreamBufferSize &gt; 0) {
+          bos1 = new BufferedOutputStream(downStream, downStreamBufferSize);
+        } else {
+          bos1 = downStream;
+        }
+        codec.getConf().setInt("io.file.buffer.size", 32 * 1024);
+        CompressionOutputStream cos =
+            codec.createOutputStream(bos1, compressor);
+        BufferedOutputStream bos2 =
+            new BufferedOutputStream(new FinishOnFlushCompressionStream(cos),
+                DATA_OBUF_SIZE);
+        return bos2;
+      }
+
+      @Override
+      public boolean isSupported() {
+        return true;
+      }
+    },
+
+    NONE(TFile.COMPRESSION_NONE) {
+      @Override
+      CompressionCodec getCodec() {
+        return null;
+      }
+
+      @Override
+      public synchronized InputStream createDecompressionStream(
+          InputStream downStream, Decompressor decompressor,
+          int downStreamBufferSize) throws IOException {
+        if (downStreamBufferSize &gt; 0) {
+          return new BufferedInputStream(downStream, downStreamBufferSize);
+        }
+        return downStream;
+      }
+
+      @Override
+      public synchronized OutputStream createCompressionStream(
+          OutputStream downStream, Compressor compressor,
+          int downStreamBufferSize) throws IOException {
+        if (downStreamBufferSize &gt; 0) {
+          return new BufferedOutputStream(downStream, downStreamBufferSize);
+        }
+
+        return downStream;
+      }
+
+      @Override
+      public boolean isSupported() {
+        return true;
+      }
+    };
+
+    // We require that all compression related settings are configured
+    // statically in the Configuration object.
+    protected static final Configuration conf = new Configuration();
+    private final String compressName;
+    // data input buffer size to absorb small reads from application.
+    private static final int DATA_IBUF_SIZE = 1 * 1024;
+    // data output buffer size to absorb small writes from application.
+    private static final int DATA_OBUF_SIZE = 4 * 1024;
+    public static final String CONF_LZO_CLASS =
+        "io.compression.codec.lzo.class";
+
+    Algorithm(String name) {
+      this.compressName = name;
+    }
+
+    abstract CompressionCodec getCodec() throws IOException;
+
+    public abstract InputStream createDecompressionStream(
+        InputStream downStream, Decompressor decompressor,
+        int downStreamBufferSize) throws IOException;
+
+    public abstract OutputStream createCompressionStream(
+        OutputStream downStream, Compressor compressor, int downStreamBufferSize)
+        throws IOException;
+
+    public abstract boolean isSupported();
+
+    public Compressor getCompressor() throws IOException {
+      CompressionCodec codec = getCodec();
+      if (codec != null) {
+        Compressor compressor = CodecPool.getCompressor(codec);
+        if (compressor != null) {
+          if (compressor.finished()) {
+            // Somebody returns the compressor to CodecPool but is still using
+            // it.
+            LOG.warn("Compressor obtained from CodecPool already finished()");
+          } else {
+            LOG.debug("Got a compressor: " + compressor.hashCode());
+          }
+          /**
+           * The following statement is necessary to get around bugs in 0.18 where
+           * a compressor is referenced after being returned to the codec pool.
+           */
+          compressor.reset();
+        }
+        return compressor;
+      }
+      return null;
+    }
+
+    public void returnCompressor(Compressor compressor) {
+      if (compressor != null) {
+        LOG.debug("Return a compressor: " + compressor.hashCode());
+        CodecPool.returnCompressor(compressor);
+      }
+    }
+
+    public Decompressor getDecompressor() throws IOException {
+      CompressionCodec codec = getCodec();
+      if (codec != null) {
+        Decompressor decompressor = CodecPool.getDecompressor(codec);
+        if (decompressor != null) {
+          if (decompressor.finished()) {
+            // Somebody returns the decompressor to CodecPool but is still using
+            // it.
+            LOG.warn("Decompressor obtained from CodecPool already finished()");
+          } else {
+            LOG.debug("Got a decompressor: " + decompressor.hashCode());
+          }
+          /**
+           * The following statement is necessary to get around bugs in 0.18 where
+           * a decompressor is referenced after being returned to the codec pool.
+           */
+          decompressor.reset();
+        }
+        return decompressor;
+      }
+
+      return null;
+    }
+
+    public void returnDecompressor(Decompressor decompressor) {
+      if (decompressor != null) {
+        LOG.debug("Returned a decompressor: " + decompressor.hashCode());
+        CodecPool.returnDecompressor(decompressor);
+      }
+    }
+
+    public String getName() {
+      return compressName;
+    }
+  }
+
+  static Algorithm getCompressionAlgorithmByName(String compressName) {
+    Algorithm[] algos = Algorithm.class.getEnumConstants();
+
+    for (Algorithm a : algos) {
+      if (a.getName().equals(compressName)) {
+        return a;
+      }
+    }
+
+    throw new IllegalArgumentException(
+        "Unsupported compression algorithm name: " + compressName);
+  }
+
+  static String[] getSupportedAlgorithms() {
+    Algorithm[] algos = Algorithm.class.getEnumConstants();
+
+    ArrayList&lt;String&gt; ret = new ArrayList&lt;String&gt;();
+    for (Algorithm a : algos) {
+      if (a.isSupported()) {
+        ret.add(a.getName());
+      }
+    }
+    return ret.toArray(new String[ret.size()]);
+  }
+}

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockAlreadyExists.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockAlreadyExists.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockAlreadyExists.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockAlreadyExists.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,36 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+
+/**
+ * Exception - Meta Block with the same name already exists.
+ */
+@SuppressWarnings("serial")
+public class MetaBlockAlreadyExists extends IOException {
+  /**
+   * Constructor
+   * 
+   * @param s
+   *          message.
+   */
+  MetaBlockAlreadyExists(String s) {
+    super(s);
+  }
+}

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockDoesNotExist.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockDoesNotExist.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockDoesNotExist.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/MetaBlockDoesNotExist.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,36 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+
+/**
+ * Exception - No such Meta Block with the given name.
+ */
+@SuppressWarnings("serial")
+public class MetaBlockDoesNotExist extends IOException {
+  /**
+   * Constructor
+   * 
+   * @param s
+   *          message.
+   */
+  MetaBlockDoesNotExist(String s) {
+    super(s);
+  }
+}

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/RawComparable.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/RawComparable.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/RawComparable.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/RawComparable.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,57 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.util.Collections;
+import java.util.Comparator;
+
+import org.apache.hadoop.io.RawComparator;
+
+/**
+ * Interface for objects that can be compared through {@link RawComparator}.
+ * This is useful in places where we need a single object reference to specify a
+ * range of bytes in a byte array, such as {@link Comparable} or
+ * {@link Collections#binarySearch(java.util.List, Object, Comparator)}.
+ * 
+ * The actual comparison among RawComparables requires an external
+ * RawComparator, and it is the application's responsibility to ensure that two
+ * RawComparables are semantically comparable with the same
+ * RawComparator.
+ */
+public interface RawComparable {
+  /**
+   * Get the underlying byte array.
+   * 
+   * @return The underlying byte array.
+   */
+  abstract byte[] buffer();
+
+  /**
+   * Get the offset of the first byte in the byte array.
+   * 
+   * @return The offset of the first byte in the byte array.
+   */
+  abstract int offset();
+
+  /**
+   * Get the size of the byte range in the byte array.
+   * 
+   * @return The size of the byte range in the byte array.
+   */
+  abstract int size();
+}
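
Editorial note on RawComparable: the interface describes a byte range (buffer, offset, size) that an external comparator orders without copying the bytes. The sketch below is illustrative only and is not part of this commit. RawRangeSketch, ByteRange and Slice are hypothetical names, Byte.toUnsignedInt assumes Java 8 or later, and the comparison returns only the sign of the ordering rather than reproducing the exact return values of WritableComparator.compareBytes.

```java
public class RawRangeSketch {
  /** Hypothetical mirror of the RawComparable contract: a slice of a byte array. */
  interface ByteRange {
    byte[] buffer();
    int offset();
    int size();
  }

  /** A slice that shares its backing array instead of copying it. */
  static final class Slice implements ByteRange {
    private final byte[] buf;
    private final int off;
    private final int len;

    Slice(byte[] buf, int off, int len) {
      this.buf = buf;
      this.off = off;
      this.len = len;
    }

    @Override
    public byte[] buffer() { return buf; }

    @Override
    public int offset() { return off; }

    @Override
    public int size() { return len; }
  }

  /** Unsigned, lexicographic comparison of two ranges (memcmp style). */
  static int compare(ByteRange a, ByteRange b) {
    int common = Math.min(a.size(), b.size());
    for (int i = 0; i != common; i++) {
      int x = Byte.toUnsignedInt(a.buffer()[a.offset() + i]);
      int y = Byte.toUnsignedInt(b.buffer()[b.offset() + i]);
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    // All shared bytes are equal, so the shorter range sorts first.
    return Integer.compare(a.size(), b.size());
  }

  public static void main(String[] args) {
    byte[] shared = {1, 2, 3, 1, 2, (byte) 0x80};
    ByteRange first = new Slice(shared, 0, 3);   // bytes 1, 2, 3
    ByteRange second = new Slice(shared, 3, 3);  // bytes 1, 2, 0x80
    // 0x80 is 128 when read as unsigned, so the first range sorts lower.
    System.out.println(compare(first, second));
  }
}
```

Both slices point into the same array, which is the situation the interface is meant for: a key inside a larger block can be compared or searched for without being copied out first.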

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/SimpleBufferedOutputStream.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/SimpleBufferedOutputStream.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/SimpleBufferedOutputStream.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/SimpleBufferedOutputStream.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.FilterOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A simplified BufferedOutputStream with a borrowed buffer, which lets users
+ * see how much data has been buffered.
+ */
+class SimpleBufferedOutputStream extends FilterOutputStream {
+  protected byte buf[]; // the borrowed buffer
+  protected int count = 0; // bytes used in buffer.
+
+  // Constructor
+  public SimpleBufferedOutputStream(OutputStream out, byte[] buf) {
+    super(out);
+    this.buf = buf;
+  }
+
+  private void flushBuffer() throws IOException {
+    if (count &gt; 0) {
+      out.write(buf, 0, count);
+      count = 0;
+    }
+  }
+
+  @Override
+  public void write(int b) throws IOException {
+    if (count &gt;= buf.length) {
+      flushBuffer();
+    }
+    buf[count++] = (byte) b;
+  }
+
+  @Override
+  public void write(byte b[], int off, int len) throws IOException {
+    if (len &gt;= buf.length) {
+      flushBuffer();
+      out.write(b, off, len);
+      return;
+    }
+    if (len &gt; buf.length - count) {
+      flushBuffer();
+    }
+    System.arraycopy(b, off, buf, count, len);
+    count += len;
+  }
+
+  @Override
+  public synchronized void flush() throws IOException {
+    flushBuffer();
+    out.flush();
+  }
+
+  // Get the number of bytes currently held in the internal buffer.
+  public int size() {
+    return count;
+  }
+}




</pre>
</div>
</content>
</entry>
<entry>
<title>svn commit: r787913 [2/4] - in /hadoop/common/trunk: ./ src/java/org/apache/hadoop/io/file/ src/java/org/apache/hadoop/io/file/tfile/ src/test/ src/test/core/org/apache/hadoop/io/file/ src/test/core/org/apache/hadoop/io/file/tfile/</title>
<author><name>cdouglas@apache.org</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/200906.mbox/%3c20090624054828.8644823888CF@eris.apache.org%3e"/>
<id>urn:uuid:%3c20090624054828-8644823888CF@eris-apache-org%3e</id>
<updated>2009-06-24T05:48:26Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFile.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFile.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFile.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFile.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,2220 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.ByteArrayInputStream;
+import java.io.Closeable;
+import java.io.DataInput;
+import java.io.DataInputStream;
+import java.io.DataOutput;
+import java.io.DataOutputStream;
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.Comparator;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.io.DataOutputBuffer;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.RawComparator;
+import org.apache.hadoop.io.WritableComparator;
+import org.apache.hadoop.io.file.tfile.BCFile.Reader.BlockReader;
+import org.apache.hadoop.io.file.tfile.BCFile.Writer.BlockAppender;
+import org.apache.hadoop.io.file.tfile.Chunk.ChunkDecoder;
+import org.apache.hadoop.io.file.tfile.Chunk.ChunkEncoder;
+import org.apache.hadoop.io.file.tfile.CompareUtils.BytesComparator;
+import org.apache.hadoop.io.file.tfile.CompareUtils.MemcmpRawComparator;
+import org.apache.hadoop.io.file.tfile.Utils.Version;
+import org.apache.hadoop.io.serializer.JavaSerializationComparator;
+
+/**
+ * A TFile is a container of key-value pairs. Both keys and values are type-less
+ * bytes. Keys are restricted to 64KB; value length is not restricted
+ * (practically limited to the available disk storage). TFile further provides
+ * the following features:
+ * &lt;ul&gt;
+ * &lt;li&gt;Block Compression.
+ * &lt;li&gt;Named meta data blocks.
+ * &lt;li&gt;Sorted or unsorted keys.
+ * &lt;li&gt;Seek by key or by file offset.
+ * &lt;/ul&gt;
+ * The memory footprint of a TFile includes the following:
+ * &lt;ul&gt;
+ * &lt;li&gt;Some constant overhead of reading or writing a compressed block.
+ * &lt;ul&gt;
+ * &lt;li&gt;Each compressed block requires one compression/decompression codec for
+ * I/O.
+ * &lt;li&gt;Temporary space to buffer the key.
+ * &lt;li&gt;Temporary space to buffer the value (for TFile.Writer only). Values are
+ * chunk encoded, so that we buffer at most one chunk of user data. By default,
+ * the chunk buffer is 1MB. Reading chunked value does not require additional
+ * memory.
+ * &lt;/ul&gt;
+ * &lt;li&gt;TFile index, which is proportional to the total number of Data Blocks.
+ * The total amount of memory needed to hold the index can be estimated as
+ * (56+AvgKeySize)*NumBlocks.
+ * &lt;li&gt;MetaBlock index, which is proportional to the total number of Meta
+ * Blocks. The total amount of memory needed to hold the index for Meta Blocks
+ * can be estimated as (40+AvgMetaBlockName)*NumMetaBlock.
+ * &lt;/ul&gt;
+ * &lt;p&gt;
+ * The behavior of TFile can be customized by the following variables through
+ * Configuration:
+ * &lt;ul&gt;
+ * &lt;li&gt;&lt;b&gt;tfile.io.chunk.size&lt;/b&gt;: Value chunk size. Integer (in bytes). Defaults
+ * to 1MB. Values shorter than the chunk size are guaranteed to have a
+ * known value length at read time (See
+ * {@link TFile.Reader.Scanner.Entry#isValueLengthKnown()}).
+ * &lt;li&gt;&lt;b&gt;tfile.fs.output.buffer.size&lt;/b&gt;: Buffer size used for
+ * FSDataOutputStream. Integer (in bytes). Defaults to 256KB.
+ * &lt;li&gt;&lt;b&gt;tfile.fs.input.buffer.size&lt;/b&gt;: Buffer size used for
+ * FSDataInputStream. Integer (in bytes). Defaults to 256KB.
+ * &lt;/ul&gt;
+ * &lt;p&gt;
+ * Suggestions on performance optimization.
+ * &lt;ul&gt;
+ * &lt;li&gt;Minimum block size. We recommend a setting of minimum block size between
+ * 256KB and 1MB for general usage. A larger block size is preferred if files are
+ * primarily for sequential access. However, it would lead to inefficient random
+ * access (because there is more data to decompress). Smaller blocks are good
+ * for random access, but require more memory to hold the block index, and may
+ * be slower to create (because we must flush the compressor stream at the
+ * conclusion of each data block, which leads to an FS I/O flush). Further, due
+ * to the internal caching in the compression codec, the smallest possible block
+ * size would be around 20KB-30KB.
+ * &lt;li&gt;The current implementation does not offer true multi-threading for
+ * reading. The implementation uses FSDataInputStream seek()+read(), which has
+ * been shown to be much faster than the positioned-read call in single-threaded
+ * mode. However, it also means that if multiple threads attempt to access the same
+ * TFile (using multiple scanners) simultaneously, the actual I/O is carried out
+ * sequentially even if they access different DFS blocks.
+ * &lt;li&gt;Compression codec. Use "none" if the data is not very compressible (by
+ * compressible, I mean a compression ratio of at least 2:1). Generally, use "lzo"
+ * as the starting point for experimenting. "gz" offers a slightly better
+ * compression ratio than "lzo" but requires 4x CPU to compress and 2x CPU to
+ * decompress, compared to "lzo".
+ * &lt;li&gt;File system buffering. If the underlying FSDataInputStream and
+ * FSDataOutputStream are already adequately buffered, or if applications
+ * read/write keys and values in large buffers, we can reduce the sizes of
+ * input/output buffering in the TFile layer by setting the configuration parameters
+ * "tfile.fs.input.buffer.size" and "tfile.fs.output.buffer.size".
+ * &lt;/ul&gt;
+ * 
+ * Some design rationale behind TFile can be found at &lt;a
+ * href="https://issues.apache.org/jira/browse/HADOOP-3315"&gt;HADOOP-3315&lt;/a&gt;.
+ */
+public class TFile {
+  static final Log LOG = LogFactory.getLog(TFile.class);
+
+  private static final String CHUNK_BUF_SIZE_ATTR = "tfile.io.chunk.size";
+  private static final String FS_INPUT_BUF_SIZE_ATTR =
+      "tfile.fs.input.buffer.size";
+  private static final String FS_OUTPUT_BUF_SIZE_ATTR =
+      "tfile.fs.output.buffer.size";
+
+  static int getChunkBufferSize(Configuration conf) {
+    int ret = conf.getInt(CHUNK_BUF_SIZE_ATTR, 1024 * 1024);
+    return (ret &gt; 0) ? ret : 1024 * 1024;
+  }
+
+  static int getFSInputBufferSize(Configuration conf) {
+    return conf.getInt(FS_INPUT_BUF_SIZE_ATTR, 256 * 1024);
+  }
+
+  static int getFSOutputBufferSize(Configuration conf) {
+    return conf.getInt(FS_OUTPUT_BUF_SIZE_ATTR, 256 * 1024);
+  }
+
+  private static final int MAX_KEY_SIZE = 64 * 1024; // 64KB
+  static final Version API_VERSION = new Version((short) 1, (short) 0);
+
+  /** compression: gzip */
+  public static final String COMPRESSION_GZ = "gz";
+  /** compression: lzo */
+  public static final String COMPRESSION_LZO = "lzo";
+  /** compression: none */
+  public static final String COMPRESSION_NONE = "none";
+  /** comparator: memcmp */
+  public static final String COMPARATOR_MEMCMP = "memcmp";
+  /** comparator prefix: java class */
+  public static final String COMPARATOR_JCLASS = "jclass:";
+
+  // Prevent the instantiation of TFiles
+  private TFile() {
+    // nothing
+  }
+
+  /**
+   * Get names of supported compression algorithms. The names are accepted by
+   * TFile.Writer.
+   * 
+   * @return Array of strings, each representing a supported compression
+   *         algorithm. Currently, the following compression algorithms are
+   *         supported.
+   *         &lt;ul&gt;
+   *         &lt;li&gt;"none" - No compression.
+   *         &lt;li&gt;"lzo" - LZO compression.
+   *         &lt;li&gt;"gz" - GZIP compression.
+   *         &lt;/ul&gt;
+   */
+  public static String[] getSupportedCompressionAlgorithms() {
+    return Compression.getSupportedAlgorithms();
+  }
+
+  /**
+   * TFile Writer.
+   */
+  public static class Writer implements Closeable {
+    // minimum compressed size for a block.
+    private final int sizeMinBlock;
+
+    // Meta blocks.
+    final TFileIndex tfileIndex;
+    final TFileMeta tfileMeta;
+
+    // reference to the underlying BCFile.
+    private BCFile.Writer writerBCF;
+
+    // current data block appender.
+    BlockAppender blkAppender;
+    long blkRecordCount;
+
+    // buffers for caching the key.
+    BoundedByteArrayOutputStream currentKeyBufferOS;
+    BoundedByteArrayOutputStream lastKeyBufferOS;
+
+    // buffer used by chunk codec
+    private byte[] valueBuffer;
+
+    /**
+     * Writer states. The state always transitions in a cycle: READY -&gt; IN_KEY
+     * -&gt; END_KEY -&gt; IN_VALUE -&gt; READY.
+     */
+    private enum State {
+      READY, // Ready to start a new key-value pair insertion.
+      IN_KEY, // In the middle of key insertion.
+      END_KEY, // Key insertion complete, ready to insert value.
+      IN_VALUE, // In value insertion.
+      // ERROR, // Error encountered, cannot continue.
+      CLOSED, // TFile already closed.
+    };
+
+    // current state of Writer.
+    State state = State.READY;
+    Configuration conf;
+    long errorCount = 0;
+
+    /**
+     * Constructor
+     * 
+     * @param fsdos
+     *          output stream for writing. Must be at position 0.
+     * @param minBlockSize
+     *          Minimum compressed block size in bytes. A compression block will
+     *          not be closed until it reaches this size except for the last
+     *          block.
+     * @param compressName
+     *          Name of the compression algorithm. Must be one of the strings
+     *          returned by {@link TFile#getSupportedCompressionAlgorithms()}.
+     * @param comparator
+     *          Leave comparator as null or empty string if TFile is not sorted.
+     *          Otherwise, provide the string name for the comparison algorithm
+     *          for keys. Two kinds of comparators are supported.
+     *          &lt;ul&gt;
+     *          &lt;li&gt;Algorithmic comparator: binary comparators that are
+     *          language-independent. Currently, only "memcmp" is supported.
+     *          &lt;li&gt;Language-specific comparator: binary comparators that can
+     *          only be constructed in a specific language. For Java, the syntax
+     *          is "jclass:", followed by the class name of the RawComparator.
+     *          Currently, we only support RawComparators that can be
+     *          constructed through the default constructor (with no
+     *          parameters). Parameterized RawComparators such as
+     *          {@link WritableComparator} or
+     *          {@link JavaSerializationComparator} may not be directly used.
+     *          One should write a wrapper class that inherits from such classes
+     *          and use its default constructor to perform proper
+     *          initialization.
+     *          &lt;/ul&gt;
+     * @param conf
+     *          The configuration object.
+     * @throws IOException
+     */
+    public Writer(FSDataOutputStream fsdos, int minBlockSize,
+        String compressName, String comparator, Configuration conf)
+        throws IOException {
+      sizeMinBlock = minBlockSize;
+      tfileMeta = new TFileMeta(comparator);
+      tfileIndex = new TFileIndex(tfileMeta.getComparator());
+
+      writerBCF = new BCFile.Writer(fsdos, compressName, conf);
+      currentKeyBufferOS = new BoundedByteArrayOutputStream(MAX_KEY_SIZE);
+      lastKeyBufferOS = new BoundedByteArrayOutputStream(MAX_KEY_SIZE);
+      this.conf = conf;
+    }
+
+    /**
+     * Close the Writer. Resources will be released regardless of whether
+     * exceptions are thrown. Future close calls will have no effect.
+     * 
+     * The underlying FSDataOutputStream is not closed.
+     */
+    public void close() throws IOException {
+      if ((state == State.CLOSED)) {
+        return;
+      }
+      try {
+        // First try the normal finish.
+        // Terminate upon the first Exception.
+        if (errorCount == 0) {
+          if (state != State.READY) {
+            throw new IllegalStateException(
+                "Cannot close TFile in the middle of key-value insertion.");
+          }
+
+          finishDataBlock(true);
+
+          // first, write out data:TFile.meta
+          BlockAppender outMeta =
+              writerBCF
+                  .prepareMetaBlock(TFileMeta.BLOCK_NAME, COMPRESSION_NONE);
+          try {
+            tfileMeta.write(outMeta);
+          } finally {
+            outMeta.close();
+          }
+
+          // second, write out data:TFile.index
+          BlockAppender outIndex =
+              writerBCF.prepareMetaBlock(TFileIndex.BLOCK_NAME);
+          try {
+            tfileIndex.write(outIndex);
+          } finally {
+            outIndex.close();
+          }
+
+          if (writerBCF != null) {
+            writerBCF.close();
+            writerBCF = null;
+          }
+        }
+      } finally {
+        IOUtils.cleanup(LOG, blkAppender, writerBCF);
+        blkAppender = null;
+        writerBCF = null;
+        state = State.CLOSED;
+      }
+    }
+
+    /**
+     * Add a new key-value pair to the TFile. This is equivalent to
+     * append(key, 0, key.length, value, 0, value.length).
+     * 
+     * @param key
+     *          Buffer for key.
+     * @param value
+     *          Buffer for value.
+     * @throws IOException
+     */
+    public void append(byte[] key, byte[] value) throws IOException {
+      append(key, 0, key.length, value, 0, value.length);
+    }
+
+    /**
+     * Add a new key-value pair to the TFile.
+     * 
+     * @param key
+     *          buffer for key.
+     * @param koff
+     *          offset in key buffer.
+     * @param klen
+     *          length of key.
+     * @param value
+     *          buffer for value.
+     * @param voff
+     *          offset in value buffer.
+     * @param vlen
+     *          length of value.
+     * @throws IOException
+     *           Upon IO errors.
+     *           &lt;p&gt;
+     *           If an exception is thrown, the TFile will be in an inconsistent
+     *           state. The only legitimate call after that is close().
+     */
+    public void append(byte[] key, int koff, int klen, byte[] value, int voff,
+        int vlen) throws IOException {
+      if ((koff | klen | (koff + klen) | (key.length - (koff + klen))) &lt; 0) {
+        throw new IndexOutOfBoundsException(
+            "Bad key buffer offset-length combination.");
+      }
+
+      if ((voff | vlen | (voff + vlen) | (value.length - (voff + vlen))) &lt; 0) {
+        throw new IndexOutOfBoundsException(
+            "Bad value buffer offset-length combination.");
+      }
+
+      try {
+        DataOutputStream dosKey = prepareAppendKey(klen);
+        try {
+          ++errorCount;
+          dosKey.write(key, koff, klen);
+          --errorCount;
+        } finally {
+          dosKey.close();
+        }
+
+        DataOutputStream dosValue = prepareAppendValue(vlen);
+        try {
+          ++errorCount;
+          dosValue.write(value, voff, vlen);
+          --errorCount;
+        } finally {
+          dosValue.close();
+        }
+      } finally {
+        state = State.READY;
+      }
+    }
+
+    /**
+     * Helper class to register key after close call on key append stream.
+     */
+    private class KeyRegister extends DataOutputStream {
+      private final int expectedLength;
+      private boolean closed = false;
+
+      public KeyRegister(int len) {
+        super(currentKeyBufferOS);
+        if (len &gt;= 0) {
+          currentKeyBufferOS.reset(len);
+        } else {
+          currentKeyBufferOS.reset();
+        }
+        expectedLength = len;
+      }
+
+      @Override
+      public void close() throws IOException {
+        if (closed) {
+          return;
+        }
+
+        try {
+          ++errorCount;
+          byte[] key = currentKeyBufferOS.getBuffer();
+          int len = currentKeyBufferOS.size();
+          // verify length.
+          if (expectedLength &gt;= 0 &amp;&amp; expectedLength != len) {
+            throw new IOException("Incorrect key length: expected="
+                + expectedLength + " actual=" + len);
+          }
+
+          Utils.writeVInt(blkAppender, len);
+          blkAppender.write(key, 0, len);
+          if (tfileIndex.getFirstKey() == null) {
+            tfileIndex.setFirstKey(key, 0, len);
+          }
+
+          if (tfileMeta.isSorted()) {
+            byte[] lastKey = lastKeyBufferOS.getBuffer();
+            int lastLen = lastKeyBufferOS.size();
+            if (tfileMeta.getComparator().compare(key, 0, len, lastKey, 0,
+                lastLen) &lt; 0) {
+              throw new IOException("Keys are not added in sorted order");
+            }
+          }
+
+          BoundedByteArrayOutputStream tmp = currentKeyBufferOS;
+          currentKeyBufferOS = lastKeyBufferOS;
+          lastKeyBufferOS = tmp;
+          --errorCount;
+        } finally {
+          closed = true;
+          state = State.END_KEY;
+        }
+      }
+    }
+
+    /**
+     * Helper class to register value after close call on value append stream.
+     */
+    private class ValueRegister extends DataOutputStream {
+      private boolean closed = false;
+
+      public ValueRegister(OutputStream os) {
+        super(os);
+      }
+
+      // Avoid propagating flush calls to the downstream output stream.
+      @Override
+      public void flush() {
+        // do nothing
+      }
+
+      @Override
+      public void close() throws IOException {
+        if (closed) {
+          return;
+        }
+
+        try {
+          ++errorCount;
+          super.close();
+          blkRecordCount++;
+          // bump up the total record count in the whole file
+          tfileMeta.incRecordCount();
+          finishDataBlock(false);
+          --errorCount;
+        } finally {
+          closed = true;
+          state = State.READY;
+        }
+      }
+    }
+
+    /**
+     * Obtain an output stream for writing a key into TFile. This may only be
+     * called when there is no active Key appending stream or value appending
+     * stream.
+     * 
+     * @param length
+     *          The expected length of the key. If length of the key is not
+     *          known, set length = -1. Otherwise, the application must write
+     *          exactly as many bytes as specified here before calling close on
+     *          the returned output stream.
+     * @return The key appending output stream.
+     * @throws IOException
+     * 
+     */
+    public DataOutputStream prepareAppendKey(int length) throws IOException {
+      if (state != State.READY) {
+        throw new IllegalStateException("Incorrect state to start a new key: "
+            + state.name());
+      }
+
+      initDataBlock();
+      DataOutputStream ret = new KeyRegister(length);
+      state = State.IN_KEY;
+      return ret;
+    }
+
+    /**
+     * Obtain an output stream for writing a value into TFile. This may only be
+     * called right after a key appending operation (the key append stream must
+     * be closed).
+     * 
+     * @param length
+     *          The expected length of the value. If length of the value is not
+     *          known, set length = -1. Otherwise, the application must write
+     *          exactly as many bytes as specified here before calling close on
+     *          the returned output stream. Advertising the value size up-front
+     *          guarantees that the value is encoded in one chunk, and avoids
+     *          intermediate chunk buffering.
+     * @throws IOException
+     * 
+     */
+    public DataOutputStream prepareAppendValue(int length) throws IOException {
+      if (state != State.END_KEY) {
+        throw new IllegalStateException(
+            "Incorrect state to start a new value: " + state.name());
+      }
+
+      DataOutputStream ret;
+
+      // unknown length
+      if (length &lt; 0) {
+        if (valueBuffer == null) {
+          valueBuffer = new byte[getChunkBufferSize(conf)];
+        }
+        ret = new ValueRegister(new ChunkEncoder(blkAppender, valueBuffer));
+      } else {
+        ret =
+            new ValueRegister(new Chunk.SingleChunkEncoder(blkAppender, length));
+      }
+
+      state = State.IN_VALUE;
+      return ret;
+    }
+
+    /**
+     * Obtain an output stream for creating a meta block. This function may not
+     * be called when there is a key append stream or value append stream
+     * active. No more key-value insertion is allowed after a meta data block
+     * has been added to TFile.
+     * 
+     * @param name
+     *          Name of the meta block.
+     * @param compressName
+     *          Name of the compression algorithm to be used. Must be one of the
+     *          strings returned by
+     *          {@link TFile#getSupportedCompressionAlgorithms()}.
+     * @return A DataOutputStream that can be used to write Meta Block data.
+     *         Closing the stream would signal the ending of the block.
+     * @throws IOException
+     * @throws MetaBlockAlreadyExists
+     *           the Meta Block with the same name already exists.
+     */
+    public DataOutputStream prepareMetaBlock(String name, String compressName)
+        throws IOException, MetaBlockAlreadyExists {
+      if (state != State.READY) {
+        throw new IllegalStateException(
+            "Incorrect state to start a Meta Block: " + state.name());
+      }
+
+      finishDataBlock(true);
+      DataOutputStream outputStream =
+          writerBCF.prepareMetaBlock(name, compressName);
+      return outputStream;
+    }
+
+    /**
+     * Obtain an output stream for creating a meta block. This function may not
+     * be called when there is a key append stream or value append stream
+     * active. No more key-value insertion is allowed after a meta data block
+     * has been added to TFile. Data will be compressed using the default
+     * compressor as defined in Writer's constructor.
+     * 
+     * @param name
+     *          Name of the meta block.
+     * @return A DataOutputStream that can be used to write Meta Block data.
+     *         Closing the stream would signal the ending of the block.
+     * @throws IOException
+     * @throws MetaBlockAlreadyExists
+     *           the Meta Block with the same name already exists.
+     */
+    public DataOutputStream prepareMetaBlock(String name) throws IOException,
+        MetaBlockAlreadyExists {
+      if (state != State.READY) {
+        throw new IllegalStateException(
+            "Incorrect state to start a Meta Block: " + state.name());
+      }
+
+      finishDataBlock(true);
+      return writerBCF.prepareMetaBlock(name);
+    }
+
+    /**
+     * Check if we need to start a new data block.
+     * 
+     * @throws IOException
+     */
+    private void initDataBlock() throws IOException {
+      // for each new block, get a new appender
+      if (blkAppender == null) {
+        blkAppender = writerBCF.prepareDataBlock();
+      }
+    }
+
+    /**
+     * Close the current data block if necessary.
+     * 
+     * @param bForceFinish
+     *          Force the closure regardless of the block size.
+     * @throws IOException
+     */
+    void finishDataBlock(boolean bForceFinish) throws IOException {
+      if (blkAppender == null) {
+        return;
+      }
+
+      // exceeded the size limit, do the compression and finish the block
+      if (bForceFinish || blkAppender.getCompressedSize() &gt;= sizeMinBlock) {
+        // keep track of the last key of each data block, no padding
+        // for now
+        TFileIndexEntry keyLast =
+            new TFileIndexEntry(lastKeyBufferOS.getBuffer(), 0, lastKeyBufferOS
+                .size(), blkRecordCount);
+        tfileIndex.addEntry(keyLast);
+        // close the appender
+        blkAppender.close();
+        blkAppender = null;
+        blkRecordCount = 0;
+      }
+    }
+  }
+
+  /**
+   * TFile Reader. Users may only read TFiles by creating TFile.Reader.Scanner
+   * objects. A scanner may scan the whole TFile (
+   * {@link Reader#createScanner()}), a portion of the TFile based on byte
+   * offsets ({@link Reader#createScanner(long, long)}), or a portion of the
+   * TFile whose keys fall in a certain key range (for sorted TFiles only,
+   * {@link Reader#createScanner(byte[], byte[])} or
+   * {@link Reader#createScanner(RawComparable, RawComparable)}).
+   */
+  public static class Reader implements Closeable {
+    // The underlying BCFile reader.
+    final BCFile.Reader readerBCF;
+
+    // TFile index, it is loaded lazily.
+    TFileIndex tfileIndex = null;
+    final TFileMeta tfileMeta;
+    final BytesComparator comparator;
+
+    // global begin and end locations.
+    private final Location begin;
+    private final Location end;
+
+    /**
+     * Location representing a virtual position in the TFile.
+     */
+    static final class Location implements Comparable&lt;Location&gt;, Cloneable {
+      private int blockIndex;
+      // distance/offset from the beginning of the block
+      private long recordIndex;
+
+      Location(int blockIndex, long recordIndex) {
+        set(blockIndex, recordIndex);
+      }
+
+      void incRecordIndex() {
+        ++recordIndex;
+      }
+
+      Location(Location other) {
+        set(other);
+      }
+
+      int getBlockIndex() {
+        return blockIndex;
+      }
+
+      long getRecordIndex() {
+        return recordIndex;
+      }
+
+      void set(int blockIndex, long recordIndex) {
+        if ((blockIndex | recordIndex) &lt; 0) {
+          throw new IllegalArgumentException(
+              "Illegal parameter for BlockLocation.");
+        }
+        this.blockIndex = blockIndex;
+        this.recordIndex = recordIndex;
+      }
+
+      void set(Location other) {
+        set(other.blockIndex, other.recordIndex);
+      }
+
+      /**
+       * @see java.lang.Comparable#compareTo(java.lang.Object)
+       */
+      @Override
+      public int compareTo(Location other) {
+        return compareTo(other.blockIndex, other.recordIndex);
+      }
+
+      int compareTo(int bid, long rid) {
+        if (this.blockIndex == bid) {
+          long ret = this.recordIndex - rid;
+          if (ret &gt; 0) return 1;
+          if (ret &lt; 0) return -1;
+          return 0;
+        }
+        return this.blockIndex - bid;
+      }
+
+      /**
+       * @see java.lang.Object#clone()
+       */
+      @Override
+      protected Location clone() {
+        return new Location(blockIndex, recordIndex);
+      }
+
+      /**
+       * @see java.lang.Object#hashCode()
+       */
+      @Override
+      public int hashCode() {
+        final int prime = 31;
+        int result = prime + blockIndex;
+        result = (int) (prime * result + recordIndex);
+        return result;
+      }
+
+      /**
+       * @see java.lang.Object#equals(java.lang.Object)
+       */
+      @Override
+      public boolean equals(Object obj) {
+        if (this == obj) return true;
+        if (obj == null) return false;
+        if (getClass() != obj.getClass()) return false;
+        Location other = (Location) obj;
+        if (blockIndex != other.blockIndex) return false;
+        if (recordIndex != other.recordIndex) return false;
+        return true;
+      }
+    }
+
+    /**
+     * Constructor
+     * 
+     * @param fsdis
+     *          FS input stream of the TFile.
+     * @param fileLength
+     *          The length of TFile. This is required because we have no easy
+     *          way of knowing the actual size of the input file through the
+     *          File input stream.
+     * @param conf
+     *          The configuration object.
+     * @throws IOException
+     */
+    public Reader(FSDataInputStream fsdis, long fileLength, Configuration conf)
+        throws IOException {
+      readerBCF = new BCFile.Reader(fsdis, fileLength, conf);
+
+      // first, read TFile meta
+      BlockReader brMeta = readerBCF.getMetaBlock(TFileMeta.BLOCK_NAME);
+      try {
+        tfileMeta = new TFileMeta(brMeta);
+      } finally {
+        brMeta.close();
+      }
+
+      comparator = tfileMeta.getComparator();
+      // Set begin and end locations.
+      begin = new Location(0, 0);
+      end = new Location(readerBCF.getBlockCount(), 0);
+    }
+
+    /**
+     * Close the reader. The state of the Reader object is undefined after
+     * close. Calling close() multiple times has no effect.
+     */
+    public void close() throws IOException {
+      readerBCF.close();
+    }
+
+    /**
+     * Get the begin location of the TFile.
+     * 
+     * @return If TFile is not empty, the location of the first key-value pair.
+     *         Otherwise, it returns end().
+     */
+    Location begin() {
+      return begin;
+    }
+
+    /**
+     * Get the end location of the TFile.
+     * 
+     * @return The location right after the last key-value pair in TFile.
+     */
+    Location end() {
+      return end;
+    }
+
+    /**
+     * Get the string representation of the comparator.
+     * 
+     * @return If the TFile is not sorted by keys, an empty string will be
+     *         returned. Otherwise, the actual comparator string that is
+     *         provided during the TFile creation time will be returned.
+     */
+    public String getComparatorName() {
+      return tfileMeta.getComparatorString();
+    }
+
+    /**
+     * Is the TFile sorted?
+     * 
+     * @return true if TFile is sorted.
+     */
+    public boolean isSorted() {
+      return tfileMeta.isSorted();
+    }
+
+    /**
+     * Get the number of key-value pair entries in TFile.
+     * 
+     * @return the number of key-value pairs in TFile
+     */
+    public long getEntryCount() {
+      return tfileMeta.getRecordCount();
+    }
+
+    /**
+     * Lazily load the TFile index.
+     * 
+     * @throws IOException
+     */
+    synchronized void checkTFileDataIndex() throws IOException {
+      if (tfileIndex == null) {
+        BlockReader brIndex = readerBCF.getMetaBlock(TFileIndex.BLOCK_NAME);
+        try {
+          tfileIndex =
+              new TFileIndex(readerBCF.getBlockCount(), brIndex, tfileMeta
+                  .getComparator());
+        } finally {
+          brIndex.close();
+        }
+      }
+    }
+
+    /**
+     * Get the first key in the TFile.
+     * 
+     * @return The first key in the TFile.
+     * @throws IOException
+     */
+    public RawComparable getFirstKey() throws IOException {
+      checkTFileDataIndex();
+      return tfileIndex.getFirstKey();
+    }
+
+    /**
+     * Get the last key in the TFile.
+     * 
+     * @return The last key in the TFile.
+     * @throws IOException
+     */
+    public RawComparable getLastKey() throws IOException {
+      checkTFileDataIndex();
+      return tfileIndex.getLastKey();
+    }
+
+    /**
+     * Get a Comparator object to compare Entries. It is useful when you want
+     * to store the entries in a collection (such as PriorityQueue) and perform
+     * sorting or comparison among entries based on the keys without copying out
+     * the key.
+     * 
+     * @return An Entry Comparator.
+     */
+    public Comparator&lt;Scanner.Entry&gt; getEntryComparator() {
+      if (!isSorted()) {
+        throw new RuntimeException(
+            "Entries are not comparable for unsorted TFiles");
+      }
+
+      return new Comparator&lt;Scanner.Entry&gt;() {
+        /**
+         * Provide a customized comparator for Entries. This is useful if we
+         * have a collection of Entry objects. However, if the Entry objects
+         * come from different TFiles, users must ensure that those TFiles share
+         * the same RawComparator.
+         */
+        @Override
+        public int compare(Scanner.Entry o1, Scanner.Entry o2) {
+          return comparator.compare(o1.getKeyBuffer(), 0, o1.getKeyLength(), o2
+              .getKeyBuffer(), 0, o2.getKeyLength());
+        }
+      };
+    }
+
+    /**
+     * Get an instance of the RawComparator that is constructed based on the
+     * string comparator representation.
+     * 
+     * @return a Comparator that can compare RawComparable's.
+     */
+    public Comparator&lt;RawComparable&gt; getComparator() {
+      return comparator;
+    }
+
+    /**
+     * Stream access to a meta block.
+     * 
+     * @param name
+     *          The name of the meta block.
+     * @return The input stream.
+     * @throws IOException
+     *           on I/O error.
+     * @throws MetaBlockDoesNotExist
+     *           If the meta block with the name does not exist.
+     */
+    public DataInputStream getMetaBlock(String name) throws IOException,
+        MetaBlockDoesNotExist {
+      return readerBCF.getMetaBlock(name);
+    }
+
+    /**
+     * if greater is true then returns the beginning location of the block
+     * containing the key strictly greater than input key. if greater is false
+     * then returns the beginning location of the block greater than equal to
+     * the input key
+     * 
+     * @param key
+     *          the input key
+     * @param greater
+     *          boolean flag
+     * @return
+     * @throws IOException
+     */
+    Location getBlockContainsKey(RawComparable key, boolean greater)
+        throws IOException {
+      if (!isSorted()) {
+        throw new RuntimeException("Seeking in unsorted TFile");
+      }
+      checkTFileDataIndex();
+      int blkIndex =
+          (greater) ? tfileIndex.upperBound(key) : tfileIndex.lowerBound(key);
+      if (blkIndex &lt; 0) return end;
+      return new Location(blkIndex, 0);
+    }
+
+    int compareKeys(byte[] a, int o1, int l1, byte[] b, int o2, int l2) {
+      if (!isSorted()) {
+        throw new RuntimeException("Cannot compare keys for unsorted TFiles.");
+      }
+      return comparator.compare(a, o1, l1, b, o2, l2);
+    }
+
+    int compareKeys(RawComparable a, RawComparable b) {
+      if (!isSorted()) {
+        throw new RuntimeException("Cannot compare keys for unsorted TFiles.");
+      }
+      return comparator.compare(a, b);
+    }
+
+    /**
+     * Get the location pointing to the beginning of the first key-value pair in
+     * a compressed block whose byte offset in the TFile is greater than or
+     * equal to the specified offset.
+     * 
+     * @param offset
+     *          the user supplied offset.
+     * @return the location to the corresponding entry; or end() if no such
+     *         entry exists.
+     */
+    Location getLocationNear(long offset) {
+      int blockIndex = readerBCF.getBlockIndexNear(offset);
+      if (blockIndex == -1) return end;
+      return new Location(blockIndex, 0);
+    }
+
+    /**
+     * Get a sample key that is within a block whose starting offset is greater
+     * than or equal to the specified offset.
+     * 
+     * @param offset
+     *          The file offset.
+     * @return the key that fits the requirement; or null if no such key exists
+     *         (which could happen if the offset is close to the end of the
+     *         TFile).
+     * @throws IOException
+     */
+    public RawComparable getKeyNear(long offset) throws IOException {
+      int blockIndex = readerBCF.getBlockIndexNear(offset);
+      if (blockIndex == -1) return null;
+      checkTFileDataIndex();
+      return new ByteArray(tfileIndex.getEntry(blockIndex).key);
+    }
+
+    /**
+     * Get a scanner that can scan the whole TFile.
+     * 
+     * @return The scanner object. A valid Scanner is always returned even if
+     *         the TFile is empty.
+     * @throws IOException
+     */
+    public Scanner createScanner() throws IOException {
+      return new Scanner(this, begin, end);
+    }
+
+    /**
+     * Get a scanner that covers a portion of TFile based on byte offsets.
+     * 
+     * @param offset
+     *          The beginning byte offset in the TFile.
+     * @param length
+     *          The length of the region.
+     * @return The actual coverage of the returned scanner tries to match the
+     *         specified byte-region but is always rounded up to the
+     *         compression block boundaries. It is possible that the returned
+     *         scanner contains zero key-value pairs even if length is positive.
+     * @throws IOException
+     */
+    public Scanner createScanner(long offset, long length) throws IOException {
+      return new Scanner(this, offset, offset + length);
+    }
+
+    /**
+     * Get a scanner that covers a portion of TFile based on keys.
+     * 
+     * @param beginKey
+     *          Begin key of the scan (inclusive). If null, scan from the first
+     *          key-value entry of the TFile.
+     * @param endKey
+     *          End key of the scan (exclusive). If null, scan up to the last
+     *          key-value entry of the TFile.
+     * @return The actual coverage of the returned scanner will cover all keys
+     *         greater than or equal to the beginKey and less than the endKey.
+     * @throws IOException
+     */
+    public Scanner createScanner(byte[] beginKey, byte[] endKey)
+        throws IOException {
+      return createScanner((beginKey == null) ? null : new ByteArray(beginKey,
+          0, beginKey.length), (endKey == null) ? null : new ByteArray(endKey,
+          0, endKey.length));
+    }
+
+    /**
+     * Get a scanner that covers a specific key range.
+     * 
+     * @param beginKey
+     *          Begin key of the scan (inclusive). If null, scan from the first
+     *          key-value entry of the TFile.
+     * @param endKey
+     *          End key of the scan (exclusive). If null, scan up to the last
+     *          key-value entry of the TFile.
+     * @return A scanner that covers all keys greater than or equal to
+     *         beginKey and less than endKey.
+     * @throws IOException
+     */
+    public Scanner createScanner(RawComparable beginKey, RawComparable endKey)
+        throws IOException {
+      if ((beginKey != null) &amp;&amp; (endKey != null)
+          &amp;&amp; (compareKeys(beginKey, endKey) &gt;= 0)) {
+        return new Scanner(this, beginKey, beginKey);
+      }
+      return new Scanner(this, beginKey, endKey);
+    }
+
+    /**
+     * The TFile Scanner. The Scanner has an implicit cursor, which, upon
+     * creation, points to the first key-value pair in the scan range. If the
+     * scan range is empty, the cursor will point to the end of the scan range.
+     * &lt;p&gt;
+     * Use {@link Scanner#atEnd()} to test whether the cursor is at the end
+     * location of the scanner.
+     * &lt;p&gt;
+     * Use {@link Scanner#advance()} to move the cursor to the next key-value
+     * pair (or end if none exists). Use seekTo methods (
+     * {@link Scanner#seekTo(byte[])} or
+     * {@link Scanner#seekTo(byte[], int, int)}) to seek to any arbitrary
+     * location in the covered range (including backward seeking). Use
+     * {@link Scanner#rewind()} to seek back to the beginning of the scanner.
+     * Use {@link Scanner#seekToEnd()} to seek to the end of the scanner.
+     * &lt;p&gt;
+     * Actual keys and values may be obtained through a {@link Scanner.Entry}
+     * object, which is obtained through {@link Scanner#entry()}.
+     */
+    public static class Scanner implements Closeable {
+      // The underlying TFile reader.
+      final Reader reader;
+      // current block (null if reaching end)
+      private BlockReader blkReader;
+
+      Location beginLocation;
+      Location endLocation;
+      Location currentLocation;
+
+      // flag to ensure value is only examined once.
+      boolean valueChecked = false;
+      // reusable buffer for keys.
+      final byte[] keyBuffer;
+      // length of key, -1 means key is invalid.
+      int klen = -1;
+
+      static final int MAX_VAL_TRANSFER_BUF_SIZE = 128 * 1024;
+      BytesWritable valTransferBuffer;
+
+      DataInputBuffer keyDataInputStream;
+      ChunkDecoder valueBufferInputStream;
+      DataInputStream valueDataInputStream;
+      // vlen == -1 if unknown.
+      int vlen;
+
+      /**
+       * Constructor. The offsets will be rounded to the beginning of a
+       * compressed block whose offset is greater than or equal to the
+       * specified offset.
+       * 
+       * @param reader
+       *          The TFile reader object.
+       * @param offBegin
+       *          Begin byte-offset of the scan.
+       * @param offEnd
+       *          End byte-offset of the scan.
+       * @throws IOException
+       */
+      protected Scanner(Reader reader, long offBegin, long offEnd)
+          throws IOException {
+        this(reader, reader.getLocationNear(offBegin), reader
+            .getLocationNear(offEnd));
+      }
+
+      /**
+       * Constructor
+       * 
+       * @param reader
+       *          The TFile reader object.
+       * @param begin
+       *          Begin location of the scan.
+       * @param end
+       *          End location of the scan.
+       * @throws IOException
+       */
+      Scanner(Reader reader, Location begin, Location end) throws IOException {
+        this.reader = reader;
+        // ensure the TFile index is loaded throughout the life of scanner.
+        reader.checkTFileDataIndex();
+        beginLocation = begin;
+        endLocation = end;
+
+        valTransferBuffer = new BytesWritable();
+        // TODO: remember the longest key in a TFile, and use it to replace
+        // MAX_KEY_SIZE.
+        keyBuffer = new byte[MAX_KEY_SIZE];
+        keyDataInputStream = new DataInputBuffer();
+        valueBufferInputStream = new ChunkDecoder();
+        valueDataInputStream = new DataInputStream(valueBufferInputStream);
+
+        if (beginLocation.compareTo(endLocation) &gt;= 0) {
+          currentLocation = new Location(endLocation);
+        } else {
+          currentLocation = new Location(0, 0);
+          initBlock(beginLocation.getBlockIndex());
+          inBlockAdvance(beginLocation.getRecordIndex());
+        }
+      }
+
+      /**
+       * Constructor
+       * 
+       * @param reader
+       *          The TFile reader object.
+       * @param beginKey
+       *          Begin key of the scan. If null, scan from the first &lt;K,V&gt;
+       *          entry of the TFile.
+       * @param endKey
+       *          End key of the scan. If null, scan up to the last &lt;K, V&gt; entry
+       *          of the TFile.
+       * @throws IOException
+       */
+      protected Scanner(Reader reader, RawComparable beginKey,
+          RawComparable endKey) throws IOException {
+        this(reader, (beginKey == null) ? reader.begin() : reader
+            .getBlockContainsKey(beginKey, false), reader.end());
+        if (beginKey != null) {
+          inBlockAdvance(beginKey, false);
+          beginLocation.set(currentLocation);
+        }
+        if (endKey != null) {
+          seekTo(endKey, false);
+          endLocation.set(currentLocation);
+          seekTo(beginLocation);
+        }
+      }
+
+      /**
+       * Move the cursor to the first entry whose key is greater than or equal
+       * to the input key. Synonymous to seekTo(key, 0, key.length). The entry
+       * returned by the previous entry() call will be invalid.
+       * 
+       * @param key
+       *          The input key
+       * @return true if we find an equal key.
+       * @throws IOException
+       */
+      public boolean seekTo(byte[] key) throws IOException {
+        return seekTo(key, 0, key.length);
+      }
+
+      /**
+       * Move the cursor to the first entry whose key is greater than or equal
+       * to the input key. The entry returned by the previous entry() call will
+       * be invalid.
+       * 
+       * @param key
+       *          The input key
+       * @param keyOffset
+       *          offset in the key buffer.
+       * @param keyLen
+       *          key buffer length.
+       * @return true if we find an equal key; false otherwise.
+       * @throws IOException
+       */
+      public boolean seekTo(byte[] key, int keyOffset, int keyLen)
+          throws IOException {
+        return seekTo(new ByteArray(key, keyOffset, keyLen), false);
+      }
+
+      private boolean seekTo(RawComparable key, boolean beyond)
+          throws IOException {
+        Location l = reader.getBlockContainsKey(key, beyond);
+        if (l.compareTo(beginLocation) &lt; 0) {
+          l = beginLocation;
+        } else if (l.compareTo(endLocation) &gt;= 0) {
+          seekTo(endLocation);
+          return false;
+        }
+
+        // check if what we are seeking is in the later part of the current
+        // block.
+        if (atEnd() || (l.getBlockIndex() != currentLocation.getBlockIndex())
+            || (compareCursorKeyTo(key) &gt;= 0)) {
+          // sorry, we must seek to a different location first.
+          seekTo(l);
+        }
+
+        return inBlockAdvance(key, beyond);
+      }
+
+      /**
+       * Move the cursor to the new location. The entry returned by the previous
+       * entry() call will be invalid.
+       * 
+       * @param l
+       *          new cursor location. It must fall between the begin and end
+       *          location of the scanner.
+       * @throws IOException
+       */
+      private void seekTo(Location l) throws IOException {
+        if (l.compareTo(beginLocation) &lt; 0) {
+          throw new IllegalArgumentException(
+              "Attempt to seek before the begin location.");
+        }
+
+        if (l.compareTo(endLocation) &gt; 0) {
+          throw new IllegalArgumentException(
+              "Attempt to seek after the end location.");
+        }
+
+        if (l.compareTo(endLocation) == 0) {
+          parkCursorAtEnd();
+          return;
+        }
+
+        if (l.getBlockIndex() != currentLocation.getBlockIndex()) {
+          // going to a totally different block
+          initBlock(l.getBlockIndex());
+        } else {
+          if (valueChecked) {
+            // may temporarily go beyond the last record in the block (in which
+            // case the next if condition will always be true).
+            inBlockAdvance(1);
+          }
+          if (l.getRecordIndex() &lt; currentLocation.getRecordIndex()) {
+            initBlock(l.getBlockIndex());
+          }
+        }
+
+        inBlockAdvance(l.getRecordIndex() - currentLocation.getRecordIndex());
+
+        return;
+      }
+
+      /**
+       * Rewind to the first entry in the scanner. The entry returned by the
+       * previous entry() call will be invalid.
+       * 
+       * @throws IOException
+       */
+      public void rewind() throws IOException {
+        seekTo(beginLocation);
+      }
+
+      /**
+       * Seek to the end of the scanner. The entry returned by the previous
+       * entry() call will be invalid.
+       * 
+       * @throws IOException
+       */
+      public void seekToEnd() throws IOException {
+        parkCursorAtEnd();
+      }
+
+      /**
+       * Move the cursor to the first entry whose key is greater than or equal
+       * to the input key. Synonymous to lowerBound(key, 0, key.length). The
+       * entry returned by the previous entry() call will be invalid.
+       * 
+       * @param key
+       *          The input key
+       * @throws IOException
+       */
+      public void lowerBound(byte[] key) throws IOException {
+        lowerBound(key, 0, key.length);
+      }
+
+      /**
+       * Move the cursor to the first entry whose key is greater than or equal
+       * to the input key. The entry returned by the previous entry() call will
+       * be invalid.
+       * 
+       * @param key
+       *          The input key
+       * @param keyOffset
+       *          offset in the key buffer.
+       * @param keyLen
+       *          key buffer length.
+       * @throws IOException
+       */
+      public void lowerBound(byte[] key, int keyOffset, int keyLen)
+          throws IOException {
+        seekTo(new ByteArray(key, keyOffset, keyLen), false);
+      }
+
+      /**
+       * Move the cursor to the first entry whose key is strictly greater than
+       * the input key. Synonymous to upperBound(key, 0, key.length). The entry
+       * returned by the previous entry() call will be invalid.
+       * 
+       * @param key
+       *          The input key
+       * @throws IOException
+       */
+      public void upperBound(byte[] key) throws IOException {
+        upperBound(key, 0, key.length);
+      }
+
+      /**
+       * Move the cursor to the first entry whose key is strictly greater than
+       * the input key. The entry returned by the previous entry() call will be
+       * invalid.
+       * 
+       * @param key
+       *          The input key
+       * @param keyOffset
+       *          offset in the key buffer.
+       * @param keyLen
+       *          key buffer length.
+       * @throws IOException
+       */
+      public void upperBound(byte[] key, int keyOffset, int keyLen)
+          throws IOException {
+        seekTo(new ByteArray(key, keyOffset, keyLen), true);
+      }
+
+      /**
+       * Move the cursor to the next key-value pair. The entry returned by the
+       * previous entry() call will be invalid.
+       * 
+       * @return true if the cursor successfully moves. False when cursor is
+       *         already at the end location and cannot be advanced.
+       * @throws IOException
+       */
+      public boolean advance() throws IOException {
+        if (atEnd()) {
+          return false;
+        }
+
+        int curBid = currentLocation.getBlockIndex();
+        long curRid = currentLocation.getRecordIndex();
+        long entriesInBlock = reader.getBlockEntryCount(curBid);
+        if (curRid + 1 &gt;= entriesInBlock) {
+          if (endLocation.compareTo(curBid + 1, 0) &lt;= 0) {
+            // last entry in the scan range.
+            parkCursorAtEnd();
+          } else {
+            // last entry in Block.
+            initBlock(curBid + 1);
+          }
+        } else {
+          inBlockAdvance(1);
+        }
+        return true;
+      }
+
+      /**
+       * Load a compressed block for reading. Expects blockIndex to be valid.
+       * 
+       * @throws IOException
+       */
+      private void initBlock(int blockIndex) throws IOException {
+        klen = -1;
+        if (blkReader != null) {
+          try {
+            blkReader.close();
+          } finally {
+            blkReader = null;
+          }
+        }
+        blkReader = reader.getBlockReader(blockIndex);
+        currentLocation.set(blockIndex, 0);
+      }
+
+      private void parkCursorAtEnd() throws IOException {
+        klen = -1;
+        currentLocation.set(endLocation);
+        if (blkReader != null) {
+          try {
+            blkReader.close();
+          } finally {
+            blkReader = null;
+          }
+        }
+      }
+
+      /**
+       * Close the scanner. Release all resources. The behavior of using the
+       * scanner after calling close is not defined. The entry returned by the
+       * previous entry() call will be invalid.
+       */
+      public void close() throws IOException {
+        parkCursorAtEnd();
+      }
+
+      /**
+       * Is cursor at the end location?
+       * 
+       * @return true if the cursor is at the end location.
+       */
+      public boolean atEnd() {
+        return (currentLocation.compareTo(endLocation) &gt;= 0);
+      }
+
+      /**
+       * Check whether we have already successfully obtained the key. It also
+       * initializes the value input stream.
+       */
+      void checkKey() throws IOException {
+        if (klen &gt;= 0) return;
+        if (atEnd()) {
+          throw new EOFException("No key-value to read");
+        }
+        klen = -1;
+        vlen = -1;
+        valueChecked = false;
+
+        klen = Utils.readVInt(blkReader);
+        blkReader.readFully(keyBuffer, 0, klen);
+        valueBufferInputStream.reset(blkReader);
+        if (valueBufferInputStream.isLastChunk()) {
+          vlen = valueBufferInputStream.getRemain();
+        }
+      }
+
+      /**
+       * Get an entry to access the key and value.
+       * 
+       * @return The Entry object to access the key and value.
+       * @throws IOException
+       */
+      public Entry entry() throws IOException {
+        checkKey();
+        return new Entry();
+      }
+
+      /**
+       * Internal API. Compares the key at cursor to a user-specified key.
+       * 
+       * @param other
+       *          user-specified key.
+       * @return negative if key at cursor is smaller than user key; 0 if equal;
+       *         and positive if key at cursor greater than user key.
+       * @throws IOException
+       */
+      int compareCursorKeyTo(RawComparable other) throws IOException {
+        checkKey();
+        return reader.compareKeys(keyBuffer, 0, klen, other.buffer(), other
+            .offset(), other.size());
+      }
+
+      /**
+       * Entry to a &amp;lt;Key, Value&amp;gt; pair.
+       */
+      public class Entry implements Comparable&lt;RawComparable&gt; {
+        /**
+         * Get the length of the key.
+         * 
+         * @return the length of the key.
+         */
+        public int getKeyLength() {
+          return klen;
+        }
+
+        byte[] getKeyBuffer() {
+          return keyBuffer;
+        }
+
+        /**
+         * Copy the key and value in one shot into BytesWritables. This is
+         * equivalent to getKey(key); getValue(value);
+         * 
+         * @param key
+         *          BytesWritable to hold key.
+         * @param value
+         *          BytesWritable to hold value
+         * @throws IOException
+         */
+        public void get(BytesWritable key, BytesWritable value)
+            throws IOException {
+          getKey(key);
+          getValue(value);
+        }
+
+        /**
+         * Copy the key into BytesWritable. The input BytesWritable will be
+         * automatically resized to the actual key size.
+         * 
+         * @param key
+         *          BytesWritable to hold the key.
+         * @throws IOException
+         */
+        public int getKey(BytesWritable key) throws IOException {
+          key.setSize(getKeyLength());
+          getKey(key.get());
+          return key.getSize();
+        }
+
+        /**
+         * Copy the value into BytesWritable. The input BytesWritable will be
+         * automatically resized to the actual value size. The implementation
+         * directly uses the buffer inside BytesWritable for storing the value.
+         * The call does not require the value length to be known.
+         * 
+         * @param value
+         * @throws IOException
+         */
+        public long getValue(BytesWritable value) throws IOException {
+          DataInputStream dis = getValueStream();
+          int size = 0;
+          try {
+            int remain;
+            while ((remain = valueBufferInputStream.getRemain()) &gt; 0) {
+              value.setSize(size + remain);
+              dis.readFully(value.get(), size, remain);
+              size += remain;
+            }
+            return value.getSize();
+          } finally {
+            dis.close();
+          }
+        }
+
+        /**
+         * Write the key to the output stream. This method avoids copying key
+         * buffer from Scanner into user buffer, then writing to the output
+         * stream.
+         * 
+         * @param out
+         *          The output stream
+         * @return the length of the key.
+         * @throws IOException
+         */
+        public int writeKey(OutputStream out) throws IOException {
+          out.write(keyBuffer, 0, klen);
+          return klen;
+        }
+
+        /**
+         * Write the value to the output stream. This method avoids copying
+         * value data from Scanner into user buffer, then writing to the output
+         * stream. It does not require the value length to be known.
+         * 
+         * @param out
+         *          The output stream
+         * @return the length of the value
+         * @throws IOException
+         */
+        public long writeValue(OutputStream out) throws IOException {
+          DataInputStream dis = getValueStream();
+          long size = 0;
+          try {
+            int chunkSize;
+            while ((chunkSize = valueBufferInputStream.getRemain()) &gt; 0) {
+              chunkSize = Math.min(chunkSize, MAX_VAL_TRANSFER_BUF_SIZE);
+              valTransferBuffer.setSize(chunkSize);
+              dis.readFully(valTransferBuffer.get(), 0, chunkSize);
+              out.write(valTransferBuffer.get(), 0, chunkSize);
+              size += chunkSize;
+            }
+            return size;
+          } finally {
+            dis.close();
+          }
+        }
+
+        /**
+         * Copy the key into user supplied buffer.
+         * 
+         * @param buf
+         *          The buffer supplied by user. The length of the buffer must
+         *          not be shorter than the key length.
+         * @return The length of the key.
+         * 
+         * @throws IOException
+         */
+        public int getKey(byte[] buf) throws IOException {
+          return getKey(buf, 0);
+        }
+
+        /**
+         * Copy the key into user supplied buffer.
+         * 
+         * @param buf
+         *          The buffer supplied by user.
+         * @param offset
+         *          The starting offset of the user buffer where we should copy
+         *          the key into. Requiring the key-length + offset no greater
+         *          than the buffer length.
+         * @return The length of the key.
+         * @throws IOException
+         */
+        public int getKey(byte[] buf, int offset) throws IOException {
+          if ((offset | (buf.length - offset - klen)) &lt; 0) {
+            throw new IndexOutOfBoundsException(
+                "Buffer too small to hold the key");
+          }
+          System.arraycopy(keyBuffer, 0, buf, offset, klen);
+          return klen;
+        }
+
+        /**
+         * Streaming access to the key. Useful for deserializing the key into
+         * user objects.
+         * 
+         * @return The input stream.
+         */
+        public DataInputStream getKeyStream() {
+          keyDataInputStream.reset(keyBuffer, klen);
+          return keyDataInputStream;
+        }
+
+        /**
+         * Get the length of the value. isValueLengthKnown() must be tested
+         * true.
+         * 
+         * @return the length of the value.
+         */
+        public int getValueLength() {
+          if (vlen &gt;= 0) {
+            return vlen;
+          }
+
+          throw new RuntimeException("Value length unknown.");
+        }
+
+        /**
+         * Copy value into user-supplied buffer. User supplied buffer must be
+         * large enough to hold the whole value. The value part of the key-value
+         * pair pointed to by the current cursor is not cached and can only be
+         * examined once. Calling any of the following functions more than once
+         * without moving the cursor will result in an exception:
+         * {@link #getValue(byte[])}, {@link #getValue(byte[], int)},
+         * {@link #getValueStream}.
+         * 
+         * @return the length of the value. Does not require
+         *         isValueLengthKnown() to be true.
+         * @throws IOException
+         * 
+         */
+        public int getValue(byte[] buf) throws IOException {
+          return getValue(buf, 0);
+        }
+
+        /**
+         * Copy value into user-supplied buffer. User supplied buffer must be
+         * large enough to hold the whole value (starting from the offset). The
+         * value part of the key-value pair pointed to by the current cursor is
+         * not cached and can only be examined once. Calling any of the
+         * following functions more than once without moving the cursor will
+         * result in an exception: {@link #getValue(byte[])},
+         * {@link #getValue(byte[], int)}, {@link #getValueStream}.
+         * 
+         * @return the length of the value. Does not require
+         *         isValueLengthKnown() to be true.
+         * @throws IOException
+         */
+        public int getValue(byte[] buf, int offset) throws IOException {
+          DataInputStream dis = getValueStream();
+          try {
+            if (isValueLengthKnown()) {
+              if ((offset | (buf.length - offset - vlen)) &lt; 0) {
+                throw new IndexOutOfBoundsException(
+                    "Buffer too small to hold value");
+              }
+              dis.readFully(buf, offset, vlen);
+              return vlen;
+            }
+
+            int nextOffset = offset;
+            while (nextOffset &lt; buf.length) {
+              int n = dis.read(buf, nextOffset, buf.length - nextOffset);
+              if (n &lt; 0) {
+                break;
+              }
+              nextOffset += n;
+            }
+            if (dis.read() &gt;= 0) {
+              // attempt to read one more byte to determine whether we reached
+              // the end or not.
+              throw new IndexOutOfBoundsException(
+                  "Buffer too small to hold value");
+            }
+            return nextOffset - offset;
+          } finally {
+            dis.close();
+          }
+        }
+
+        /**
+         * Stream access to value. The value part of the key-value pair pointed
+         * to by the current cursor is not cached and can only be examined once.
+         * Calling any of the following functions more than once without moving
+         * the cursor will result in an exception: {@link #getValue(byte[])},
+         * {@link #getValue(byte[], int)}, {@link #getValueStream}.
+         * 
+         * @return The input stream for reading the value.
+         * @throws IOException
+         */
+        public DataInputStream getValueStream() throws IOException {
+          if (valueChecked) {
+            throw new IllegalStateException(
+                "Attempt to examine value multiple times.");
+          }
+          valueChecked = true;
+          return valueDataInputStream;
+        }
+
+        /**
+         * Check whether it is safe to call getValueLength().
+         * 
+         * @return true if value length is known beforehand. Values less than
+         *         the chunk size will always have their lengths known
+         *         beforehand. Values that are written out as a whole (with
+         *         advertised length up-front) will always have their lengths
+         *         known in read.
+         */
+        public boolean isValueLengthKnown() {
+          return (vlen &gt;= 0);
+        }
+
+        /**
+         * Compare the entry key to another key. Synonymous to compareTo(key, 0,
+         * key.length).
+         * 
+         * @param buf
+         *          The key buffer.
+         * @return comparison result between the entry key with the input key.
+         */
+        public int compareTo(byte[] buf) {
+          return compareTo(buf, 0, buf.length);
+        }
+
+        /**
+         * Compare the entry key to another key. Synonymous to compareTo(new
+         * ByteArray(buf, offset, length)).
+         * 
+         * @param buf
+         *          The key buffer
+         * @param offset
+         *          offset into the key buffer.
+         * @param length
+         *          the length of the key.
+         * @return comparison result between the entry key with the input key.
+         */
+        public int compareTo(byte[] buf, int offset, int length) {
+          return compareTo(new ByteArray(buf, offset, length));
+        }
+
+        /**
+         * Compare an entry with a RawComparable object. This is useful when
+         * Entries are stored in a collection, and we want to compare a user
+         * supplied key.
+         */
+        @Override
+        public int compareTo(RawComparable key) {
+          return reader.compareKeys(keyBuffer, 0, getKeyLength(), key.buffer(),
+              key.offset(), key.size());
+        }
+
+        /**
+         * Check whether this entry and the other entry have equal keys.
+         */
+        @Override
+        public boolean equals(Object other) {
+          if (this == other) return true;
+          if (!(other instanceof Entry)) return false;
+          return ((Entry) other).compareTo(keyBuffer, 0, getKeyLength()) == 0;
+        }
+
+        @Override
+        public int hashCode() {
+          return WritableComparator.hashBytes(keyBuffer, 0, getKeyLength());
+        }
+      }
+
+      /**
+       * Advance cursor by n positions within the block.
+       * 
+       * @param n
+       *          Number of key-value pairs to skip in block.
+       * @throws IOException
+       */
+      private void inBlockAdvance(long n) throws IOException {
+        for (long i = 0; i &lt; n; ++i) {
+          checkKey();
+          if (!valueBufferInputStream.isClosed()) {
+            valueBufferInputStream.close();
+          }
+          klen = -1;
+          currentLocation.incRecordIndex();
+        }
+      }
+
+      /**
+       * Advance cursor in block until we find a key that is greater than or
+       * equal to the input key.
+       * 
+       * @param key
+       *          Key to compare.
+       * @param greater
+       *          advance until we find a key greater than the input key.
+       * @return true if we find an equal key.
+       * @throws IOException
+       */
+      private boolean inBlockAdvance(RawComparable key, boolean greater)
+          throws IOException {
+        int curBid = currentLocation.getBlockIndex();
+        long entryInBlock = reader.getBlockEntryCount(curBid);
+        if (curBid == endLocation.getBlockIndex()) {
+          entryInBlock = endLocation.getRecordIndex();
+        }
+
+        while (currentLocation.getRecordIndex() &lt; entryInBlock) {
+          int cmp = compareCursorKeyTo(key);
+          if (cmp &gt; 0) return false;
+          if (cmp == 0 &amp;&amp; !greater) return true;
+          if (!valueBufferInputStream.isClosed()) {
+            valueBufferInputStream.close();
+          }
+          klen = -1;
+          currentLocation.incRecordIndex();
+        }
+
+        throw new RuntimeException("Cannot find matching key in block.");
+      }
+    }
+
+    long getBlockEntryCount(int curBid) {
+      return tfileIndex.getEntry(curBid).entries();
+    }
+
+    BlockReader getBlockReader(int blockIndex) throws IOException {
+      return readerBCF.getDataBlock(blockIndex);
+    }
+  }
+
+  /**
+   * Data structure representing "TFile.meta" meta block.
+   */
+  static final class TFileMeta {
+    final static String BLOCK_NAME = "TFile.meta";
+    final Version version;
+    private long recordCount;
+    private final String strComparator;
+    private final BytesComparator comparator;
+
+    // ctor for writes
+    public TFileMeta(String comparator) {
+      // set fileVersion to API version when we create it.
+      version = TFile.API_VERSION;
+      recordCount = 0;
+      strComparator = (comparator == null) ? "" : comparator;
+      this.comparator = makeComparator(strComparator);
+    }
+
+    // ctor for reads
+    public TFileMeta(DataInput in) throws IOException {
+      version = new Version(in);
+      if (!version.compatibleWith(TFile.API_VERSION)) {
+        throw new RuntimeException("Incompatible TFile fileVersion.");
+      }
+      recordCount = Utils.readVLong(in);
+      strComparator = Utils.readString(in);
+      comparator = makeComparator(strComparator);
+    }
+
+    @SuppressWarnings("unchecked")
+    private static BytesComparator makeComparator(String comparator) {
+      if (comparator.length() == 0) {
+        // unsorted keys
+        return null;
+      }
+      if (comparator.equals(COMPARATOR_MEMCMP)) {
+        // default comparator
+        return new BytesComparator(new MemcmpRawComparator());
+      } else if (comparator.startsWith(COMPARATOR_JCLASS)) {
+        String compClassName =
+            comparator.substring(COMPARATOR_JCLASS.length()).trim();
+        try {
+          Class compClass = Class.forName(compClassName);
+          // use its default ctor to create an instance
+          return new BytesComparator((RawComparator&lt;Object&gt;) compClass
+              .newInstance());
+        } catch (Exception e) {
+          throw new IllegalArgumentException(
+              "Failed to instantiate comparator: " + comparator + "("
+                  + e.toString() + ")");
+        }
+      } else {
+        throw new IllegalArgumentException("Unsupported comparator: "
+            + comparator);
+      }
+    }
+
+    public void write(DataOutput out) throws IOException {
+      TFile.API_VERSION.write(out);
+      Utils.writeVLong(out, recordCount);
+      Utils.writeString(out, strComparator);
+    }
+
+    public long getRecordCount() {
+      return recordCount;
+    }
+
+    public void incRecordCount() {
+      ++recordCount;
+    }
+
+    public boolean isSorted() {
+      return !strComparator.equals("");
+    }
+
+    public String getComparatorString() {
+      return strComparator;
+    }
+
+    public BytesComparator getComparator() {
+      return comparator;
+    }
+
+    public Version getVersion() {
+      return version;
+    }
+  } // END: class TFileMeta
+
+  /**
+   * Data structure representing "TFile.index" meta block.
+   */
+  static class TFileIndex {
+    final static String BLOCK_NAME = "TFile.index";
+    private ByteArray firstKey;
+    private final ArrayList&lt;TFileIndexEntry&gt; index;
+    private final BytesComparator comparator;
+
+    /**
+     * For reading from file.
+     * 
+     * @throws IOException
+     */
+    public TFileIndex(int entryCount, DataInput in, BytesComparator comparator)
+        throws IOException {
+      index = new ArrayList&lt;TFileIndexEntry&gt;(entryCount);
+      int size = Utils.readVInt(in); // size for the first key entry.
+      if (size &gt; 0) {
+        byte[] buffer = new byte[size];
+        in.readFully(buffer);
+        DataInputStream firstKeyInputStream =
+            new DataInputStream(new ByteArrayInputStream(buffer, 0, size));
+
+        int firstKeyLength = Utils.readVInt(firstKeyInputStream);
+        firstKey = new ByteArray(new byte[firstKeyLength]);
+        firstKeyInputStream.readFully(firstKey.buffer());
+
+        for (int i = 0; i &lt; entryCount; i++) {
+          size = Utils.readVInt(in);
+          if (buffer.length &lt; size) {
+            buffer = new byte[size];
+          }
+          in.readFully(buffer, 0, size);
+          TFileIndexEntry idx =
+              new TFileIndexEntry(new DataInputStream(new ByteArrayInputStream(
+                  buffer, 0, size)));
+          index.add(idx);
+        }
+      } else {
+        if (entryCount != 0) {
+          throw new RuntimeException("Internal error");
+        }
+      }
+      this.comparator = comparator;
+    }
+
+    /**
+     * @param key
+     *          input key.
+     * @return the ID of the first block that contains key &gt;= input key. Or -1
+     *         if no such block exists.
+     */
+    public int lowerBound(RawComparable key) {
+      if (comparator == null) {
+        throw new RuntimeException("Cannot search in unsorted TFile");
+      }
+
+      if (firstKey == null) {
+        return -1; // not found
+      }
+
+      int ret = Utils.lowerBound(index, key, comparator);
+      if (ret == index.size()) {
+        return -1;
+      }
+      return ret;
+    }
+
+    public int upperBound(RawComparable key) {
+      if (comparator == null) {
+        throw new RuntimeException("Cannot search in unsorted TFile");
+      }
+
+      if (firstKey == null) {
+        return -1; // not found
+      }
+
+      int ret = Utils.upperBound(index, key, comparator);
+      if (ret == index.size()) {
+        return -1;
+      }
+      return ret;
+    }
+
+    /**
+     * For writing to file.
+     */
+    public TFileIndex(BytesComparator comparator) {
+      index = new ArrayList&lt;TFileIndexEntry&gt;();
+      this.comparator = comparator;
+    }
+
+    public RawComparable getFirstKey() {
+      return firstKey;
+    }
+
+    public void setFirstKey(byte[] key, int offset, int length) {
+      firstKey = new ByteArray(new byte[length]);
+      System.arraycopy(key, offset, firstKey.buffer(), 0, length);
+    }
+
+    public RawComparable getLastKey() {
+      if (index.size() == 0) {
+        return null;
+      }
+      return new ByteArray(index.get(index.size() - 1).buffer());
+    }
+
+    public void addEntry(TFileIndexEntry keyEntry) {
+      index.add(keyEntry);
+    }
+
+    public TFileIndexEntry getEntry(int bid) {
+      return index.get(bid);
+    }
+
+    public void write(DataOutput out) throws IOException {
+      if (firstKey == null) {
+        Utils.writeVInt(out, 0);
+        return;
+      }
+
+      DataOutputBuffer dob = new DataOutputBuffer();
+      Utils.writeVInt(dob, firstKey.size());
+      dob.write(firstKey.buffer());
+      Utils.writeVInt(out, dob.size());
+      out.write(dob.getData(), 0, dob.getLength());
+
+      for (TFileIndexEntry entry : index) {
+        dob.reset();
+        entry.write(dob);
+        Utils.writeVInt(out, dob.getLength());
+        out.write(dob.getData(), 0, dob.getLength());
+      }
+    }
+  }
+
+  /**
+   * TFile Data Index entry. We should try to make the memory footprint of each
+   * index entry as small as possible.
+   */
+  static final class TFileIndexEntry implements RawComparable {
+    final byte[] key;
+    // count of &lt;key, value&gt; entries in the block.
+    final long kvEntries;
+
+    public TFileIndexEntry(DataInput in) throws IOException {
+      int len = Utils.readVInt(in);
+      key = new byte[len];
+      in.readFully(key, 0, len);
+      kvEntries = Utils.readVLong(in);
+    }
+
+    // default entry, without any padding
+    public TFileIndexEntry(byte[] newkey, int offset, int len, long entries) {
+      key = new byte[len];
+      System.arraycopy(newkey, offset, key, 0, len);
+      this.kvEntries = entries;
+    }
+
+    @Override
+    public byte[] buffer() {
+      return key;
+    }
+
+    @Override
+    public int offset() {
+      return 0;
+    }
+
+    @Override
+    public int size() {
+      return key.length;
+    }
+
+    long entries() {
+      return kvEntries;
+    }
+
+    public void write(DataOutput out) throws IOException {
+      Utils.writeVInt(out, key.length);
+      out.write(key, 0, key.length);
+      Utils.writeVLong(out, kvEntries);
+    }
+  }
+
+  /**
+   * Dumping the TFile information.
+   * 
+   * @param args
+   *          A list of TFile paths.
+   */
+  public static void main(String[] args) {
+    System.out.printf("TFile Dumper (TFile %s, BCFile %s)\n", TFile.API_VERSION
+        .toString(), BCFile.API_VERSION.toString());
+    if (args.length == 0) {
+      System.out
+          .println("Usage: java ... org.apache.hadoop.io.file.tfile.TFile tfile-path [tfile-path ...]");
+      System.exit(0);
+    }
+    Configuration conf = new Configuration();
+
+    for (String file : args) {
+      System.out.println("===" + file + "===");
+      try {
+        TFileDumper.dumpInfo(file, System.out, conf);
+      } catch (IOException e) {
+        e.printStackTrace(System.err);
+      }
+    }
+  }
+}
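
Note on the TFileIndex search methods above: lowerBound is documented as returning the ID of the first block that contains a key >= the input key, or -1 if no such block exists. The sketch below illustrates that contract on a plain array of per-block end keys. It is a minimal illustration, not the committed code: the class IndexSearchSketch and the helper compareBytes are hypothetical names, and the unsigned byte-wise ordering is an assumption standing in for the memcmp comparator.

```java
// Hypothetical illustration of the lowerBound/upperBound contract.
// Not part of the committed TFile code.
final class IndexSearchSketch {
  // Unsigned lexicographic byte comparison (assumed memcmp-like ordering).
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i != n; i++) {
      int diff = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  // Index of the first end key >= key, or -1 when every end key is smaller.
  static int lowerBound(byte[][] endKeys, byte[] key) {
    int lo = 0;
    int hi = endKeys.length;
    while (hi > lo) {
      int mid = (lo + hi) >>> 1;
      if (0 > compareBytes(endKeys[mid], key)) {
        lo = mid + 1; // end key is smaller: the block cannot hold the key
      } else {
        hi = mid;
      }
    }
    return lo == endKeys.length ? -1 : lo;
  }

  // Index of the first end key > key, or -1 when no end key is larger.
  static int upperBound(byte[][] endKeys, byte[] key) {
    int lo = 0;
    int hi = endKeys.length;
    while (hi > lo) {
      int mid = (lo + hi) >>> 1;
      if (compareBytes(endKeys[mid], key) > 0) {
        hi = mid;
      } else {
        lo = mid + 1;
      }
    }
    return lo == endKeys.length ? -1 : lo;
  }

  public static void main(String[] args) {
    // Three blocks whose last keys are "d", "m" and "t".
    byte[][] endKeys = { {'d'}, {'m'}, {'t'} };
    System.out.println(lowerBound(endKeys, new byte[] {'e'})); // 1
    System.out.println(lowerBound(endKeys, new byte[] {'z'})); // -1
    System.out.println(upperBound(endKeys, new byte[] {'d'})); // 1
  }
}
```

Returning -1 rather than the array length mirrors the methods above, which map "past the last block" to "not found" so that callers need not know the block count.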

Added: hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFileDumper.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFileDumper.java?rev=787913&amp;view=auto
==============================================================================
--- hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFileDumper.java (added)
+++ hadoop/common/trunk/src/java/org/apache/hadoop/io/file/tfile/TFileDumper.java Wed Jun 24 05:48:25 2009
@@ -0,0 +1,295 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership. The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.io.file.tfile;
+
+import java.io.IOException;
+import java.io.PrintStream;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.file.tfile.BCFile.BlockRegion;
+import org.apache.hadoop.io.file.tfile.BCFile.MetaIndexEntry;
+import org.apache.hadoop.io.file.tfile.TFile.TFileIndexEntry;
+import org.apache.hadoop.io.file.tfile.Utils.Version;
+
+/**
+ * Dumping the information of a TFile.
+ */
+class TFileDumper {
+  static final Log LOG = LogFactory.getLog(TFileDumper.class);
+
+  private TFileDumper() {
+    // namespace object not constructable.
+  }
+
+  private enum Align {
+    LEFT, CENTER, RIGHT, ZERO_PADDED;
+    static String format(String s, int width, Align align) {
+      if (s.length() &gt;= width) return s;
+      int room = width - s.length();
+      Align alignAdjusted = align;
+      if (room == 1) {
+        alignAdjusted = LEFT;
+      }
+      if (alignAdjusted == LEFT) {
+        return s + String.format("%" + room + "s", "");
+      }
+      if (alignAdjusted == RIGHT) {
+        return String.format("%" + room + "s", "") + s;
+      }
+      if (alignAdjusted == CENTER) {
+        int half = room / 2;
+        return String.format("%" + half + "s", "") + s
+            + String.format("%" + (room - half) + "s", "");
+      }
+      throw new IllegalArgumentException("Unsupported alignment");
+    }
+
+    static String format(long l, int width, Align align) {
+      if (align == ZERO_PADDED) {
+        return String.format("%0" + width + "d", l);
+      }
+      return format(Long.toString(l), width, align);
+    }
+
+    static int calculateWidth(String caption, long max) {
+      return Math.max(caption.length(), Long.toString(max).length());
+    }
+  }
+
+  /**
+   * Dump information about TFile.
+   * 
+   * @param file
+   *          Path string of the TFile
+   * @param out
+   *          PrintStream to output the information.
+   * @param conf
+   *          The configuration object.
+   * @throws IOException
+   */
+  static public void dumpInfo(String file, PrintStream out, Configuration conf)
+      throws IOException {
+    final int maxKeySampleLen = 16;
+    Path path = new Path(file);
+    FileSystem fs = path.getFileSystem(conf);
+    long length = fs.getFileStatus(path).getLen();
+    FSDataInputStream fsdis = fs.open(path);
+    TFile.Reader reader = new TFile.Reader(fsdis, length, conf);
+    try {
+      LinkedHashMap&lt;String, String&gt; properties =
+          new LinkedHashMap&lt;String, String&gt;();
+      int blockCnt = reader.readerBCF.getBlockCount();
+      int metaBlkCnt = reader.readerBCF.metaIndex.index.size();
+      properties.put("BCFile Version", reader.readerBCF.version.toString());
+      properties.put("TFile Version", reader.tfileMeta.version.toString());
+      properties.put("File Length", Long.toString(length));
+      properties.put("Data Compression", reader.readerBCF
+          .getDefaultCompressionName());
+      properties.put("Record Count", Long.toString(reader.getEntryCount()));
+      properties.put("Sorted", Boolean.toString(reader.isSorted()));
+      if (reader.isSorted()) {
+        properties.put("Comparator", reader.getComparatorName());
+      }
+      properties.put("Data Block Count", Integer.toString(blockCnt));
+      long dataSize = 0, dataSizeUncompressed = 0;
+      if (blockCnt &gt; 0) {
+        for (int i = 0; i &lt; blockCnt; ++i) {
+          BlockRegion region =
+              reader.readerBCF.dataIndex.getBlockRegionList().get(i);
+          dataSize += region.getCompressedSize();
+          dataSizeUncompressed += region.getRawSize();
+        }
+        properties.put("Data Block Bytes", Long.toString(dataSize));
+        if (!"none".equals(reader.readerBCF.getDefaultCompressionName())) {
+          properties.put("Data Block Uncompressed Bytes", Long
+              .toString(dataSizeUncompressed));
+          properties.put("Data Block Compression Ratio", String.format(
+              "1:%.1f", (double) dataSizeUncompressed / dataSize));
+        }
+      }
+
+      properties.put("Meta Block Count", Integer.toString(metaBlkCnt));
+      long metaSize = 0, metaSizeUncompressed = 0;
+      if (metaBlkCnt &gt; 0) {
+        Collection&lt;MetaIndexEntry&gt; metaBlks =
+            reader.readerBCF.metaIndex.index.values();
+        boolean calculateCompression = false;
+        for (Iterator&lt;MetaIndexEntry&gt; it = metaBlks.iterator(); it.hasNext();) {
+          MetaIndexEntry e = it.next();
+          metaSize += e.getRegion().getCompressedSize();
+          metaSizeUncompressed += e.getRegion().getRawSize();
+          if (e.getCompressionAlgorithm() != Compression.Algorithm.NONE) {
+            calculateCompression = true;
+          }
+        }
+        properties.put("Meta Block Bytes", Long.toString(metaSize));
+        if (calculateCompression) {
+          properties.put("Meta Block Uncompressed Bytes", Long
+              .toString(metaSizeUncompressed));
+          properties.put("Meta Block Compression Ratio", String.format(
+              "1:%.1f", (double) metaSizeUncompressed / metaSize));
+        }
+      }
+      properties.put("Meta-Data Size Ratio", String.format("1:%.1f",
+          (double) dataSize / metaSize));
+      long leftOverBytes = length - dataSize - metaSize;
+      long miscSize =
+          BCFile.Magic.size() * 2 + Long.SIZE / Byte.SIZE + Version.size();
+      long metaIndexSize = leftOverBytes - miscSize;
+      properties.put("Meta Block Index Bytes", Long.toString(metaIndexSize));
+      properties.put("Headers Etc Bytes", Long.toString(miscSize));
+      // Now output the properties table.
+      int maxKeyLength = 0;
+      Set&lt;Map.Entry&lt;String, String&gt;&gt; entrySet = properties.entrySet();
+      for (Iterator&lt;Map.Entry&lt;String, String&gt;&gt; it = entrySet.iterator(); it
+          .hasNext();) {
+        Map.Entry&lt;String, String&gt; e = it.next();
+        if (e.getKey().length() &gt; maxKeyLength) {
+          maxKeyLength = e.getKey().length();
+        }
+      }
+      for (Iterator&lt;Map.Entry&lt;String, String&gt;&gt; it = entrySet.iterator(); it
+          .hasNext();) {
+        Map.Entry&lt;String, String&gt; e = it.next();
+        out.printf("%s : %s\n", Align.format(e.getKey(), maxKeyLength,
+            Align.LEFT), e.getValue());
+      }
+      out.println();
+      reader.checkTFileDataIndex();
+      if (blockCnt &gt; 0) {
+        String blkID = "Data-Block";
+        int blkIDWidth = Align.calculateWidth(blkID, blockCnt);
+        int blkIDWidth2 = Align.calculateWidth("", blockCnt);
+        String offset = "Offset";
+        int offsetWidth = Align.calculateWidth(offset, length);
+        String blkLen = "Length";
+        int blkLenWidth =
+            Align.calculateWidth(blkLen, dataSize / blockCnt * 10);
+        String rawSize = "Raw-Size";
+        int rawSizeWidth =
+            Align.calculateWidth(rawSize, dataSizeUncompressed / blockCnt * 10);
+        String records = "Records";
+        int recordsWidth =
+            Align.calculateWidth(records, reader.getEntryCount() / blockCnt
+                * 10);
+        String endKey = "End-Key";
+        int endKeyWidth = Math.max(endKey.length(), maxKeySampleLen * 2 + 5);
+
+        out.printf("%s %s %s %s %s %s\n", Align.format(blkID, blkIDWidth,
+            Align.CENTER), Align.format(offset, offsetWidth, Align.CENTER),
+            Align.format(blkLen, blkLenWidth, Align.CENTER), Align.format(
+                rawSize, rawSizeWidth, Align.CENTER), Align.format(records,
+                recordsWidth, Align.CENTER), Align.format(endKey, endKeyWidth,
+                Align.LEFT));
+
+        for (int i = 0; i &lt; blockCnt; ++i) {
+          BlockRegion region =
+              reader.readerBCF.dataIndex.getBlockRegionList().get(i);
+          TFileIndexEntry indexEntry = reader.tfileIndex.getEntry(i);
+          out.printf("%s %s %s %s %s ", Align.format(Align.format(i,
+              blkIDWidth2, Align.ZERO_PADDED), blkIDWidth, Align.LEFT), Align
+              .format(region.getOffset(), offsetWidth, Align.LEFT), Align
+              .format(region.getCompressedSize(), blkLenWidth, Align.LEFT),
+              Align.format(region.getRawSize(), rawSizeWidth, Align.LEFT),
+              Align.format(indexEntry.kvEntries, recordsWidth, Align.LEFT));
+          byte[] key = indexEntry.key;
+          boolean asAscii = true;
+          int sampleLen = Math.min(maxKeySampleLen, key.length);
+          for (int j = 0; j &lt; sampleLen; ++j) {
+            byte b = key[j];
+            if ((b &lt; 32 &amp;&amp; b != 9) || (b == 127)) {
+              asAscii = false;
+            }
+          }
+          if (!asAscii) {
+            out.print("0X");
+            for (int j = 0; j &lt; sampleLen; ++j) {
+              byte b = key[j];
+              out.printf("%02X", b);
+            }
+          } else {
+            out.print(new String(key, 0, sampleLen));
+          }
+          if (sampleLen &lt; key.length) {
+            out.print("...");
+          }
+          out.println();
+        }
+      }
+
+      out.println();
+      if (metaBlkCnt &gt; 0) {
+        String name = "Meta-Block";
+        int maxNameLen = 0;
+        Set&lt;Map.Entry&lt;String, MetaIndexEntry&gt;&gt; metaBlkEntrySet =
+            reader.readerBCF.metaIndex.index.entrySet();
+        for (Iterator&lt;Map.Entry&lt;String, MetaIndexEntry&gt;&gt; it =
+            metaBlkEntrySet.iterator(); it.hasNext();) {
+          Map.Entry&lt;String, MetaIndexEntry&gt; e = it.next();
+          if (e.getKey().length() &gt; maxNameLen) {
+            maxNameLen = e.getKey().length();
+          }
+        }
+        int nameWidth = Math.max(name.length(), maxNameLen);
+        String offset = "Offset";
+        int offsetWidth = Align.calculateWidth(offset, length);
+        String blkLen = "Length";
+        int blkLenWidth =
+            Align.calculateWidth(blkLen, metaSize / metaBlkCnt * 10);
+        String rawSize = "Raw-Size";
+        int rawSizeWidth =
+            Align.calculateWidth(rawSize, metaSizeUncompressed / metaBlkCnt
+                * 10);
+        String compression = "Compression";
+        int compressionWidth = compression.length();
+        out.printf("%s %s %s %s %s\n", Align.format(name, nameWidth,
+            Align.CENTER), Align.format(offset, offsetWidth, Align.CENTER),
+            Align.format(blkLen, blkLenWidth, Align.CENTER), Align.format(
+                rawSize, rawSizeWidth, Align.CENTER), Align.format(compression,
+                compressionWidth, Align.LEFT));
+
+        for (Iterator&lt;Map.Entry&lt;String, MetaIndexEntry&gt;&gt; it =
+            metaBlkEntrySet.iterator(); it.hasNext();) {
+          Map.Entry&lt;String, MetaIndexEntry&gt; e = it.next();
+          String blkName = e.getValue().getMetaName();
+          BlockRegion region = e.getValue().getRegion();
+          String blkCompression =
+              e.getValue().getCompressionAlgorithm().getName();
+          out.printf("%s %s %s %s %s\n", Align.format(blkName, nameWidth,
+              Align.LEFT), Align.format(region.getOffset(), offsetWidth,
+              Align.LEFT), Align.format(region.getCompressedSize(),
+              blkLenWidth, Align.LEFT), Align.format(region.getRawSize(),
+              rawSizeWidth, Align.LEFT), Align.format(blkCompression,
+              compressionWidth, Align.LEFT));
+        }
+      }
+    } finally {
+      IOUtils.cleanup(LOG, reader, fsdis);
+    }
+  }
+}
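
Note on the End-Key column printed by dumpInfo above: the dumper samples the leading bytes of each block's end key and prints them as text when they look like printable ASCII, or as hex after a "0X" prefix otherwise, appending "..." when the key is longer than the sample. The sketch below isolates that rule as a pure function so it can be tried without a TFile. The class KeySampleSketch and the method render are hypothetical names, and the choice of two hex digits per byte is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the key-sample rendering rule.
// Not part of the committed TFileDumper code.
final class KeySampleSketch {
  // Renders at most maxLen leading bytes of key.
  static String render(byte[] key, int maxLen) {
    int sampleLen = Math.min(maxLen, key.length);
    boolean asAscii = true;
    for (int j = 0; j != sampleLen; j++) {
      byte b = key[j];
      // Bytes below 32 are rejected; this includes negative (non-ASCII) bytes.
      boolean rejected = 32 > b;
      if (b == 9) {
        rejected = false; // tab is accepted
      }
      if (b == 127) {
        rejected = true; // DEL is rejected
      }
      if (rejected) {
        asAscii = false;
      }
    }
    StringBuilder sb = new StringBuilder();
    if (asAscii) {
      sb.append(new String(key, 0, sampleLen, StandardCharsets.US_ASCII));
    } else {
      sb.append("0X");
      for (int j = 0; j != sampleLen; j++) {
        // Assumption: two upper-case hex digits per byte.
        sb.append(String.format("%02X", key[j]));
      }
    }
    if (key.length > sampleLen) {
      sb.append("...");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(render(new byte[] {'a', 'b', 'c'}, 16));
    System.out.println(render(new byte[] {0, 31, (byte) 0xFF}, 16));
    System.out.println(render(new byte[] {'a', 'b', 'c', 'd'}, 3));
  }
}
```

With a fixed two digits per byte, a full 16-byte hex sample occupies 2 + 32 + 3 characters, which matches the End-Key width of maxKeySampleLen * 2 + 5 computed in dumpInfo.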




</pre>
</div>
</content>
</entry>
</feed>
