hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Schubert Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1818) HFile code review and refinement
Date Mon, 07 Sep 2009 15:22:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752169#action_12752169
] 

Schubert Zhang commented on HBASE-1818:
---------------------------------------

Code review comments:

1. HFile.Writer

(1) private int totalBytes = 0;  ==> private long totalBytes = 0;
      int is too small (2GB) to limit the size of uncompressed data.
 
(2) private void finishBlock()
      use int for variable "size".

(3) private void checkKey(final byte [] key, final int offset, final int length)
      to reject to append duplicated key.

    if (this.comparator.compare(this.lastKeyBuffer, this.lastKeyOffset,
            this.lastKeyLength, key, offset, length) > 0)

   should be:

   if (this.comparator.compare(this.lastKeyBuffer, this.lastKeyOffset,
            this.lastKeyLength, key, offset, length) >= 0)

(4) private long writeFileInfo(FSDataOutputStream o)
      
      int avgValueLen = this.entryCount == 0? 0:
        (int)(this.keylength/this.entryCount);

     should be:

           int avgValueLen = this.entryCount == 0? 0:
        (int)(this.valuelength/this.entryCount);

2. HFile.Reader

(1) public ByteBuffer getMetaBlock(String metaBlockName)
     it's better to use:
      if (metaIndex.count <= 0) {
          return null; // there are no meta blocks
      }

     then:

      if (trailer.metaIndexCount == 0) {
        return null; // there are no meta blocks
      }

      and add:
      if (buf == null)
    	return null;
      since decompress() may return null.
      
      remove following to avoud block copy, and improve performance. But to to pay attention,
the current position of the buf is at METABLOCKMAGIC.length (not 0):

      // Toss the header. May have to remove later due to performance.
      buf.compact();
      buf.limit(buf.limit() - METABLOCKMAGIC.length);
      buf.rewind();


(2) ByteBuffer readBlock(int block, boolean cacheBlock)
       boundary bug.
       if (block < 0 || block > blockIndex.count) {
        throw new IOException("Requested block is out of range: " + block +
          ", max: " + blockIndex.count);
      }

     should be:

      if (block < 0 || block >= blockIndex.count) {
        throw new IOException("Requested block is out of range: " + block +
          ", max: " + blockIndex.count);
      }

      add:
       if (buf == null) {
          throw new IOException("Decompress block failure " + block);
        }
       since decompress() may return null.
   
      remove following to avoud block copy, and improve performance. But to to pay attention,
the current position of the buf is at METABLOCKMAGIC.length (not 0), and the cached block
also have this position.

      // Toss the header. May have to remove later due to performance.
      buf.compact();
      buf.limit(buf.limit() - METABLOCKMAGIC.length);
      buf.rewind();
       
(3)   public boolean seekTo()
    
        if (block != null && currBlock == 0) {
          block.rewind();
          block.position(DATABLOCKMAGIC.length);
          currKeyLen = block.getInt();
          currValueLen = block.getInt();
          return true;
        }
        above code add block.position(DATABLOCKMAGIC.length); since we remove buf.compact
previously.
        add return ture, I guess the code miss it.

(4)          private void loadBlock(int bloc) throws IOException {
        if (block == null) {
          block = reader.readBlock(bloc, cacheBlocks);
          currBlock = bloc;
          blockFetches++;
        } else {
          if (bloc != currBlock) {
            block = reader.readBlock(bloc, cacheBlocks);
            currBlock = bloc;
            blockFetches++;
          } else {
            // we are already in the same block, just rewind to seek again.
            block.rewind();
            block.position(DATABLOCKMAGIC.length);  // add this code since we remove buf.compact
previously.
          }
        }
      }
    }


     


> HFile code review and refinement
> --------------------------------
>
>                 Key: HBASE-1818
>                 URL: https://issues.apache.org/jira/browse/HBASE-1818
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.20.0
>            Reporter: Schubert Zhang
>            Assignee: Schubert Zhang
>            Priority: Minor
>             Fix For: 0.20.1
>
>
> HFile is a good mimic of Google's SSTable file format. And we want HFile to become a
common file format of hadoop in the near future.
> We will review the code of HFile and record the comments here, and then provide fixed
patch after the review.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message