hadoop-hdfs-issues mailing list archives

From "Rushabh S Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11634) Optimize BlockIterator when iterating starts in the middle.
Date Thu, 13 Apr 2017 22:49:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968350#comment-15968350 ]

Rushabh S Shah commented on HDFS-11634:

bq. Agreed, since we never iterate backwards, we don't need iterators from skipped storages.
We do iterate backwards when the requested data size is more than the size fetched from the
randomly chosen offset.
The code creates a brand-new iterator there; it would be nice if we could reuse the iterator
created above by resetting its index.
  if (totalSize < size) {
      iter = node.getBlockIterator(); // start from the beginning
      for (int i = 0; i < startBlock && totalSize < size; i++) {
        curBlock = iter.next();
        if (!curBlock.isComplete()) continue;
        if (curBlock.getNumBytes() < getBlocksMinBlockSize) continue;
        totalSize += addBlock(curBlock, results);
      }
  }
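The "reuse the same iterator by resetting some index" idea could be sketched roughly as below. This is not the HDFS patch itself: `RewindableIterator` and its `rewind()` method are hypothetical stand-ins for the real `BlockIterator`, shown only to illustrate wrapping around without allocating a second iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a block iterator; rewind() is hypothetical,
// not an existing HDFS API.
public class RewindableIterator implements Iterator<String> {
    private final List<String> blocks;
    private int index = 0;

    RewindableIterator(List<String> blocks) { this.blocks = blocks; }

    public boolean hasNext() { return index < blocks.size(); }
    public String next() { return blocks.get(index++); }

    // Reset the cursor so the caller can wrap around to the beginning
    // without creating a brand-new iterator.
    public void rewind() { index = 0; }

    public static void main(String[] args) {
        RewindableIterator it =
            new RewindableIterator(Arrays.asList("b0", "b1", "b2"));
        it.next(); it.next();          // consume from the random start offset
        it.rewind();                   // wrap around instead of re-creating
        System.out.println(it.next()); // prints b0
    }
}
```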
> Optimize BlockIterator when iterating starts in the middle.
> ------------------------------------------------------------
>                 Key: HDFS-11634
>                 URL: https://issues.apache.org/jira/browse/HDFS-11634
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.6.5
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: HDFS-11634.001.patch, HDFS-11634.002.patch, HDFS-11634.003.patch,
HDFS-11634.004.patch, HDFS-11643.005.patch
> {{BlockManager.getBlocksWithLocations()}} needs to iterate blocks from a randomly selected
{{startBlock}} index. It creates an iterator which points to the first block and then skips
all blocks until {{startBlock}}. This is inefficient when the DataNode has multiple storages.
Instead of skipping blocks one by one we can skip entire storages, which should be more
efficient on average.
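The storage-skipping idea in the description can be sketched as follows. This is a simplification, not the actual patch: the `storageSizes` array stands in for per-storage block counts (`DatanodeStorageInfo.numBlocks()` in real HDFS), and `locate()` shows how whole storages whose counts fit before {{startBlock}} can be skipped in one step each instead of block by block.

```java
import java.util.Arrays;

public class SkipStoragesSketch {
    // Hypothetical per-storage block counts on one DataNode.
    static int[] storageSizes = {5, 8, 3};

    // Map a global block index to {storageIndex, offsetWithinStorage},
    // skipping each entire storage in O(1) rather than stepping
    // through its blocks one by one.
    static int[] locate(int startBlock) {
        int s = 0;
        while (s < storageSizes.length && startBlock >= storageSizes[s]) {
            startBlock -= storageSizes[s]; // skip the whole storage
            s++;
        }
        return new int[] {s, startBlock};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(locate(0)));  // [0, 0]
        System.out.println(Arrays.toString(locate(6)));  // [1, 1]
        System.out.println(Arrays.toString(locate(13))); // [2, 0]
    }
}
```

With this positioning, the cost of reaching {{startBlock}} is proportional to the number of storages rather than the number of skipped blocks.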

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
