hadoop-hdfs-issues mailing list archives

From "Jakob Homan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
Date Fri, 03 Sep 2010 20:34:35 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-1353:
------------------------------

        Summary: Remove most of getBlockLocation optimization  (was: Optimize number of block
access tokens returned by getBlockLocations)
    Description: 
<This description is not valid. See comment.>
HDFS-1081 optimized the number of block access tokens (BATs) created in a single call to getBlockLocations,
as this is an expensive operation.  However, that JIRA put off another optimization which
was then made possible, which is to just send a single block access token across the wire
(and maintain a single BAT on the client side).  This JIRA is for implementing that optimization.
 Since a single BAT is generated for all the blocks, we just write that single BAT to the
wire, rather than writing n BATs for n blocks, as is currently done.  This turns out to be
a useful optimization for files with very large numbers of blocks, as the new lone BAT is
much larger than was a BAT previously.

  was:HDFS-1081 optimized the number of block access tokens (BATs) created in a single call
to getBlockLocations, as this is an expensive operation.  However, that JIRA put off another
optimization which was then made possible, which is to just send a single block access token
across the wire (and maintain a single BAT on the client side).  This JIRA is for implementing
that optimization.  Since a single BAT is generated for all the blocks, we just write that
single BAT to the wire, rather than writing n BATs for n blocks, as is currently done.  This
turns out to be a useful optimization for files with very large numbers of blocks, as the
new lone BAT is much larger than was a BAT previously.


While benchmarking this new patch, originally an addendum to HDFS-1081, we determined that
HDFS-1081's original benchmarks were in error.  getBlockLocations was not the culprit in the performance
degradation.  HDFS-1081 didn't do any damage to speed and, with this addendum, actually does give
some benefit for files with moderate numbers of blocks (see to-be-attached benchmarks).  However,
since getBlockLocations isn't really a slow method, these gains aren't worth the extra complexity they
introduce.  I'll upload the on-the-wire optimization patch in case it becomes useful at some
point, but I'm going to use this JIRA to roll back most of HDFS-1081, except for some byte-array
allocation that we can easily cache.  ...sigh.

> Remove most of getBlockLocation optimization
> --------------------------------------------
>
>                 Key: HDFS-1353
>                 URL: https://issues.apache.org/jira/browse/HDFS-1353
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.21.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>             Fix For: 0.21.1
>
>         Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch
>
>
> <This description is not valid. See comment.>
> HDFS-1081 optimized the number of block access tokens (BATs) created in a single call
to getBlockLocations, as this is an expensive operation.  However, that JIRA put off another
optimization which was then made possible, which is to just send a single block access token
across the wire (and maintain a single BAT on the client side).  This JIRA is for implementing
that optimization.  Since a single BAT is generated for all the blocks, we just write that
single BAT to the wire, rather than writing n BATs for n blocks, as is currently done.  This
turns out to be a useful optimization for files with very large numbers of blocks, as the
new lone BAT is much larger than was a BAT previously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

