hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2172) PositionCache was removed from FSDataInputStream, causes extremely bad MapFile performance
Date Thu, 08 Nov 2007 21:33:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541142 ]

Doug Cutting commented on HADOOP-2172:
--------------------------------------

Raghu: I take it you're voting for caching in local FS?

Another place we could cache the position is in BufferedFSInputStream.  That's the place where
we depend on getPos() being fast.  When someone seeks, in order to check whether the seek
is within the buffer, we need to know where the buffer is in the file.  We currently call
getPos() on the underlying stream, which is slow on the local filesystem impl, since it makes
a system call.  So is this a better place to cache?  I started to code that, but it will take
more code, since BufferedFSInputStream doesn't already override all the position-changing
methods.  So I'm currently leaning towards pushing the cache down into the local fs impl.
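
To make the second option concrete, here is a rough sketch of what caching the position in a local-file stream could look like.  This is not the attached patch and not Hadoop's actual code: the class name CachedPositionFile is made up for illustration, and it wraps a plain java.io.RandomAccessFile rather than the real local fs classes.  The idea is only that every position-changing method updates a field, so getPos() never has to ask the operating system.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical illustration only: wraps a RandomAccessFile and keeps the
 * current offset in a field, so getPos() is a plain field read instead of
 * a call down to the operating system.
 */
class CachedPositionFile implements AutoCloseable {
    private final RandomAccessFile file;
    private long position; // cached copy of the file offset

    CachedPositionFile(RandomAccessFile file) throws IOException {
        this.file = file;
        this.position = file.getFilePointer(); // ask the OS once, up front
    }

    /** Returns the cached offset without touching the underlying file. */
    long getPos() {
        return position;
    }

    void seek(long pos) throws IOException {
        file.seek(pos);   // throws before the cache is touched if pos is invalid
        position = pos;
    }

    int read() throws IOException {
        int b = file.read();
        if (b >= 0) {
            position++;   // advance only when a byte was actually read
        }
        return b;
    }

    int read(byte[] buf, int off, int len) throws IOException {
        int n = file.read(buf, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    long skip(long n) throws IOException {
        // skipBytes takes an int and never skips past end of file
        int request = (int) Math.max(0L, Math.min(n, (long) Integer.MAX_VALUE));
        int skipped = file.skipBytes(request);
        position += skipped;
        return skipped;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The cache stays correct only as long as every method that moves the file offset is overridden and nothing else moves the underlying file behind the wrapper's back.  That is the same bookkeeping BufferedFSInputStream would need if the cache lived there instead.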


> PositionCache was removed from FSDataInputStream, causes extremely bad MapFile performance
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2172
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2172
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.14.3, 0.15.0
>            Reporter: Johan Oskarsson
>            Assignee: Johan Oskarsson
>            Priority: Blocker
>             Fix For: 0.15.1
>
>         Attachments: HADOOP-2172-2.patch, positioncache-v1.patch
>
>
> The PositionCache in FSDataInputStream seems to have been removed in HADOOP-1470. This
> causes for example MapFile.get usage to be extremely slow as the file position isn't cached
> in memory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

