hadoop-common-issues mailing list archives

From "Gil Vernik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12079) Make 'X-Newest' header a configurable
Date Wed, 10 Jun 2015 20:55:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581058#comment-14581058 ]

Gil Vernik commented on HADOOP-12079:

Thanks for reviewing this.
Right, this is the case where we should have X-Newest. As you said, it is actually needed when
we write a file with the same name over an existing file and the new file is so large that not all
replicas are updated at the same time. If we then perform an immediate GET, Swift may land on a
replica that has not been updated yet. Another use for X-Newest is when we write the same file name
again and one of the replicas fails to update (for some reason). But these are not the majority
of use cases, at least for me: in my cases, new files that overwrite existing ones are basically
the same size, so all replicas are updated at about the same time. I also don't have
scripts that write and overwrite files, just normal user operations, where users write
a file and access it later.
The unit test you mention is exactly the one that needs X-Newest, since it sends one request
after another without any delay.
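
To make the timing argument concrete, here is a toy Python model of an overwrite whose new version reaches each replica at a different moment. It is only a sketch under stated assumptions: the class and function names are hypothetical, it does not reproduce Swift's replication, and the replica that serves a plain GET is passed in explicitly instead of being chosen by a proxy.

```python
# Toy model (hypothetical, not Swift code): an overwrite reaches each replica
# at a different time, e.g. because a large upload finishes at different moments.
from dataclasses import dataclass


@dataclass
class Replica:
    old_data: str
    new_data: str
    applied_at: float  # time at which this replica receives the overwrite

    def read(self, t: float) -> str:
        # Until the overwrite lands, this replica still serves the old object.
        return self.new_data if t >= self.applied_at else self.old_data


def plain_get(replicas, replica_index: int, t: float) -> str:
    """Default GET: a single replica answers (chosen here by the caller)."""
    return replicas[replica_index].read(t)


replicas = [
    Replica("v1", "v2", applied_at=0.0),
    Replica("v1", "v2", applied_at=0.5),
    Replica("v1", "v2", applied_at=2.0),  # slowest replica
]

# Immediate read after the write, like a unit test with no delay:
print(plain_get(replicas, 2, t=0.1))  # v1 (stale)
# A later read, once every replica has the new version:
print(plain_get(replicas, 2, t=3.0))  # v2
```

The stale result only appears in the window before the slowest replica catches up, which is why back-to-back requests in a test hit it while ordinary "write now, read later" usage usually does not.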

Another case is when data already exists in Swift and we then use Hadoop to access and analyze
it. In this case X-Newest just forces Swift to access all replicas, so a single GET from Hadoop
makes Swift perform as many internal requests as there are replicas, which greatly
affects performance.
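
A rough back-of-the-envelope sketch of that cost, assuming every replica is healthy (so a plain GET is served by one replica and an X-Newest GET contacts all of them) and, purely for illustration, three replicas. The function name is hypothetical and the count ignores retries and any other proxy-side work.

```python
def backend_requests(num_gets: int, replica_count: int, use_x_newest: bool) -> int:
    """Approximate number of replica reads the proxy issues for num_gets GETs.

    Assumes no failures: a plain GET reads one replica, while an X-Newest GET
    reads every replica.
    """
    per_get = replica_count if use_x_newest else 1
    return num_gets * per_get


# Example: a job that reads 10,000 existing objects, assuming 3 replicas.
print(backend_requests(10_000, 3, use_x_newest=False))  # 10000
print(backend_requests(10_000, 3, use_x_newest=True))   # 30000
```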

I will implement your suggestions and will submit another patch.
Can you please point me to the documentation that I should update?

> Make 'X-Newest' header a configurable
> -------------------------------------
>                 Key: HADOOP-12079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/swift
>    Affects Versions: 3.0.0, 2.6.0
>            Reporter: Gil Vernik
>            Assignee: Gil Vernik
>             Fix For: 3.0.0, 2.6.1
>         Attachments: x-newest-optional0001.patch, x-newest-optional0002.patch, x-newest-optional0003.patch
> Current code always sends the X-Newest header to Swift. While it's true that Swift is eventually
consistent and X-Newest will always get the newest version from Swift, in practice this header
makes Swift responses very slow.
> This header should be optional (configurable), so that it is possible to access
Swift without this header and get much better performance.
> This patch doesn't modify the current behavior. Everything works as before, but there is an option
to set fs.swift.service.useXNewest = false.
> Some background on Swift and X-Newest: 
> When a GET or HEAD request is made to an object, the default behavior is to get the data
from one of the replicas (which could be any of them). The downside is that if there are
older versions of the object (due to eventual consistency), it is possible to get an older
version of the object. The upside is that for the majority of use cases, this isn't an
issue. For the small subset of use cases that need to make sure they get the latest version
of the object, they can set the "X-Newest" header to "True". If this is set, the proxy server
will check all replicas of the object and return only the newest one. The downside is
that the request can take longer, since the proxy has to contact all the replicas. It is also
more expensive for the backend, so it is only recommended when absolutely needed.
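
The behavior described in the issue can be sketched end to end as follows: the client sends X-Newest unless fs.swift.service.useXNewest is set to false, and the proxy either returns one replica's copy or consults every replica and keeps the newest. This is a hypothetical model, not the patch or Swift's proxy code. The helper names are illustrative, the boolean parsing (trimmed, case-insensitive "true"/"false", anything else falls back to the default) is assumed to mirror Hadoop's Configuration.getBoolean, and the proxy side only recognizes the literal value "true".

```python
# Hypothetical end-to-end model (not Hadoop or Swift source code).
from typing import Dict, List, NamedTuple

USE_X_NEWEST_KEY = "fs.swift.service.useXNewest"


def get_boolean(conf: Dict[str, str], key: str, default: bool) -> bool:
    """Assumed parsing rule: trimmed, case-insensitive "true"/"false", else default."""
    value = conf.get(key)
    if value is None:
        return default
    value = value.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    return default


def build_read_headers(conf: Dict[str, str]) -> Dict[str, str]:
    headers: Dict[str, str] = {}
    # Default True keeps the existing behavior of always sending the header.
    if get_boolean(conf, USE_X_NEWEST_KEY, default=True):
        headers["X-Newest"] = "True"
    return headers


class ReplicaCopy(NamedTuple):
    timestamp: float  # when this replica last stored the object
    body: bytes


def proxy_get(replicas: List[ReplicaCopy], headers: Dict[str, str], pick: int = 0) -> ReplicaCopy:
    if headers.get("X-Newest", "").lower() == "true":
        # X-Newest: consult every replica and keep the copy with the newest timestamp.
        return max(replicas, key=lambda c: c.timestamp)
    # Default: a single replica answers; `pick` stands in for the proxy's arbitrary choice.
    return replicas[pick]


replicas = [ReplicaCopy(100.0, b"old"), ReplicaCopy(105.0, b"new"), ReplicaCopy(100.0, b"old")]
print(proxy_get(replicas, build_read_headers({})).body)                           # b'new'
print(proxy_get(replicas, build_read_headers({USE_X_NEWEST_KEY: "false"})).body)  # b'old'
```

Defaulting to True matches the statement that the patch doesn't modify current behavior: only an explicit "false" drops the header and trades freshness for fewer replica reads.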

This message was sent by Atlassian JIRA
