accumulo-dev mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-578) consider using hdfs for the walog
Date Mon, 21 May 2012 19:05:41 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280369#comment-13280369 ]

Keith Turner commented on ACCUMULO-578:
---------------------------------------

Eric,

I was looking at the proposed GC algorithm and trying to think of situations where the following
might occur.

 # In use walog is deleted
 # Unused walog is never deleted

It looks pretty solid, but the following interleaving of events could be problematic.  It is
possible because there is a window of time between when a tablet server's lock is deleted and
when that tablet server notices and kills itself.

 # TserverA creates Walog1
 # User deletes lock for TserverA
 # GC does not see TserverA in zookeeper
 # GC does not see any references to Walog1 in !METADATA
 # TserverA writes that TabletX is using Walog1
 # TserverA notices its lock went away and kills itself
 # GC deletes Walog1
 # TabletX fails to load because Walog1 no longer exists
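
To make the ordering concrete, here is a minimal sketch that replays the steps above against a toy in-memory model.  It is not Accumulo code: the Cluster class, gc_scan, and replay_interleaving are hypothetical names invented for illustration, and the GC is assumed to decide what to delete from a single earlier scan.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for zookeeper, !METADATA, and the walog store (all hypothetical)."""
    live_locks: set = field(default_factory=set)        # tservers holding a lock
    metadata_refs: dict = field(default_factory=dict)   # tablet -> walog it references
    walogs: dict = field(default_factory=dict)          # walog -> tserver that created it


def gc_scan(cluster):
    """Return walogs that look unused: creator holds no lock and no tablet references them."""
    referenced = set(cluster.metadata_refs.values())
    return {log for log, owner in cluster.walogs.items()
            if owner not in cluster.live_locks and log not in referenced}


def replay_interleaving():
    """Replay the eight steps in order and return the tablets that can no longer load."""
    c = Cluster()
    c.live_locks.add("TserverA")
    c.walogs["Walog1"] = "TserverA"          # 1. TserverA creates Walog1
    c.live_locks.discard("TserverA")         # 2. user deletes the lock for TserverA
    candidates = gc_scan(c)                  # 3-4. GC sees no lock and no reference
    c.metadata_refs["TabletX"] = "Walog1"    # 5. TserverA records that TabletX uses Walog1
    # 6. TserverA notices its lock is gone and exits (no state change in this model)
    for log in candidates:                   # 7. GC deletes based on its now-stale scan
        del c.walogs[log]
    # 8. any tablet whose referenced walog is missing fails to load
    return [t for t, log in c.metadata_refs.items() if log not in c.walogs]
```

In this model, replay_interleaving() returns ["TabletX"]: the GC's view from steps 3 and 4 is stale by the time it deletes in step 7, so an in-use walog is removed.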

                
> consider using hdfs for the walog
> ---------------------------------
>
>                 Key: ACCUMULO-578
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-578
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: logger, tserver
>    Affects Versions: 1.5.0-SNAPSHOT
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>         Attachments: HDFS_WAL_states.pdf, comparison.png
>
>
> Using HDFS for walogs would fix:
>  * ACCUMULO-84: any node can read the replicated files
>  * ACCUMULO-558: wouldn't need to monitor loggers
>  * ACCUMULO-544: log references wouldn't include hostnames
>  * ACCUMULO-423: wouldn't need to monitor loggers
>  * ACCUMULO-258: hdfs has load balancing already
> To implement it, we would need the ability to distribute log sorts.
> Continuing to use loggers helps us avoid:
>  * hdfs pipeline strategy
>  * lack of fine-grained insight when a single node makes dfs slow
>  * additional namenode pressure
>  * flexibility: for example, we can add fadvise() calls to the logger before HDFS supports it

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
