trafficserver-issues mailing list archives

From "John Plevyak (Commented) (JIRA)" <>
Subject [jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
Date Mon, 12 Dec 2011 19:01:31 GMT


John Plevyak commented on TS-949:

I agree that this code is too raw.  I wanted to get the bones of a solution out there, but
I am definitely not wedded to the implementation.

Re: the case where a new volume is added, one solution is to probe back through previous configurations (rather than, say, only the second most likely location). This is the approach the clustering code takes (see cluster/ configuration_add_machine and cluster_machine_depth_list).
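A minimal sketch of probing back through previous configurations. It assumes each configuration's key->volume table is kept as a list of volume names and that a key maps to slot `key_hash % len(table)`. `lookup_with_history` and the `probe` callback are hypothetical stand-ins for illustration. They are not the cluster/ code or a Traffic Server cache API.

```python
from typing import Callable, Optional, Sequence, Set

def lookup_with_history(key_hash: int,
                        tables: Sequence[Sequence[str]],
                        probe: Callable[[str, int], bool]) -> Optional[str]:
    """Find the volume holding key_hash by walking configuration history.

    tables[0] is the current key->volume table, tables[1] the previous
    configuration's table, and so on (hypothetical layout). probe(volume,
    key_hash) stands in for a per-volume directory lookup.
    """
    tried: Set[str] = set()
    for table in tables:
        # Assumed slot mapping: key hash modulo table size.
        volume = table[key_hash % len(table)]
        if volume in tried:
            continue  # this configuration points at a volume already checked
        tried.add(volume)
        if probe(volume, key_hash):
            return volume
    return None


# Usage: volume D was added; an object written under the old table lives on A.
old_table = ["A", "A", "C", "C"]
new_table = ["A", "D", "C", "D"]
stored = {("A", 1)}
found = lookup_with_history(1, [new_table, old_table],
                            lambda vol, key: (vol, key) in stored)
print(found)  # A
```

The current table is always probed first, so the common case costs a single lookup. Older tables are consulted only on a miss, and each distinct volume is probed at most once.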

I think this code and that code should be merged, combining the new hash table generator from this code with the history mechanism from that code.

The alternative in both cases is to just return the first N most likely locations. This is probably OK for the cache, because it would be a local in-memory probe 99.9% of the time, but it would be more expensive for clustering, as it would require going off-node 100% of the time.
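To illustrate the "first N most likely locations" alternative, the sketch below swaps in rendezvous (highest-random-weight) hashing. This is not what the attached patches do. Each volume gets a stable pseudo-random score per key, and the N highest-scoring volumes are the candidates, probed in order. The function names and the SHA-256 scoring are assumptions made for illustration.

```python
import hashlib
from typing import List, Sequence

def score(key: bytes, volume: str) -> int:
    """Stable pseudo-random weight for a (key, volume) pair."""
    digest = hashlib.sha256(key + b"\x00" + volume.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def candidate_volumes(key: bytes, volumes: Sequence[str], n: int) -> List[str]:
    """Return up to n volumes for key, most likely location first."""
    # sorted() stays stable with reverse=True, so ties keep input order.
    ranked = sorted(volumes, key=lambda vol: score(key, vol), reverse=True)
    return ranked[:n]


# Removing a volume never reorders the remaining candidates for any key.
all_vols = ["A", "B", "C", "D"]
for i in range(1000):
    key = str(i).encode()
    before = candidate_volumes(key, all_vols, len(all_vols))
    after = candidate_volumes(key, ["A", "C", "D"], 3)
    assert after == [v for v in before if v != "B"]
```

Each key's ranking does not depend on which other volumes exist, so a miss on the first candidate can fall through to the next ones. That fallback is cheap for a local cache but costly when every extra probe crosses nodes.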

> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>                 Key: TS-949
>                 URL:
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, explicit-pair.patch
> The method for resolving collisions when distributing hash-table space to volumes for the object_key->volume hash table creates inconsistencies when a disk is determined to be bad, or when a failed disk is removed from volume.config.
> Background:
> The hash space is distributed by a round-robin draft in which each volume in turn "drafts" a random index in the hash table until the hash space is exhausted. The random order in which a given volume drafts hash table slots is consistent across reboot, crash, and disk failure. However, when a volume attempts to draft a slot that is already occupied, it skips to its next random pick, repeating until it finds an open slot. This ensures that the hash space is partitioned evenly between volumes.
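A minimal sketch of the round-robin draft described above. It assumes each volume's pseudo-random draft sequence is given as an explicit list of 1-based slot indices and that volumes draft in a fixed order within each round. `build_table` is a hypothetical name. This illustrates the scheme as described in this report, not the Traffic Server source.

```python
from typing import Dict, List, Optional

def build_table(draft_order: Dict[str, List[int]], num_slots: int,
                rounds: Optional[int] = None) -> List[Optional[str]]:
    """Assign hash-table slots to volumes by round-robin draft.

    draft_order maps each volume to its fixed draft sequence (1-based slot
    indices); dict insertion order is the drafting order within a round.
    Unclaimed slots are left as None.
    """
    table: List[Optional[str]] = [None] * num_slots
    cursor = {vol: 0 for vol in draft_order}  # next pick in each sequence
    done_rounds = 0
    while None in table and (rounds is None or done_rounds < rounds):
        claimed_any = False
        for vol, seq in draft_order.items():
            # Collision handling: skip picks another volume already owns.
            while cursor[vol] < len(seq) and table[seq[cursor[vol]] - 1] is not None:
                cursor[vol] += 1
            if cursor[vol] < len(seq):
                table[seq[cursor[vol]] - 1] = vol
                cursor[vol] += 1
                claimed_any = True
        if not claimed_any:
            break  # every draft sequence is exhausted
        done_rounds += 1
    return table


# Draft sequences from the example in this report.
pre = build_table({"A": [1, 4, 7, 5], "B": [4, 2, 8, 1], "C": [3, 7, 5, 2]},
                  num_slots=8, rounds=2)
print(pre)  # ['A', 'B', 'C', 'B', 'C', None, 'A', None]
```

The only state carried between rounds is each volume's cursor. Because of this, a single changed outcome (a slot won instead of skipped) shifts every later pick for that volume.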
> The issue:
> Resolving slot contention breaks consistency because the result depends on the order in which the volumes draft. When the hash is rebuilt after a disk failure, or after a reboot with fewer drives, a surviving volume may secure an index that was previously occupied by the dead disk. In the old hash, that surviving volume would have moved on to another random index because of the contention. If another volume claims that index before the surviving volume's next turn, the index ends up mapped to a different volume: an inconsistent key->volume result. The effects of one inconsistency then cascade, because whichever volume now occupies that index after the dead disk is removed is also behind on its draft sequence.
> An Example:
> ||Disk||Draft Sequence||
> |A|1,4,7,5|
> |B|4,2,8,1|
> |C|3,7,5,2|
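> Pre-failure hash table after 2 rounds of draft (drafting order A, B, C):
> ||1||2||3||4||5||6||7||8||
> |A|B|C|B|C|?|A|?|
> Post-failure hash table (drive B removed) after 3 rounds of draft (drafting order A, C):
> ||1||2||3||4||5||6||7||8||
> |A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|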
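> Slots 2 and 4 belonged to B and are expected to move, but two other slots (5 and 7, in red) have also become inconsistent, and more will probably follow. These inconsistencies leave objects stored in a volume but lost to the top-level cache for open/lookup, because their keys now map to a different volume.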
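A small check in the spirit of this example. It assumes tables are lists indexed by slot, with None for unassigned slots. Slots that belonged to the removed volume are expected to move, and any other slot whose owner changed counts as an inconsistency. `inconsistent_slots` is a hypothetical helper, and the two tables are copied from the example above.

```python
from typing import List, Optional

def inconsistent_slots(old: List[Optional[str]], new: List[Optional[str]],
                       removed: str) -> List[int]:
    """Return 1-based slots that changed owner for a reason other than
    the removal of `removed`."""
    return [slot for slot, (was, now) in enumerate(zip(old, new), start=1)
            if was is not None and was != removed and was != now]


pre_failure = ["A", "B", "C", "B", "C", None, "A", None]
post_failure = ["A", "C", "C", "A", "A", None, "C", None]
print(inconsistent_slots(pre_failure, post_failure, removed="B"))  # [5, 7]
```

A rebuild that kept every surviving volume's slots fixed, and redistributed only the removed volume's slots, would make this list empty.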

This message is automatically generated by JIRA.