accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
Date Thu, 19 Nov 2015 20:12:11 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014268#comment-15014268
] 

Josh Elser commented on ACCUMULO-4062:
--------------------------------------

Oh, that must be new. What I copied above is from 1.7.

That approach looks like it leaves some room for missing equivalent mutations (e.g. 2 mutations
with column updates [a, b] and [b, a] would likely have different hashCodes despite being
equivalent when read). If that's the case, I guess the question is how the amortized constant
insert time of a list (append plus the cost of growing the list) compares to the average
constant-time insert of Java's HashMap (potentially skewed by load or bad hashing). Would be
an interesting experiment.
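
To make the ordering concern concrete, here is a minimal sketch. ToyMutation is a hypothetical stand-in, not Accumulo's Mutation class; it assumes equality and hashing are computed over the updates in insertion order, which may not match what the real class does in any given version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OrderSensitiveDedup {

    // Hypothetical stand-in for a mutation: a row plus an ordered list of
    // column updates. This is NOT Accumulo's Mutation class.
    static final class ToyMutation {
        final String row;
        final List<String> updates;

        ToyMutation(String row, String... updates) {
            this.row = row;
            this.updates = new ArrayList<>(Arrays.asList(updates));
        }

        // Assumption: equality compares updates in insertion order.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ToyMutation)) {
                return false;
            }
            ToyMutation other = (ToyMutation) o;
            return row.equals(other.row) && updates.equals(other.updates);
        }

        // List.hashCode() is order-dependent, so [a, b] and [b, a] differ here.
        @Override
        public int hashCode() {
            return 31 * row.hashCode() + updates.hashCode();
        }
    }

    // Number of entries a HashSet keeps after adding the given mutations.
    static int distinctCount(ToyMutation... mutations) {
        Set<ToyMutation> set = new HashSet<>(Arrays.asList(mutations));
        return set.size();
    }

    public static void main(String[] args) {
        ToyMutation ab = new ToyMutation("row1", "a", "b");
        ToyMutation abAgain = new ToyMutation("row1", "a", "b");
        ToyMutation ba = new ToyMutation("row1", "b", "a");

        // Exact duplicates collapse into one entry.
        System.out.println("exact duplicates kept: " + distinctCount(ab, abAgain));
        // Same updates in a different order are kept as separate entries.
        System.out.println("reordered updates kept: " + distinctCount(ab, ba));
    }
}
```

Under that assumption, a HashSet would drop exact duplicates but still send both the [a, b] and [b, a] mutations. Catching those as well would need an order-insensitive equals/hashCode, which costs more per insert.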

> Change MutationSet.mutations to use HashSet
> -------------------------------------------
>
>                 Key: ACCUMULO-4062
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4062
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>            Reporter: Dave Marion
>
> Change TabletServerBatchWriter.MutationSet.mutations from a
> {code}
>   HashMap<String,List<Mutation>>
> {code}
> to
> {code}
>   HashMap<String,HashSet<Mutation>>
> {code}
> so that duplicate mutations added by a client are not sent to the server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
