hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-18555) Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
Date Thu, 10 Aug 2017 16:42:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121889#comment-16121889
] 

Xiang Li edited comment on HBASE-18555 at 8/10/17 4:41 PM:
-----------------------------------------------------------

Hi Mike Drob, 
bq. This seems like the perfect use case for putIfAbsent
Yes, it is, but could be better. Map#putIfAbsent() is like
{code}
 V v = map.get(key);
 if (v == null)
     v = map.put(key, value);

 return v;
{code}
When the key is associated with a value, it performs get(). When the key is not associated
with a value, it performs get() and then put(). familyMap is TreeMap, which is Red-Black tree.
The time complexity of get() and put() is quite low, but it is still O(logn).
So there could be a better way than putIfAbsent(), like the patch does: only put (key=family,
value=list) right after the list is allocated.


was (Author: water):
Hi Mike Drob, 
bq. This seems like the perfect use case for putIfAbsent
Yes, it is, but could be better. Map#putIfAbsent() is like
{code}
 V v = map.get(key);
 if (v == null)
     v = map.put(key, value);

 return v;
{code}
When the key is associated with a value, it performs get(). When the key is not associated
with a value, it performs get() and then put(). familyMap is TreeMap, which is Red-Black tree.
The time complexity of get() and put() is quite low, but it is still O(log(n)).
So there could be a better way than putIfAbsent(), like the patch does: only put (key=family,
value=list) right after the list is allocated.

> Remove redundant familyMap.put() from Put#addColumn() and addImmutable()
> ------------------------------------------------------------------------
>
>                 Key: HBASE-18555
>                 URL: https://issues.apache.org/jira/browse/HBASE-18555
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>            Reporter: Xiang Li
>            Assignee: Xiang Li
>            Priority: Minor
>         Attachments: HBASE-18555.master.000.patch
>
>
> In Put#addColumn() and addImmutable(), after getting the cell list of the given family
and add the cell into the list, the code puts (key=family, value=list) into familyMap.
> In addColumn(), it is like
> {code}
>     List<Cell> list = getCellList(family);
>     KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
>     list.add(kv);
>     familyMap.put(CellUtil.cloneFamily(kv), list); // <-- here
>     return this;
> {code}
> In addImmutable(), it is like
> {code}
>     List<Cell> list = getCellList(family);
>     KeyValue kv = createPutKeyValue(family, qualifier, ts, value, tag);
>     list.add(kv);
>     familyMap.put(family, list); // <-- here
>     return this;
> {code}
> I think those put() for Map only take effect when getCellList(family) returns a new allocated
ArrayList. When the list for a family already exist, put() for Map will update the value to
the reference of the list, but actually, the reference of the list is not changed.
> Those put() do not do any harm in terms of the correctness when they are here. But it
could be removed to improve the performance. familyMap searches for key and set the new value
and return the old value. Those operation take some time but actually are not needed. 
> The put() could be moved into Mutation#getCellList(family)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message