hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-14882) Provide a Put API that adds the provided family, qualifier, value without copying
Date Fri, 09 Sep 2016 15:54:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15476403#comment-15476403
] 

Xiang Li edited comment on HBASE-14882 at 9/9/16 3:54 PM:
----------------------------------------------------------

Hi [~anoop.hbase], I have some questions, could you please take a look at them when you have
time?

1. KeyValue implements 5 more interfaces beside Cell, they are HeapSize, Cloneable, SettableSequenceId,
SettableTimestamp, Streamable(Streamable is only in master branch for 2.0.0).
Do I need to implement them for IndividualBytesFieldCell?

I think SettableSequenceId is a must, because I found an error when testing my patch without
SettableSequenceId implemented.
When HBase is re-started, the following exception can be found in RS log when master asks
the RS to open a region
{code}
java.io.IOException: java.lang.UnsupportedOperationException: Cell is not of type org.apache.hadoop.hbase.SettableSequenceId
        at org.apache.hadoop.hbase.CellUtil.setSequenceId(CellUtil.java:676)
        at org.apache.hadoop.hbase.regionserver.wal.FSWALEntry.stampRegionSequenceId(FSWALEntry.java:124)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1853)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1750)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1672)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException: Cell is not of type org.apache.hadoop.hbase.SettableSequenceId
        ... 9 more
{code}

SettableSequenceId and SettableTimestamp are server-side code to me, but Put is client-side
code. I am just wondering why the Cell implementation used in Put(in client side) are expected
to implement some server-side code.

2. About Streamable interface which is added by HBASE-13754.
(1) Do I need to implement Streamable as well? 
Regarding the description of HBASE-13754, you mentioned:
bq. For other cell implementation, we will call getXXXLength() and getXXXArray() and write
each component one after the other
It can improve the efficiency if I have a single backing array as a whole and implement Streamable,
but it is still ok if it is not implemented. I will just call getXXXArray(), getXXXLength
and getXXXOffset() and write each field one by one if I override Streamable#write(). So it
is not a must, do I get it right?

(2) I read your discussion with Stack but still has a question for not having a method to
do de-serialization (that is, read()) to pair with write(). 
Regarding your comment in HBASE-13754 when explaining your idea why there is only write()
but no read()
{quote}
I think that is the correct path especially if we add the method to Cell interface itself.
All the Cell impls to have default constructor. The Codec will create a Cell object of its
choice and ask the object to make its data from the stream. But with the new interface type
I thought to just leave it as we are not sure whether all Cells will impl the new interface
{quote}
Do you mean that if one needs to do the de-serialization against my new Cell implementation
- IndividualBytesFieldCell, he/she needs to cut each field from stream and then call the constructor
to build a new Cell or a new IndividualBytesFieldCell? But in this way, he/she needs to read
and fully understand what I code in IndividualBytesFieldCell#write(). Maybe it does not quite
follow the principle of encapsulation and not exposing the internal implementation? I mean,
he/she must know clearly about the internal fields of IndividualBytesFieldCell and how I code
write(), in order to implement read() correctly.



was (Author: water):
Hi [~anoop.hbase], I have some questions, could you please take a look at them when you have
time?

1. KeyValue implements 5 more interfaces beside Cell, they are HeapSize, Cloneable, SettableSequenceId,
SettableTimestamp, Streamable(Streamable is only in master branch for 2.0.0).
Do I need to implement them for IndividualBytesFieldCell?

I think SettableSequenceId is a must, because I found an error when testing my patch without
SettableSequenceId implemented.
When HBase is re-started, the following exception can be found in RS log when master asks
the RS to open a region
{code}
java.io.IOException: java.lang.UnsupportedOperationException: Cell is not of type org.apache.hadoop.hbase.SettableSequenceId
        at org.apache.hadoop.hbase.CellUtil.setSequenceId(CellUtil.java:676)
        at org.apache.hadoop.hbase.regionserver.wal.FSWALEntry.stampRegionSequenceId(FSWALEntry.java:124)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1853)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1750)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1672)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException: Cell is not of type org.apache.hadoop.hbase.SettableSequenceId
        ... 9 more
{code}

SettableSequenceId and SettableTimestamp are server-side code to me, but Put is client-side
code. I am just wondering why the Cell implementation used in Put(in client side) are expected
to implement some server-side code.

2. About Streamable interface which is added by HBASE-13754.
(1) Do I need to implement Streamable as well? 
Regarding the description of HBASE-13754, you mentioned:
bq. For other cell implementation, we will call getXXXLength() and getXXXArray() and write
each component one after the other
It can improve the efficiency if I have a single backing array as a whole and implement Streamable,
but it is still ok if it is not implemented. I will just call getXXXArray(), getXXXLength
and getXXXOffset() and write each field one by one if I override Streamable#write(). So it
is not a must, do I get it right?

(2) I read your discussion with Stack but still has a question for not having a method to
do de-serialization (that is, read()) to pair with write(). 
Regarding your comment in HBASE-13754 when explaining your idea why there is only write()
but no read()
{quote}
I think that is the correct path especially if we add the method to Cell interface itself.
All the Cell impls to have default constructor. The Codec will create a Cell object of its
choice and ask the object to make its data from the stream. But with the new interface type
I thought to just leave it as we are not sure whether all Cells will impl the new interface
{quote}
Do you mean that if one needs to do the de-serialization against my new Cell implementation
- IndividualBytesFieldCell, he/she needs to cut each field from stream and then call the constructor
to build a new Cell or a new IndividualBytesFieldCell? But in this way, he/she needs to read
and fully understand what I code in IndividualBytesFieldCell#write(). Does it not quite follow
the rule of encapsulation and not exposing the internal implementation? I mean, he/she must
know clearly about the internal fields of IndividualBytesFieldCell and how I code write(),
in order to implement read() correctly.


> Provide a Put API that adds the provided family, qualifier, value without copying
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-14882
>                 URL: https://issues.apache.org/jira/browse/HBASE-14882
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Jerry He
>            Assignee: Xiang Li
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14882.master.000.patch, HBASE-14882.master.001.patch
>
>
> In the Put API, we have addImmutable()
> {code}
>  /**
>    * See {@link #addColumn(byte[], byte[], byte[])}. This version expects
>    * that the underlying arrays won't change. It's intended
>    * for usage internal HBase to and for advanced client applications.
>    */
>   public Put addImmutable(byte [] family, byte [] qualifier, byte [] value)
> {code}
> But in the implementation, the family, qualifier and value are still being copied locally
to create kv.
> Hopefully we should provide an API that truly uses immutable family, qualifier and value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message