hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
Date Thu, 13 Mar 2014 00:35:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932698#comment-13932698
] 

stack commented on HBASE-10531:
-------------------------------

So not copying is faster using YCSB?

You need to add good tests for the additions to CellComparator.

bq. +    public int compareFlatKey(Cell left, Cell right) {

What is a 'flat' key?   Why do we have a Cell compare in KV?  Should it be in CellUtil or
CellComparator?  Or this is how KV is now?  It takes Cells?  It seems so.

Here too.  Why in KV are we talking Cells?

-    public int compareTimestamps(final KeyValue left, final KeyValue right) {
+    public int compareTimestamps(final Cell left, final Cell right) {

... and here:

-    public int compareRows(final KeyValue left, final KeyValue right) {
+    public int compareRows(final Cell left, final Cell right) {

That said, nice to see our use of Cell in KV compares.

But I'd say KV methods should take KVs, not Cells?  If it takes Cells, could be comparing
a non-KV and it could actually work?  Is this what we want?  Maybe this is just uglyness left
over from how KV has been used/abused up to this?  But I'm thinking these compares would be
Cell compares out in a CellUtil or CellCompare class? 

You have long lines in here Mr. [~ram_krish]

You should mark these with @VisibleForTesting

+      // Only for test case purpose

It is good that the byte compares added are not public.

Shouldn't this be unsupportedoperationexception in your new key only class?

+    public byte[] getValueArray() {
+      return HConstants.EMPTY_BYTE_ARRAY;
+    }

Same for value offset and length.

Tags too?

What is difference between KeyAloneKeyValue and a copy of the original Key but with no value?
 I am wary introducing new type?  Or, a new type that just threw UnsupportedOperationException
when you tried to get a value...

Why we have to create new key when we pass to a comparator?  Is it because we need to parse
the bytes so can pass in an Object?  That makes sense. I see now why it is needed.  Suggest
rename it as KeyOnlyKeyValue rather than KeyAlongKV.    Also, the inner class needs a bit
of a class comment on why it is needed: i.e. you have a byte array but comparator wants a
Cell of some type.... rather than parse whole KV byte buffer....  This is used in the case
where we have key only bytes; i.e. the block index?

Is passing 0 correct in the below?

+      return comparator.compareFlatKey(key,
+          new KeyValue.KeyAloneKeyValue(current.keyBuffer, 0, current.keyLength));

Should it be an offset?  We do this '0' in a few places.

Yeah, what is a 'flat' key?  Is it key only?

So, this is the replacement: seekToKeyInBlock ?  Purge the old stuff!!!!

We have tests for the above?

Should this be a CellComparator rather than a KVComparator:

+
+    public int compareKey(KVComparator comparator, Cell key);

(Sorry if you answered this already).

Needs doc:  +  public static int binarySearch(byte[][] arr, Cell leftCell, RawComparator<Cell>
comparator) {

The array of byte arrays has Cells in it or it seems KVs?    Will it always be arrays of byte
arrays?  Or will the binary search be in a block?  Or is the 'arr' here a block?

Formatting:

-        forceBeforeOnExactMatch);
-    }else{
+          forceBeforeOnExactMatch);
+    } else {
       return seekToOrBeforeUsingPositionAtOrAfter(keyOnlyBytes, offset, length,
-        forceBeforeOnExactMatch);
+          forceBeforeOnExactMatch);

We creating a KV where we did not before here?

-        return this.delegate.seekBefore(key, offset, length);
+        return seekBefore(new KeyValue.KeyAloneKeyValue(key, offset, length));

Or is it that I am just misreading? (It is being created elsewhere anyways)

Why we add these byte-based methods?

+      public int reseekTo(byte[] key) throws IOException {

+      public int reseekTo(byte[] key, int offset, int length)

We should let this out?

+      public org.apache.hadoop.hbase.io.hfile.HFile.Reader getReader() {

Any chance of micro benchmarks on difference between this patch and what was there before?

Why we adding a method that passes key as bytes?

+    public BlockWithScanInfo loadDataBlockWithScanInfo(final byte[] key, int keyOffset,
+        int keyLength, HFileBlock currentBlock, boolean cacheBlocks,
+        boolean pread, boolean isCompaction)
+        throws IOException {

The ByteBuffers here come from where?

+    static int locateNonRootIndexEntry(ByteBuffer nonRootBlock, Cell key, KVComparator comparator)
{
+      int entryIndex = binarySearchNonRootIndex(key, nonRootBlock, comparator);


Yeah, why you add these byte array versions?

+    public boolean seekBefore(byte[] key, int offset, int length) throws IOException {
+      HFileBlock seekToBlock = reader.getDataBlockIndexReader().seekToDataBlock(key, offset,
+          length, block, cacheBlocks, pread, isCompaction);


Is there duplicated code between HFile2 and HFile3?

Just remove?

+  @Deprecated
   int seekTo(byte[] key) throws IOException;
+  @Deprecated
   int seekTo(byte[] key, int offset, int length) throws IOException;

This looks good:

-    int result = s.reseekTo(k.getBuffer(), k.getKeyOffset(), k.getKeyLength());
+    int result = s.reseekTo(k);

... and this:

-    int ret = scanner.reseekTo(getLastOnCol(curr).getKey());
+    int ret = scanner.reseekTo(getLastOnCol(curr));


We need this: TestNewReseekTo  ? Why not just change the current test to use Cells instead?

Is TestNewSeekTo a copy of the old code?  If so, just replace the byte based with the new
Cell based?

Patch looks great otherwise.  Good on you [~ram_krish]








> Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
> ------------------------------------------------------------------------
>
>                 Key: HBASE-10531
>                 URL: https://issues.apache.org/jira/browse/HBASE-10531
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.99.0
>
>         Attachments: HBASE-10531.patch, HBASE-10531_1.patch, HBASE-10531_2.patch
>
>
> Currently the byte[] key passed to HFileScanner.seekTo and HFileScanner.reseekTo, is
a combination of row, cf, qual, type and ts.  And the caller forms this by using kv.getBuffer,
which is actually deprecated.  So see how this can be achieved considering kv.getBuffer is
removed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message