asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann (Code Review)" <do-not-re...@asterixdb.incubator.apache.org>
Subject Change in hyracks[master]: ASTERIXDB-1102: VarSize Encoding to store length of String a...
Date Thu, 15 Oct 2015 23:37:39 GMT
Till Westmann has posted comments on this change.

Change subject: ASTERIXDB-1102: VarSize Encoding to store length of String and ByteArray
......................................................................


Patch Set 8:

(8 comments)

https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/ByteArrayPointable.java
File hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/ByteArrayPointable.java:

Line 106:     private int hash = 0;
Could we move those to the beginning of the class (i.e. before the first methods/constructor)
to make the statefulness for the object more visible?


Line 169:     public static int normalize(byte[] bytesPtr, int start) {
Could you add a comment what "normalization" is in this case?


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java
File hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java:

Line 76:     private int stringLength;
I'm not sure that I'm happy that these pointables are stateful now, but if we remain careful
to avoid concurrent use I think that it should be fine. 
However, I think that it would be good to put the data members to the beginning of the class
to make it more visible that these objects are stateful.
So I'd put them before the first method.


Line 92:      * Since the {@code utf8Length} and the {@code metaLength} are often used, we
compute those two value in advance.
s/value/values/


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java
File hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java:

Line 49:             int diff = estimateMetaLen - actualMetaLen;
It would be nice if this case was covered in the UTF8StringPointableTest.


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8CharSequence.java
File hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8CharSequence.java:

Line 55:             buf = new char[utfLen];
It seems relatively expensive to always allocate *and* copy the whole string  - especially
as strings could be really big after this change. 
Thoughts?


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/AbstractUTF8StringBinaryTokenizer.java
File hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/AbstractUTF8StringBinaryTokenizer.java:

Line 59:     //TODO: This UTF8Tokenizer strongly rely on the Asterix data format,
s/rely/relies/


https://asterix-gerrit.ics.uci.edu/#/c/449/8/hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/string/UTF8StringUtil.java
File hyracks/hyracks-util/src/main/java/org/apache/hyracks/util/string/UTF8StringUtil.java:

Line 214:     public static int normalize(byte[] bytes, int start) {
Could we document what "normalize" means in this case?


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/449
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7e95df0f06984b784ebac2c84b97e56a50207d27
Gerrit-PatchSet: 8
Gerrit-Project: hyracks
Gerrit-Branch: master
Gerrit-Owner: Jianfeng Jia <jianfeng.jia@gmail.com>
Gerrit-Reviewer: Ian Maxon <imaxon@apache.org>
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Jianfeng Jia <jianfeng.jia@gmail.com>
Gerrit-Reviewer: Preston Carman <prestonc@apache.org>
Gerrit-Reviewer: Taewoo Kim <wangsaeu@gmail.com>
Gerrit-Reviewer: Till Westmann <tillw@apache.org>
Gerrit-HasComments: Yes

Mime
View raw message