hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables
Date Sat, 06 Jun 2015 04:34:01 GMT

     [ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Joep Rottinghuis updated YARN-3706:
    Attachment: YARN-3726-YARN-2928.006.patch

Added YARN-3726-YARN-2928.006.patch
Thanks for comments [~sjlee0]

Reason I had the relatively simple BaseTable.setTableName(...) is that it allows me to not
have to "leak" the name of the configuration value for the table to be a public attribute.
Do you think it is better to just have a public member on EntityTable and set the value directly,
or to keep that private?

Wrt. EntityTable.java
I92 should be static. Not sure what you mean by this.

Reason I chose this to be static is to be able to have default behavior in the base that that
is extends. I'd love for Java to be able to let me define default behavior for static methods
and/or define method signatures (like in an interface) for static methods, but I can't seem
to do that.
What I'm really after is a getInstance(). Would you prefer if getInstance simple news up a
table instance each time, or are you generally against the pattern of being able to call getInstance()?

ColumnImpl.java does not implement the Column<T> interface because it a) doesn't implement
store(byte[], TypedBufferedMutator<T>, Long, Object).
Also it isn't mean to be instantiated directly as a Column, it just is intended as the backing
functionality for actual Column implementations. ColumnImpl is probably a poor table name
choice. BaseColumn is also not the right name, because Columns wouldn't extent BaseColumn.
Should I rename this to BackingColumn?

I agree with the USERNAME_SPLITS being public. I've left a comment to remove this completely
and have this read from the configuration. I think it would be better to provide a default
property in a config file for this. This was in place in the code currently checked in and
I did not tackle that in this patch. Would it be OK if I file a separate jira for this?

Wrt. TimelineWriterUtils.java re-use of encoded separators.
Hmm, good point. We use different separators for different situations (sometimes only encoding
space, sometimes space and "!", sometimes space and "?").
Let me ponder to see if I can remove the encode method and use a join method instead. Columns
are comprised of encoded parts (only some parts need to be encoded). I can see if I can make
that happen without totally bastardizing the structure.
I could also do a check first to see if (token.contains(separator)) but that might end up
being more expensive than encoding the token.
In the old code we always replaced spaces and separators with underscores, now we do it only
for those parts that are really needed. I got to think about that a bit more.

Uploaded a new patch with the other things fixed.

Unit tests completed and verified entity read back.
Still think it is useful to add a read method that reads an entire entity back and have a
assertsEqual(Entity a, Entity b) method somewhere and perhaps some additional read/write tests
for edge cases (for example, entities with individual fields).
I can add them in this patch, or in a separate one.

> Generalize native HBase writer for additional tables
> ----------------------------------------------------
>                 Key: YARN-3706
>                 URL: https://issues.apache.org/jira/browse/YARN-3706
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Joep Rottinghuis
>            Priority: Minor
>         Attachments: YARN-3706-YARN-2928.001.patch, YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch,
YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch, YARN-3726-YARN-2928.006.patch
> When reviewing YARN-3411 we noticed that we could change the class hierarchy a little
in order to accommodate additional tables easily.
> In order to get ready for benchmark testing we left the original layout in place, as
performance would not be impacted by the code hierarchy.
> Here is a separate jira to address the hierarchy.

This message was sent by Atlassian JIRA

View raw message