hadoop-yarn-issues mailing list archives

From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables
Date Sat, 23 May 2015 08:26:17 GMT

     [ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joep Rottinghuis updated YARN-3706:
    Attachment: YARN-3706-YARN-2928.001.patch

Initial version of patch (YARN-3706-YARN-2928.001.patch)

This patch is nowhere near a shape to apply, because I have not yet properly set up my environment
with the proper HBase 1, the branch, etc.
Still, I wanted to upload the skeleton of the code to communicate intent.
It also does not yet include the needed changes to TimelineSchemaCreator or to HBaseTimelineWriterImpl.
The structure of HBaseTimelineWriterImpl stays really close to what it is now. Init will use EntityTable
to create a type-safe BufferedMutator.
Some of the classes that are new in HBase 1 are stubbed out in this code (imports from org.apache.hbase.stubbs
need to be changed to the real imports).
Apologies for the hackiness.

Ideas in this patch:
- Type parameters will prevent accidentally passing the wrong mutator (one for a different table)
to a column, or the wrong column family to the wrong column. The compiler won't allow it.
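
To illustrate the type-parameter idea, here is a minimal sketch. It does not use HBase at all: Table, TypedMutator, Column and ApplicationTable are hypothetical stand-ins invented for this example (TypedMutator plays the role of a type-safe BufferedMutator), and the patch itself may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these types come from HBase or the patch.
final class TypedColumnSketch {

  /** Marker for a table. The type parameter ties mutators and columns to it. */
  interface Table<T extends Table<T>> {
    String name();
  }

  static final class EntityTable implements Table<EntityTable> {
    public String name() { return "entity"; }
  }

  static final class ApplicationTable implements Table<ApplicationTable> {
    public String name() { return "application"; }
  }

  /** Stand-in for a BufferedMutator that knows which table it writes to. */
  static final class TypedMutator<T extends Table<T>> {
    final T table;
    final List<String> buffered = new ArrayList<>();

    TypedMutator(T table) { this.table = table; }

    void mutate(String description) {
      buffered.add(table.name() + ":" + description);
    }
  }

  /** A column can only be stored through a mutator for its own table. */
  interface Column<T extends Table<T>> {
    String qualifier();

    default void store(String rowKey, TypedMutator<T> mutator, String value) {
      mutator.mutate(rowKey + "/" + qualifier() + "=" + value);
    }
  }

  /** Columns of the entity table, named after the table. */
  enum EntityColumn implements Column<EntityTable> {
    ID("id"), TYPE("type");

    private final String qualifier;
    EntityColumn(String qualifier) { this.qualifier = qualifier; }
    public String qualifier() { return qualifier; }
  }

  public static void main(String[] args) {
    TypedMutator<EntityTable> entityMutator = new TypedMutator<>(new EntityTable());
    TypedMutator<ApplicationTable> appMutator = new TypedMutator<>(new ApplicationTable());

    EntityColumn.ID.store("row1", entityMutator, "entity_42");
    // The next line is rejected by the compiler, because appMutator is a
    // TypedMutator<ApplicationTable>, not a TypedMutator<EntityTable>:
    // EntityColumn.ID.store("row1", appMutator, "entity_42");

    System.out.println(entityMutator.buffered);
  }
}
```

Adding a table in this sketch means adding one Table implementation and one column enum parameterized on it; the storage logic in Column.store is shared.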

- Tables are fully defined in their own class
- Minimize TimelineEntitySchemaConstants
- Renamed row key prefix to simply prefix, as it is used for more than row keys
- Column and column prefix classes are as short as possible and named after table.
- Columns are fully qualified with column name.
- ColumnPrefix is similar to Column, except that during storage a column qualifier needs to be
added. If NONE is chosen, then no prefix is used
(a unit test needs to confirm that join works properly).
- Keep the API simple: keep as few store methods as needed, with no special storing for number,
String, Long, etc. The caller simply converts lists etc. to a string.
- Later, more behavior can be added to particular columns if needed.
This means that for all those columns where no override is needed for the timestamp, null is simply
passed in.
- Removed usage of Cell as it doesn't seem to be needed when the Put can do the same.
- Minimize TimelineWriterUtils to really simple util methods that can be unit tested w/o
an actual HBase (standalone) cluster
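
As a rough sketch of the column prefix and the single store method described above: the names below (EntityColumnPrefix values, PendingWrite, the "!" separator) are assumptions made for illustration, and PendingWrite merely stands in for an HBase Put. Nothing here needs a cluster, which is the point of keeping the helpers small.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; names and the "!" separator are assumptions, not the patch's code.
final class ColumnPrefixSketch {
  static final String SEPARATOR = "!";

  /** Prefixes for one table. NONE means the qualifier is stored as-is. */
  enum EntityColumnPrefix {
    NONE(null), CONFIG("c"), METRIC("m");

    private final String prefix;
    EntityColumnPrefix(String prefix) { this.prefix = prefix; }

    /** Joins the prefix and the caller-supplied qualifier into the full column name. */
    String qualifierFor(String qualifier) {
      return prefix == null ? qualifier : prefix + SEPARATOR + qualifier;
    }
  }

  /** Stand-in for an HBase Put: one value destined for one column of one row. */
  static final class PendingWrite {
    final String rowKey;
    final byte[] qualifier;
    final byte[] value;
    final Long timestamp; // null means no override: the store assigns the timestamp

    PendingWrite(String rowKey, byte[] qualifier, byte[] value, Long timestamp) {
      this.rowKey = rowKey;
      this.qualifier = qualifier;
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  /** The only store method: the caller has already converted the value to a String. */
  static PendingWrite store(String rowKey, EntityColumnPrefix prefix, String qualifier,
      Long timestamp, String value) {
    return new PendingWrite(
        rowKey,
        prefix.qualifierFor(qualifier).getBytes(StandardCharsets.UTF_8),
        value.getBytes(StandardCharsets.UTF_8),
        timestamp);
  }

  public static void main(String[] args) {
    // The caller turns a list into a single string before storing it.
    String related = String.join(",", Arrays.asList("app_1", "app_2"));
    PendingWrite write = store("row1", EntityColumnPrefix.CONFIG, "related", null, related);
    System.out.println(new String(write.qualifier, StandardCharsets.UTF_8));
    System.out.println(new String(write.value, StandardCharsets.UTF_8));
  }
}
```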

- Additional tables should be really easy to add: simply copy EntityTable, modify some names
and type template. Copy EntityColumn and EntityColumnPrefix, modify the column names, string
literals etc.

- If needed, it should be easy to wrap extra behavior in the buffering to collapse together
multiple puts with the same rowkey.
- If needed, it should be easy to compress column values over a certain trigger value and add
an additional prefix (for example x!) in front of the column.
Reader code still needs to be added to ColumnImpl, which would then have to unwrap these column
names and uncompress.
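
The compression idea could look roughly like the sketch below. It is not part of the patch: the 64-byte trigger, the StoredColumn holder and the choice of gzip are all assumptions for illustration; only the x! prefix comes from the note above. A real version would also have to handle column names that already happen to start with x!.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; trigger size, holder class and gzip are assumptions.
final class CompressedColumnSketch {
  static final String COMPRESSED_PREFIX = "x!";
  static final int TRIGGER_BYTES = 64;

  /** A column name plus the bytes that would be written for it. */
  static final class StoredColumn {
    final String qualifier;
    final byte[] value;

    StoredColumn(String qualifier, byte[] value) {
      this.qualifier = qualifier;
      this.value = value;
    }
  }

  /** Writer side: compress values over the trigger and mark the column name. */
  static StoredColumn encode(String qualifier, String value) {
    byte[] raw = value.getBytes(StandardCharsets.UTF_8);
    if (raw.length <= TRIGGER_BYTES) {
      return new StoredColumn(qualifier, raw);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
      gzip.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new StoredColumn(COMPRESSED_PREFIX + qualifier, out.toByteArray());
  }

  /** Reader side: recover the original column name. */
  static String originalQualifier(StoredColumn column) {
    return column.qualifier.startsWith(COMPRESSED_PREFIX)
        ? column.qualifier.substring(COMPRESSED_PREFIX.length())
        : column.qualifier;
  }

  /** Reader side: uncompress the value if the column name carries the marker. */
  static String decode(StoredColumn column) {
    if (!column.qualifier.startsWith(COMPRESSED_PREFIX)) {
      return new String(column.value, StandardCharsets.UTF_8);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(column.value))) {
      byte[] buffer = new byte[4096];
      int read;
      while ((read = gzip.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    StringBuilder large = new StringBuilder();
    for (int i = 0; i < 200; i++) {
      large.append("value,");
    }
    StoredColumn stored = encode("config", large.toString());
    System.out.println(stored.qualifier + " round-trips: "
        + decode(stored).equals(large.toString()));
  }
}
```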

Initially I'm just looking for feedback on structure and approach with separation of table,
column family, column, and column prefixes from actual storage logic.

> Generalize native HBase writer for additional tables
> ----------------------------------------------------
>                 Key: YARN-3706
>                 URL: https://issues.apache.org/jira/browse/YARN-3706
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Joep Rottinghuis
>            Priority: Minor
>         Attachments: YARN-3706-YARN-2928.001.patch
>
> When reviewing YARN-3411 we noticed that we could change the class hierarchy a little
> in order to accommodate additional tables easily.
> In order to get ready for benchmark testing we left the original layout in place, as
> performance would not be impacted by the code hierarchy.
> Here is a separate jira to address the hierarchy.

This message was sent by Atlassian JIRA
