hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
Date Sat, 09 May 2015 05:25:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536225#comment-14536225
] 

Sangjin Lee commented on YARN-3411:
-----------------------------------

Thanks [~vrushalic] for the latest patch. I took a quick look at it, and have a few comments.
I think we should fill in those a couple of to-do's and get a patch ready.

(TimelineServicePerformanceV2.java)
- you might need to update the patch to be based on the latest from the branch; e.g addTImeSeries()
-> addValues()

(pom.xml)
- it might be good to follow the standard practice (AFAIK) and specify the hbase dependency
version in hadoop-project/pom.xml

(CreateSchema.java)
- l.57: it's not necessary to add hbase-site.xml manually if you're creating an explicit HBaseConfiguration.
It does it by default
- l.58: the connection should be closed at the end
- We don't have to do this right now, but it might be good to expand this tool to be able
to drop the schema and recreate it (drop and create)

(HBaseTimelineWriterImpl.java)
- l.71: let's make this a debug logging statement (also with a isDebugEnabled() call)
- l.75: no need to override init() (the base implementation calls serviceInit() anyway)
- l.146: since you're getting both key and value, using entrySet() vs. keySet() is a little
more efficient (the same for other iterations)
- l.225: style nit: you can simply define String key and Object value inside the loop (there
is no real savings for reusing the variables):
{code}
for (Map.Entry<String, Object> entry : configs.entrySet()) {
  String key = entry.getKey();
  Object value = entry.getValue();
  ...
}
{code}
- l.244: you want to check the type of the metric (single value v. time series) first and
handle them differently, right?

(TimelineWriterUtils.java)
- l.123: it seems like Range should be a top level class rather than an inner class for EntityTableDetails;
it's shared by several classes already

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt,
YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native HBase schema
for the write path. Such a schema does not exclude using Phoenix, especially for reads and
offline queries.
> Once we have basic implementations of both options, we could evaluate them in terms of
performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message