hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
Date Tue, 19 May 2015 16:15:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550688#comment-14550688

Junping Du commented on YARN-3411:

bq. I was thinking it is not necessary since the entity information would come in a more streaming
fashion, one update at a time anyways. If say, one column is written and other is not, the
callee can retry again, hbase put will simply over-write existing value.
It sounds reasonable. Let's keep it so far and check if necessary to change for some corner
cases (like we could update event and metrics at the same time for some job's final counter)
in future.

Thanks for addressing my previous comments. Latest patch looks much closer! 
Some additional comments (besides Zhijie's comments above) on latest (006) patch:
In TimelineCollectorManager.java,
+  @Override
+  protected void serviceStop() throws Exception {
+    super.serviceStop();
+    if (writer != null) {
+      writer.close();
+    }
+  }
We should stop running collectors before stopping the shared writer. Also, put super.serviceStop();
to the last one to stop as conforming the common practice.

In EntityColumnFamilyDetails.java,
+   * @param rowKey
+   * @param entityTable
+   * @param inputValue
+   * @throws IOException
+   */
+  public void store(byte[] rowKey, BufferedMutator entityTable, String key,
+      String inputValue) throws IOException {
Lacking a param of key in javadoc.

In HBaseTimelineWriterImpl.java,
For write() which is synchronized semantics so far (we haven't implemented async yet), we
put each kind of entity to the table by calling entityTable.mutate(...) which cache these
entities locally and flush it later under some conditions. Do we need to call entityTable.flush()
for semantics of strict synchronized writing? If not, at least, we should flush it at serviceStop()
as close() directly could lose some cached data.

  protected void serviceStop() throws Exception {
    if (entityTable != null) {
      LOG.info("closing entity table");
    if (conn != null) {
      LOG.info("closing the hbase Connection");
Also, as comments above, put super.serviceStop() as the last one to stop.

In Range.java,
+public class Range {
For class marked as private, we don't necessary to put InterfaceStability annotation there.

In TimelineWriterUtils.java,

+    for (byte[] comp : components) {
+      finalSize += comp.length;
+    }
Null check for comp.

+      System.arraycopy(components[i], 0, buf, offset, components[i].length);
+      offset += components[i].length;
Null check for components[i].

+   * @param source
+   * @param separator
+   * @return byte[][] after splitting the input source
+   */
+  public static byte[][] split(byte[] source, byte[] separator, int limit) {
Missing param of limit in javadoc.

It sounds we didn't do any null check for separator in splitRanges(), but we do null check
in join(). It should keep consistent at least in one class.

+  public static long getValueAsLong(final byte[] key,
+      final Map<byte[], byte[]> taskValues) throws IOException {
+    byte[] value = taskValues.get(key);
+    if (value != null) {
+      Number val = (Number) GenericObjectMapper.read(value);
+      return val.longValue();
+    } else {
+      return 0L;
+    }
+  }
Shall we use Long instead of long? Or we cannot diff value with null and real 0.

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch,
YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt,
YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt
> There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native HBase schema
for the write path. Such a schema does not exclude using Phoenix, especially for reads and
offline queries.
> Once we have basic implementations of both options, we could evaluate them in terms of
performance, scalability, usability, etc. and make a call.

This message was sent by Atlassian JIRA

View raw message