falcon-dev mailing list archives

From "Srikanth Sundarrajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1102) Gather data transfer detail of replication job submitted from HDFS recipe
Date Fri, 12 Jun 2015 03:34:00 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582886#comment-14582886
] 

Srikanth Sundarrajan commented on FALCON-1102:
----------------------------------------------

A few observations on the patch.

* This is a very useful feature and should be enabled for standard replication as well.
* Besides BYTESCOPIED, counters such as BANDWIDTH, time taken, number of files, etc. would
also be useful. Likewise, if a mechanism exists for saving these in the instance graph, we can
figure out how to open this up for processes as well. (That, of course, can be taken up in a
separate jira.)

* Not sure if adding an index on bytescopied is useful
{code}
+        makeLongKey("BYTESCOPIED");
{code}

* Should this be a proxied file system?
{code}
+        FileSystem sourceFs = HadoopClientFactory.get().createProxiedFileSystem(
+                inPaths.get(0).toUri(), getConf());
{code}

* Eagerly deleting the counter file might result in gaps if the graph update were to fail
for any reason. Can we let this be cleaned up along with the regular process log deletion?
Please confirm that it is indeed getting deleted.
{code}
+    private static void addCounterToWF(WorkflowExecutionContext executionContext) throws FalconException {
+        FileSystem fs = HadoopClientFactory.get().createProxiedFileSystem(
+                new Path(executionContext.getLogDir()).toUri());
+        Path counterFilePath = getCounterFilePath(executionContext.getLogDir());
+        try {
+            if (fs.exists(counterFilePath)) {
+                String counters = readCounters(fs, counterFilePath);
+                if (!StringUtils.isEmpty(counters)) {
+                    executionContext.context.put(WorkflowExecutionArgs.COUNTERS, counters);
+                }
+            }
+        } catch (IOException e) {
+            throw new FalconException("Error in checking counter file :" + e);
+        } finally {
+            try {
+                if (fs.exists(counterFilePath)) {
+                    fs.delete(counterFilePath, false);
+                }
+                fs.close();
+            } catch (IOException e) {
+                LOG.error("unable to delete counter file: {}", e);
+            }
+        }
+    }
+
{code}
* The data type of the counter value should be long, not string.
{code}
+    private void addCountersToInstance(String counter, Vertex vertex) {
+        int index = counter.indexOf(":");
+        String counterKey = counter.substring(0, index);
+        String counterValue = counter.substring(index+1, counter.length());
+        vertex.setProperty(counterKey, counterValue);
+    }
+
{code}

* Minor nits: There are some wildcard imports.
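
To illustrate the point about the counter value's data type, here is a minimal sketch of parsing a "KEY:VALUE" counter string and storing the value as a long. It is not taken from the patch: the class name CounterParser and the use of a plain Map in place of the graph Vertex are assumptions made so the example is self-contained, and the input format is assumed to be the same colon-separated form that addCountersToInstance splits on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterParser {

    /**
     * Parses a "KEY:VALUE" counter string and stores the value as a long.
     * The Map is a hypothetical stand-in for the graph vertex; in the patch
     * the equivalent call would be vertex.setProperty(key, value).
     */
    public static void addCounter(String counter, Map<String, Object> properties) {
        int index = counter.indexOf(':');
        // Reject input with no separator, an empty key, or an empty value.
        if (index <= 0 || index == counter.length() - 1) {
            throw new IllegalArgumentException("Malformed counter: " + counter);
        }
        String key = counter.substring(0, index).trim();
        // Long.parseLong throws NumberFormatException for non-numeric values.
        long value = Long.parseLong(counter.substring(index + 1).trim());
        properties.put(key, value); // autoboxed to Long, not stored as String
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new LinkedHashMap<>();
        addCounter("BYTESCOPIED:1048576", properties);
        System.out.println(properties.get("BYTESCOPIED") instanceof Long); // prints "true"
    }
}
```

Storing the value as a long would let numeric comparisons and range queries on the counter behave as expected, which a string value would not.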

> Gather data transfer detail of replication job submitted from HDFS recipe
> -------------------------------------------------------------------------
>
>                 Key: FALCON-1102
>                 URL: https://issues.apache.org/jira/browse/FALCON-1102
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: Peeyush Bishnoi
>            Assignee: Peeyush Bishnoi
>             Fix For: 0.7
>
>         Attachments: FALCON-1102.patch
>
>
> Falcon UI has a requirement to show data transfer details from replication job invoked
> through HDFS recipe. To carry out this, we need to capture the bytes transferred from source
> to destination of replication job and then populate to backend store from where UI can access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
