hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
Date Fri, 04 Sep 2015 22:30:46 GMT

     [ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
------------------------------
    Attachment: YARN-4074-YARN-2928.POC.003.patch

POC v.3 patch posted.

Key changes include
- switched from Get.setMaxResultSize() to PageFilter (more on that below)
- major refactoring of HBaseTimelineReaderImpl
-- introduced TimelineEntityReader and the hierarchy of classes to isolate proper reading per type
- added unit tests to test HBaseTimelineReaderImpl for flow activity and flow runs
- fixed an issue with FlowScanner where the cells were returned in the wrong order, which broke Column.readResult()
- made the *RowKey classes real object classes, and added a parseRowKey method that returns an instance of the RowKey
- fixed the order of the add() and pollLast() calls
- renamed FlowEntity to FlowRunEntity
- added the compareTo() method for FlowActivityEntity
- passed the type into the FlowActivityEntity constructor
- set configs for FlowActivityEntity and FlowRunEntity to null
- improved the way we get string values from info for FlowActivityEntity and FlowRunEntity
- added getNumberOfRuns() to FlowActivityEntity
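
To illustrate the compareTo() and add/pollLast items above, here is a minimal sketch of keeping only the most recent N entries in a sorted set. It is not code from the patch: the Activity class, its fields, and the mostRecent() helper are hypothetical stand-ins for FlowActivityEntity and the reader logic. The point is only that adding first and then evicting the last element keeps the set bounded without dropping a newer entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TopNFlows {
    /** Hypothetical stand-in for FlowActivityEntity, ordered newest first. */
    static final class Activity implements Comparable<Activity> {
        final long dayTimestamp;
        final String flowName;

        Activity(long dayTimestamp, String flowName) {
            this.dayTimestamp = dayTimestamp;
            this.flowName = flowName;
        }

        @Override
        public int compareTo(Activity other) {
            // Newest first; tie-break on the flow name so that two distinct
            // flows with the same timestamp are both kept by the sorted set.
            int byTime = Long.compare(other.dayTimestamp, this.dayTimestamp);
            return byTime != 0 ? byTime : this.flowName.compareTo(other.flowName);
        }
    }

    /** Returns at most {@code limit} activities, newest first. */
    static List<Activity> mostRecent(Iterable<Activity> input, int limit) {
        TreeSet<Activity> top = new TreeSet<>();
        for (Activity a : input) {
            top.add(a);              // add the new entry first ...
            if (top.size() > limit) {
                top.pollLast();      // ... then evict the oldest one
            }
        }
        return new ArrayList<>(top);
    }
}
```

Polling before adding would risk evicting an entry that is newer than the incoming one, which is one way the order of the two calls can matter.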

It is actually pretty close to being ready, but since YARN-3901 is still outstanding, I'm
not making it an official patch yet.

As for the PageFilter issue, I concluded that setMaxResultSize() is not the right API for
limiting the number of rows. I believe PageFilter is the right thing to use. I also added
counting logic to get the right number of records even if the result iterator advances.
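
For context on why counting is needed: HBase documents PageFilter as being applied separately on each region server, so a scan can return more rows than the page size. A minimal sketch of client-side counting follows. It is an assumption about the general shape, not the code in the patch; the LimitedRead class and readUpTo() helper are hypothetical, and a plain Iterator stands in for the HBase result scanner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LimitedRead {
    /**
     * Reads at most {@code limit} items from {@code results}.
     * The iterator is a stand-in for an HBase result scanner that may
     * yield more rows than the requested page size.
     */
    static <T> List<T> readUpTo(Iterator<T> results, int limit) {
        List<T> rows = new ArrayList<>();
        // Check the count before calling next() so that no row beyond
        // the limit is consumed from the iterator.
        while (rows.size() < limit && results.hasNext()) {
            rows.add(results.next());
        }
        return rows;
    }
}
```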

As for the FlowScanner issue mentioned above, [~vrushalic] and [~jrottinghuis] debugged this
to track down a bug in YARN-3901. As such, this change will likely be made in the final YARN-3901
patch. I just included it here for completeness and to make the unit tests pass.

You should be able to apply the YARN-3901 v.3 patch and then this patch cleanly. Let me know
if you have any questions.

I'd greatly appreciate review feedback. I understand it's a lot of code...

> [timeline reader] implement support for querying for flows and flow runs
> ------------------------------------------------------------------------
>
>                 Key: YARN-4074
>                 URL: https://issues.apache.org/jira/browse/YARN-4074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
