hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3863) Enhance filters in TimelineReader
Date Sat, 12 Dec 2015 13:13:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054308#comment-15054308 ]

Varun Saxena commented on YARN-3863:
------------------------------------

[~sjlee0], [~djp], kindly review.
The latest WIP patch adds the following over the first WIP patch.

# Relationships (relatesTo/isRelatedTo) and event filters are now represented as timeline filter lists,
in addition to info, config and metric filters, so that queries combining ANDs and ORs can
be supported for them as well.
# Events in the entity and application tables are represented as follows:
The column qualifier is of the form {{e!\[eventid\]=\[event_timestamp\]=\{event info key\}}}.
The info key is part of the column qualifier only if event info exists. The value associated with
the column qualifier is the info value; if no info exists, the value is empty.
With this arrangement, there is no analogous HBase filter that can filter out rows for event filters
which check for the existence of a particular event (especially with complex filters containing
ANDs and ORs). So event filters will be applied in the timeline reader after fetching rows from HBase.
What we can do to reduce the amount of data fetched from HBase, however, is fetch only
those columns which are required for matching event filters. This is what the patch does.
We use QualifierFilter to achieve this.
Please note that we do this only if the fields to retrieve do not contain EVENTS, because in that
case all events have to be fetched anyway.
# Now coming to relationships (isRelatedTo and relatesTo), they are stored as follows:
The column qualifier is {{r!\[entitytype\]}} or {{s!\[entitytype\]}} and the associated value is stored
as a list of entity ids separated by =, e.g. {{entityid1=entityid2=entityid3}}.
The way the value is stored makes it difficult to use SingleColumnValueFilter. We can probably
use a regex comparator, but building the regex dynamically from the query may not be feasible
and would in any case make matching slow on the HBase side.
So here too we fetch only the required columns, as we do for event filters.
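
To make the two storage layouts above concrete, here is a minimal sketch of the reader-side matching, assuming plain strings instead of the byte-encoded qualifiers and values HBase actually stores, and assuming event ids contain no = character. The class and method names ({{TimelineColumnSketch}}, {{parseEventQualifier}}, {{eventExists}}, {{containsAllIds}}) are hypothetical and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: plain-string stand-ins for the byte-encoded HBase columns. */
final class TimelineColumnSketch {

    /** Hypothetical holder for the parts of an event column qualifier. */
    static final class EventColumn {
        final String eventId;
        final long timestamp;
        final String infoKey; // null when the event carries no info

        EventColumn(String eventId, long timestamp, String infoKey) {
            this.eventId = eventId;
            this.timestamp = timestamp;
            this.infoKey = infoKey;
        }
    }

    /** Splits "e!eventid=timestamp[=infokey]" into its parts. */
    static EventColumn parseEventQualifier(String qualifier) {
        if (!qualifier.startsWith("e!")) {
            throw new IllegalArgumentException("Not an event column: " + qualifier);
        }
        // A limit of 3 keeps any '=' inside the info key intact.
        String[] parts = qualifier.substring(2).split("=", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Missing timestamp: " + qualifier);
        }
        String infoKey = parts.length == 3 ? parts[2] : null;
        return new EventColumn(parts[0], Long.parseLong(parts[1]), infoKey);
    }

    /** Reader-side existence check over the qualifiers fetched for one row. */
    static boolean eventExists(List<String> rowQualifiers, String eventId) {
        for (String qualifier : rowQualifiers) {
            if (qualifier.startsWith("e!")
                    && parseEventQualifier(qualifier).eventId.equals(eventId)) {
                return true;
            }
        }
        return false;
    }

    /** Reader-side check that a relationship value lists every required entity id. */
    static boolean containsAllIds(String storedValue, Set<String> requiredIds) {
        Set<String> storedIds = new HashSet<>(Arrays.asList(storedValue.split("=")));
        return storedIds.containsAll(requiredIds);
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList(
                "e!start=1449925000000=reason", "e!finish=1449925999000");
        System.out.println(eventExists(row, "finish"));
        System.out.println(containsAllIds("entityid1=entityid2=entityid3",
                new HashSet<>(Arrays.asList("entityid1", "entityid3"))));
    }
}
```

Splitting the relationship value into a set, rather than matching it with a regex, is what makes the check cheap on the reader side once only the required columns have been fetched.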

Also, Naga told me that in the meeting you wanted the reader API to be refactored as well.
I have that at the back of my mind. As this patch by itself is quite large, I think we can
do that refactoring in another JIRA. Or do you want to do it here?
I have to raise a few JIRAs, including one for this refactoring.

> Enhance filters in TimelineReader
> ---------------------------------
>
>                 Key: YARN-3863
>                 URL: https://issues.apache.org/jira/browse/YARN-3863
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-3863-feature-YARN-2928.wip.003.patch, YARN-3863-feature-YARN-2928.wip.01.patch,
YARN-3863-feature-YARN-2928.wip.02.patch
>
>
> Currently, filters in the timeline reader return an entity only if all the filter conditions
hold true, i.e. only the AND operation is supported. We can support the OR operation for the filters
as well. Additionally, as the primary backend implementation is HBase, we can design our filters
so that they closely resemble HBase Filters.
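
The AND/OR composition described above can be pictured as a small filter tree, loosely in the spirit of HBase's FilterList with its MUST_PASS_ALL and MUST_PASS_ONE operators. The following is only a sketch under that assumption: the type names ({{EntityFilter}}, {{KeyValueFilter}}, {{FilterGroup}}) are hypothetical, the entity is reduced to a map of info key/value strings, and none of it is taken from the attached patches.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical filter abstraction: decides whether an entity's info matches. */
interface EntityFilter {
    boolean matches(Map<String, String> info);
}

/** Leaf filter: the given key must be present with exactly the given value. */
final class KeyValueFilter implements EntityFilter {
    private final String key;
    private final String value;

    KeyValueFilter(String key, String value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public boolean matches(Map<String, String> info) {
        return value.equals(info.get(key));
    }
}

/** Composite filter: combines child filters with AND or OR; children may be groups. */
final class FilterGroup implements EntityFilter {
    enum Operator { AND, OR }

    private final Operator operator;
    private final List<EntityFilter> filters;

    FilterGroup(Operator operator, List<EntityFilter> filters) {
        this.operator = operator;
        this.filters = filters;
    }

    @Override
    public boolean matches(Map<String, String> info) {
        for (EntityFilter filter : filters) {
            boolean matched = filter.matches(info);
            if (operator == Operator.AND && !matched) {
                return false; // one failing child fails an AND group
            }
            if (operator == Operator.OR && matched) {
                return true; // one passing child satisfies an OR group
            }
        }
        // No short-circuit happened: AND saw only matches, OR saw none.
        return operator == Operator.AND;
    }
}

final class FilterGroupDemo {
    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();
        info.put("user", "alice");
        info.put("queue", "default");

        // user=bob OR (user=alice AND queue=default)
        EntityFilter filter = new FilterGroup(FilterGroup.Operator.OR,
                Arrays.<EntityFilter>asList(
                        new KeyValueFilter("user", "bob"),
                        new FilterGroup(FilterGroup.Operator.AND,
                                Arrays.<EntityFilter>asList(
                                        new KeyValueFilter("user", "alice"),
                                        new KeyValueFilter("queue", "default")))));
        System.out.println(filter.matches(info));
    }
}
```

Because a group is itself a filter, arbitrary nesting of ANDs and ORs falls out of the same two types.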



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
