hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
Date Wed, 29 Jul 2015 01:49:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645348#comment-14645348 ]

Li Lu commented on YARN-3904:
-----------------------------

Hi [~sjlee0], thanks so much for the review! Some quick comments:

bq. Regarding PhoenixOfflineAggregatorWriterImpl, does it have to implement the TimelineWriter interface? It is no longer plugged into the real-time write path, and as such, implementing TimelineWriter seems unnecessary.
That's exactly what I've been debating with myself! The more I work on the offline
aggregator, the less beneficial it seems to implement our offline storage as a
{{TimelineWriter}}. However, the offline writer *is* still a timeline writer.
The natural distinction between the Phoenix writer and the HBase writer is whether a writer works
in the realtime or the offline workflow. Maybe we'd like to have something like {{TimelineRealTimeWriters}}
and {{TimelineOfflineWriters}} (or {{TimelineOfflineStorage}}, to accommodate both read and
write code paths)? Realtime writers should focus on writing raw entity data with full context
info as well as performing realtime aggregations. Offline writers can focus on offline aggregation
storage. Thoughts?
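
To make the proposed split concrete, here is a minimal sketch. Every name in it ({{EntityWrite}}, {{AggregatedWrite}}, {{RealTimeWriter}}, {{OfflineAggregationWriter}} and the in-memory implementations) is a hypothetical illustration, not a class from the patch or from the existing {{TimelineWriter}} API. It only shows the idea that the realtime path takes raw entities with full context while the offline path takes aggregated rows with less context.

```java
import java.util.ArrayList;
import java.util.List;

// All type and method names below are hypothetical illustrations,
// not classes from the YARN-3904 patch or the timeline service.

/** A raw entity write carrying the full context a realtime collector has. */
final class EntityWrite {
    final String clusterId;
    final String userId;
    final String flowName;
    final long flowRunId;
    final String appId;
    final String entityId;

    EntityWrite(String clusterId, String userId, String flowName,
                long flowRunId, String appId, String entityId) {
        this.clusterId = clusterId;
        this.userId = userId;
        this.flowName = flowName;
        this.flowRunId = flowRunId;
        this.appId = appId;
        this.entityId = entityId;
    }
}

/** An aggregated row; assumed here to carry no app or entity id. */
final class AggregatedWrite {
    final String clusterId;
    final String userId;
    final String flowName;
    final String metric;
    final long value;

    AggregatedWrite(String clusterId, String userId, String flowName,
                    String metric, long value) {
        this.clusterId = clusterId;
        this.userId = userId;
        this.flowName = flowName;
        this.metric = metric;
        this.value = value;
    }
}

/** Realtime path: raw entity data with full context. */
interface RealTimeWriter {
    void write(EntityWrite entity);
}

/** Offline path: aggregation storage only. */
interface OfflineAggregationWriter {
    void writeAggregated(AggregatedWrite row);
}

/** Stand-in for an HBase-backed realtime writer; keeps writes in memory. */
final class InMemoryRealTimeWriter implements RealTimeWriter {
    final List<EntityWrite> written = new ArrayList<>();

    @Override
    public void write(EntityWrite entity) {
        written.add(entity);
    }
}

/** Stand-in for a Phoenix-backed offline writer; keeps rows in memory. */
final class InMemoryOfflineWriter implements OfflineAggregationWriter {
    final List<AggregatedWrite> rows = new ArrayList<>();

    @Override
    public void writeAggregated(AggregatedWrite row) {
        rows.add(row);
    }
}
```

With two separate interfaces, a writer used only by an offline job would not need to implement the realtime write method at all, which is the concern raised in the review.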

bq. If we envision using this in a separate mechanism such as mapreduce, I think we ought to come up with a new interface for aggregation.
Yes. If we separate realtime and offline writers, we have more freedom to design
aggregation-specific writer interfaces.

bq. Also, the actual work of reading the HBase tables (eventually the flow run table) and invoking the offline aggregator is not captured here.
I'm planning to include the HBase aggregation table reader as part of YARN-3817, if that POC
patch is not too big (so far I don't believe it is). Invoking the offline aggregator
will probably come separately, since we may need some further changes in the RM to post active
flows. Does this plan work?

> Refactor timelineservice.storage to add support to online and offline aggregation writers
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-3904
>                 URL: https://issues.apache.org/jira/browse/YARN-3904
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch
>
>
> After finishing the design for time-based aggregation, we can adopt our existing Phoenix
> storage as the storage for the aggregated data. In this JIRA, I'm proposing to refactor writers
> to add support for aggregation writers. Offline aggregation writers typically have less contextual
> information. We can distinguish these writers by special naming. We can also use CollectorContexts
> to model all contextual information and use it in our writer interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
