apex-users mailing list archives

From Amit Shah <amits...@gmail.com>
Subject Re: What-if analysis with apex
Date Thu, 28 Jan 2016 16:28:17 GMT
Please see my responses below.

>1. Loading values for unmodified cells
> What is the source of these unmodified cells?


Table values. Taking an example from the diagram: assuming the user modifies
the cell with identifier (table 1, row 1, column 1), we would have to load
the values of the unmodified cells (table 2, row 2, column 2) and (table 4,
row 4, column 4) to recalculate the values of the other cells.
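
As a rough illustration of this step, here is a minimal sketch in plain Java
(it does not use any Apex API). The cell ids "t1" to "t6", the class name and
the dependency edges are hypothetical: the attached diagram is not part of
this message, so the sketch assumes that cell t3 is computed from t1 and t2,
and that cell t6 is computed from t3 and t4.

```java
import java.util.*;

class WhatIfInputs {
    // Hypothetical dependency map: computed cell -> cells it reads.
    // Assumption: t3 = f(t1, t2) and t6 = g(t3, t4).
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("t3", Arrays.asList("t1", "t2"));
        deps.put("t6", Arrays.asList("t3", "t4"));
        return deps;
    }

    // Cells that must be recalculated when `modified` changes
    // (all direct and transitive dependents, found breadth-first).
    static Set<String> affected(Map<String, List<String>> deps, String modified) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(modified);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (e.getValue().contains(current) && result.add(e.getKey())) {
                    queue.add(e.getKey());
                }
            }
        }
        return result;
    }

    // Unmodified cells whose stored values must be loaded before recalculating:
    // every input of an affected cell that is neither the modified cell
    // nor itself recalculated.
    static Set<String> inputsToLoad(Map<String, List<String>> deps, String modified) {
        Set<String> recalculated = affected(deps, modified);
        Set<String> toLoad = new TreeSet<>();
        for (String cell : recalculated) {
            for (String input : deps.get(cell)) {
                if (!input.equals(modified) && !recalculated.contains(input)) {
                    toLoad.add(input);
                }
            }
        }
        return toLoad;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = sampleGraph();
        System.out.println("recalculate: " + affected(deps, "t1"));
        System.out.println("load: " + inputsToLoad(deps, "t1"));
    }
}
```

Under these assumed edges, modifying t1 means recalculating t3 and t6, and
loading the stored values of t2 and t4.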

> 3. Execute the cells in parallel (if possible)
> Which cells are you referring to? Table 1, row 1, column 1 - that is, the
> cell whose change triggers recalculation of the dependent cells - or the
> two dependent cells?


Modifying the cell with identifier (table 1, row 1, column 1) would trigger
recalculation of the cell values (table 3, row 3, column 3) and (table 6,
row 6, column 6). In this example we cannot do parallel evaluations, but you
could imagine a case where parallel calculations would be possible.
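
To make the distinction concrete, here is a minimal sketch in plain Java (not
Apex API) that groups the cells to be recalculated into levels, where the
cells within one level do not depend on each other and could therefore be
evaluated in parallel. All names are hypothetical. The chain graph assumes
that t6 reads the recalculated t3, which the diagram may or may not show; the
diamond graph is an invented case in which two cells can run side by side.

```java
import java.util.*;

class WhatIfLevels {
    // Groups computed cells into levels. `deps` maps each cell to be
    // recalculated to the cells it reads. A cell is ready once none of its
    // inputs is still waiting to be recalculated.
    static List<List<String>> levels(Map<String, List<String>> deps) {
        Set<String> remaining = new TreeSet<>(deps.keySet());
        List<List<String>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            List<String> ready = new ArrayList<>();
            for (String cell : remaining) {
                boolean waiting = false;
                for (String input : deps.get(cell)) {
                    if (remaining.contains(input)) {
                        waiting = true;
                        break;
                    }
                }
                if (!waiting) {
                    ready.add(cell);
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("cycle in dependency graph");
            }
            remaining.removeAll(ready);
            result.add(ready);
        }
        return result;
    }

    // Assumed shape of the example above: t3 = f(t1, t2), t6 = g(t3, t4).
    static Map<String, List<String>> chain() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("t3", Arrays.asList("t1", "t2"));
        deps.put("t6", Arrays.asList("t3", "t4"));
        return deps;
    }

    // Invented case: b and c both read only a, and d reads b and c.
    static Map<String, List<String>> diamond() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("b", Arrays.asList("a"));
        deps.put("c", Arrays.asList("a"));
        deps.put("d", Arrays.asList("b", "c"));
        return deps;
    }

    public static void main(String[] args) {
        System.out.println(levels(chain()));   // prints [[t3], [t6]]
        System.out.println(levels(diamond())); // prints [[b, c], [d]]
    }
}
```

In the chain every level holds a single cell, so nothing can run in parallel;
in the diamond, b and c share a level and could be evaluated concurrently.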

Thanks,
Amit.

On Thu, Jan 28, 2016 at 9:20 PM, Sandeep Deshmukh <sandeep@datatorrent.com>
wrote:

> Thanks Amit. We have a better understanding of your requirements now.
>
> It is not necessary that each cell will be one operator. Please don't get
> biased by that assumption.
>
> Here are a few more queries.
> >1. Loading values for unmodified cells
> What is the source of these unmodified cells?
>
> > 3. Execute the cells in parallel (if possible)
> Which cells are you referring to? Table 1, row 1, column 1 - that is, the
> cell whose change triggers recalculation of the dependent cells - or the
> two dependent cells?
>
> Regards
> Sandeep
> On 28-Jan-2016 8:20 pm, "Amit Shah" <amits.84@gmail.com> wrote:
>
>> Thanks Sandeep for the follow up. I have tried responding to your
>> queries. Kindly let me know if that gives you an idea of what I am trying
>> to achieve.
>>
>> how you will be representing your dependencies in a graph
>>
>>
>> Attached a sample dependency graph. I was assuming each cell to be
>> represented as an operator in apex terms so that they could be executed in
>> parallel
>>
>> How many such dependency graphs will be there?
>>
>>
>> Total number of graphs would be approximately equal to the number of rows
>> that could be modified by the user (considering the worst case). The number
>> should be in the thousands.
>>
>> Do you have one graph per change of cell defining its dependent cells? So,
>>> for the example you mentioned, do you define it as O1 dependent cells into
>>> one graph? Then there is another graph which defines what values are
>>> updated if some other cell O7 is updated.
>>
>>
>> Yes, approximately one graph per cell. The dependency graph I have tried
>> to present in the attached diagram could be executed if any of the cell
>> values in table 1, 2 or 4 is updated. For simplicity I have picked
>> cells from distinct tables.
>>
>> In my view, once the user sees the tables on the UI, we could create the
>> dependency graphs in the background. Once he/she updates a cell value, our
>> application would figure out its corresponding dependency graph and start
>> its execution by
>> 1. Loading values for unmodified cells
>> 2. Determine the cells (or operators) that are to be recalculated. For
>> example, if the cell with identifier table 1, row 1, column 1 is updated, the
>> application would determine that 2 cell values are to be updated.
>> 3. Execute the cells in parallel (if possible)
>> 4. Render the updated values in real time to the user.
>>
>> Thanks,
>> Amit.
>>
>> On Thu, Jan 28, 2016 at 7:28 PM, Sandeep Deshmukh <
>> sandeep@datatorrent.com> wrote:
>>
>>> Hi Amit,
>>>
>>> Your concern is that a change to one cell is going to trigger updates to a
>>> large number of cells, and you are interested in doing this in parallel to
>>> get a real-time response. This can very well be achieved using Apex.
>>>
>>> I think we are still not very clear on your use case, and hence what we
>>> have proposed may not match what you are looking for.
>>>
>>> We would like to know how you will be representing your dependencies in
>>> a graph. How many such dependency graphs will be there? Do you have one
>>> graph per change of cell defining its dependent cells? So, for the example
>>> you mentioned, do you define it as O1 dependent cells into one graph? Then
>>> there is another graph which defines what values are updated if some other
>>> cell O7 is updated.
>>>
>>> Once we fully understand your requirements, we should be able to guide
>>> you better.
>>>
>>>
>>> Regards,
>>> Sandeep
>>>
>>> On Thu, Jan 28, 2016 at 2:56 PM, Amit Shah <amits.84@gmail.com> wrote:
>>>
>>>> Ashwin, Below are follow up queries that I have based on your response.
>>>>
>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>> store, or a cache backed lookup from a database.
>>>>
>>>>
>>>> Yes, I understand what the term store means, but I didn't follow the need for it.
>>>>
>>>> How does your UI interact with your server today?
>>>>
>>>>
>>>> Our UI is built on AngularJS, so it communicates with the server
>>>> through REST APIs.
>>>>
>>>> You don't have to create a new DAG for each cell you are changing. You
>>>>> can have a single DAG running and send across your query with the cell
>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>> for other cells/table rows in the store operator.
>>>>
>>>>
>>>> I was under the impression that by defining one operator per column
>>>> index I could take advantage of Apex running individual operators on
>>>> individual JVMs, and hence get parallel writes with real-time or near
>>>> real-time response times. If we have a single static DAG that accepts the
>>>> cell identifier (row id, column index and table id) as parameters, then we
>>>> would not be able to update cell values concurrently, right?
>>>> If your understanding is different from the flow I explained in my
>>>> previous mail, what do I gain by using Apex?
>>>>
>>>>
>>>> Thanks,
>>>> Amit.
>>>>
>>>>
>>>> On Thu, Jan 28, 2016 at 12:51 AM, Ashwin Chandra Putta <
>>>> ashwinchandrap@gmail.com> wrote:
>>>>
>>>>> Amit,
>>>>>
>>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>> store, or a cache backed lookup from a database.
>>>>>
>>>>> For the query/query response, when interacting with a UI - you can
>>>>> send your queries to the query operator and listen for response from the
>>>>> query response operator. Historically we have used json over websockets to
>>>>> interact from browser. How does your UI interact with your server today?
>>>>>
>>>>> You don't have to create a new DAG for each cell you are changing. You
>>>>> can have a single DAG running and send across your query with the cell
>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>> for other cells/table rows in the store operator.
>>>>>
>>>>> If you still want to depend completely on your existing server for
>>>>> loading initial data, then you can load it to a cache in store and do your
>>>>> analysis on that data in memory.
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>> On Wed, Jan 27, 2016 at 7:42 AM, Amol Kekre <amol@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Amit,
>>>>>> Here are some answers
>>>>>> - Logic that you want to run can be coded as a utility, that is then
>>>>>> invoked by any other operator
>>>>>> - populateDAG() is today part of roll out of the app, i.e. it is
>>>>>> similar to "compileTime" and not "runTime". You could do runTime, but then
>>>>>> you will need to go through dtcli. Today runTime changes via dtcli will
>>>>>> need a lot more coding. A very early version of runTime changes (based on
>>>>>> system metrics) exists, but the ask is for changes based on application
>>>>>> data. That ask is in the roadmap of module rollout (phase II?) and others
>>>>>> can comment on the roadmap for runTime populateDAG.
>>>>>> - Outputs of many operators can be streamed as input to one operator
>>>>>> in following ways
>>>>>>    - Each output having different schema will mean different input
>>>>>> ports on that operator as port schema is fixed. This is fine, but will
>>>>>> clutter the DAG
>>>>>>    - If the schema of these output ports is same, there is a merge
>>>>>> operator that does that (
>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java).
>>>>>> You can write one for Nx1 merge by extending the above class.
>>>>>>
>>>>>> Thks,
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 27, 2016 at 6:03 AM, Amit Shah <amits.84@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Ashwin for the follow up.
>>>>>>> I am not sure if I completely follow the query -> store -> query
>>>>>>> pattern. What does query mean here? Why would we need an in-memory store?
>>>>>>> Trying to list down the flow, I came up with the points below
>>>>>>>
>>>>>>>    1. We need to build a DAG after we get to know the cell (table,
>>>>>>>    row and column index) that is modified by the user.
>>>>>>>    2. Once we receive user input (i.e. once the user modifies a
>>>>>>>    value in a table) the populateDAG() method should be called.
>>>>>>>    3. The populateDAG() implementation would
>>>>>>>       1. Determine what cells should be updated across all tables
>>>>>>>       2. Create an Operator per cell that is affected by the
>>>>>>>       change. From the demo code I see dag.addOperator method
>>>>>>>       instantiating an operator. Since the logic to update a cell
>>>>>>>       would be the same across tables, how do we create new operators per
>>>>>>>       cell to have a graph that looks like what Bhupesh envisioned in his
>>>>>>>       last email reply? In my view the graph would look like
>>>>>>>
>>>>>>>       O1 (for user modified cell) -> O2 (table X, row Y, column index 2) -> O5 (table E, row F, column index 10000)
>>>>>>>                                      O3 (table M, row N, column index 3)
>>>>>>>                                                                               -> O6 (update UI)
>>>>>>>                                      O4 (table P, row Q, column index 1)
>>>>>>>
>>>>>>>       3. We want the DAG to be evaluated instantly once the
>>>>>>>       populateDAG() method finishes. How do we do it?
>>>>>>>       4. Can outputs from many operators be streamed as an
>>>>>>>       input to one operator? From the above example outputs from O3, O4
>>>>>>>       and O5 need to go to O6.
>>>>>>>
>>>>>>> I appreciate your inputs on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amit.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 27, 2016 at 1:49 PM, Ashwin Chandra Putta <
>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>
>>>>>>>> Amit,
>>>>>>>>
>>>>>>>> Thanks for the response. You can use the query --> store --> query
>>>>>>>> result pattern to do the real time updates and lookups for what-if analysis.
>>>>>>>>
>>>>>>>> And you can also ingest your real time input data to the store
>>>>>>>> operator. input --> store.
>>>>>>>>
>>>>>>>> That way, you can keep ingesting your data into the store operator
>>>>>>>> where you will keep your OLAP dimensions and measures.
>>>>>>>>
>>>>>>>> For the query/query result pattern example, see this demo:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/demos/mobile/src/main/java/com/datatorrent/demos/mobile/Application.java
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>> On Tue, Jan 26, 2016 at 9:52 PM, Amit Shah <amits.84@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Appreciate the discussion we are having on this topic.
>>>>>>>>>
>>>>>>>>> Bhupesh, If I understand the flow correctly, we would have to
>>>>>>>>> define one DAG per cell in the table that could be modified by the user.
>>>>>>>>> Given this, it would be right to define the DAG only when the table is
>>>>>>>>> presented to the user on the UI (not at definition time since there would
>>>>>>>>> be many tables). Would it be possible to define DAG at runtime i.e.
>>>>>>>>> defining & wiring the operators at runtime?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ashwin, I am glad to answer these questions
>>>>>>>>>
>>>>>>>>> 1. We are extending our OLTP based application by introducing
>>>>>>>>> analytical features that include what-if kind of analysis. Other features
>>>>>>>>> include performing OLAP kind of operations like aggregation, slice &
>>>>>>>>> dice, drill down/up, pivoting. Our first milestone is to target what-if
>>>>>>>>> kind of analysis. We don't have any implementation so far. We are exploring
>>>>>>>>> solutions to these requirements.
>>>>>>>>> 2. The technical challenges we have include having an in-memory
>>>>>>>>> calculation engine system that supports parallel writes and provides real
>>>>>>>>> time or near real time response time.
>>>>>>>>>
>>>>>>>>> Hope that answers your queries.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Amit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 25, 2016 at 10:26 PM, Ashwin Chandra Putta <
>>>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Amit,
>>>>>>>>>>
>>>>>>>>>> I have a couple of questions if it's not too much.
>>>>>>>>>>
>>>>>>>>>> 1. What is the current implementation?
>>>>>>>>>> 2. What are the challenges you are facing?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashwin.
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am trying to evaluate Apache Apex for building an application
>>>>>>>>>> that provides what-if analysis support to users. This correlates closely
>>>>>>>>>> with Excel-like functionality, where changing a value in one cell
>>>>>>>>>> triggers changes in other cell values. In our case we would have multiple
>>>>>>>>>> rows in various tables getting updated when the user changes a row value.
>>>>>>>>>> The response needs to be in real-time or near real-time.
>>>>>>>>>>
>>>>>>>>>> Does Apex fit such a use case? If so, what would be some of
>>>>>>>>>> the initial steps to evaluate it for this use case?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>
>>>>
>>>
>>
