spark-user mailing list archives

From trung kien <kient...@gmail.com>
Subject Re: Correct way to use spark streaming with apache zeppelin
Date Sat, 12 Mar 2016 10:58:26 GMT
Thanks Chris and Mich for replying.

Sorry for not explaining my problem clearly. Yes, I am talking about a
flexible dashboard when I mention Zeppelin.

Here is the problem i am having:

I am running a commercial website where we sell many products, and we have
many branches in many places. We have a lot of real-time transactions and
want to analyze them in real time.

We don't want to aggregate every single transaction each time we do
analytics (each transaction has BranchID, ProductID, Qty, Price). So we
maintain intermediate data that contains: BranchID, ProductID, totalQty,
totalDollar.

Ideally, we have 2 tables:
   Transaction (BranchID, ProductID, Qty, Price, Timestamp)
   Stats (BranchID, ProductID, totalQty, totalDollar)

The intermediate table Stats is just the sum of every transaction grouped by
BranchID and ProductID (I am using Spark Streaming to calculate this table in
real time).
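A minimal plain-Python sketch of the incremental update described above, assuming each micro-batch arrives as a list of Transaction tuples and that Price is a unit price (so a transaction adds Qty * Price to totalDollar). `update_stats` is a hypothetical helper, not a Spark API; in Spark Streaming the same per-key running sum would typically be kept with a stateful operation such as `updateStateByKey` or `mapWithState`.

```python
from collections import defaultdict

# Hypothetical record layout, following the Transaction table above:
# (BranchID, ProductID, Qty, Price, Timestamp)
def update_stats(stats, batch):
    """Fold one micro-batch of transactions into the running Stats table.

    stats maps (branch_id, product_id) -> [total_qty, total_dollar].
    Assumes Price is a unit price, so each transaction adds Qty * Price dollars.
    """
    for branch_id, product_id, qty, price, _ts in batch:
        entry = stats[(branch_id, product_id)]
        entry[0] += qty
        entry[1] += qty * price
    return stats

# Missing keys start at zero quantity and zero dollars.
stats = defaultdict(lambda: [0, 0.0])

# Two micro-batches; the second only touches a key that already exists.
update_stats(stats, [("B1", "P1", 2, 5.0, 1), ("B1", "P1", 1, 5.0, 2), ("B2", "P1", 3, 4.0, 3)])
update_stats(stats, [("B1", "P1", 4, 5.0, 4)])
print(dict(stats))
```

Each batch only touches the keys it contains, which is what keeps the work per batch proportional to the batch rather than to the full transaction history.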

My thinking is that doing statistics (a real-time dashboard) on the Stats
table is much easier, and this table should also be small enough to maintain.
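To illustrate why dashboard queries get cheaper, here is a small sketch (hypothetical data and helper name) that rolls a Stats snapshot up to revenue per branch. It reads one row per (BranchID, ProductID) pair instead of one row per transaction. Because Stats as described has no Timestamp column, time-windowed views (for example "revenue in the last hour") would still need Transaction or a time-bucketed variant of Stats.

```python
from collections import defaultdict

# Hypothetical Stats snapshot: (BranchID, ProductID) -> (totalQty, totalDollar)
stats = {
    ("B1", "P1"): (7, 35.0),
    ("B1", "P2"): (2, 18.0),
    ("B2", "P1"): (3, 12.0),
}

def revenue_by_branch(stats):
    """Roll Stats up to one dollar total per branch without rescanning Transaction."""
    totals = defaultdict(float)
    for (branch_id, _product_id), (_qty, dollars) in stats.items():
        totals[branch_id] += dollars
    return dict(totals)

print(revenue_by_branch(stats))
```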

I'm just wondering: what's the best way to store the Stats table (a database
or a Parquet file)?
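One way to picture the "database" option is a keyed upsert applied once per micro-batch. The sketch below uses an in-memory SQLite database purely as a stand-in, with hypothetical table and column names; in a Spark Streaming job the equivalent writes would usually be issued from `foreachRDD` with whatever client the chosen database provides. The trade-off to weigh against Parquet is that Parquet files are immutable once written, so maintaining running totals there means writing new files each batch and re-aggregating (or compacting) at read time, whereas a keyed store can update each Stats row in place.

```python
import sqlite3

# In-memory SQLite stands in for "a database"; names are hypothetical and
# mirror the Stats table described above. ON CONFLICT ... DO UPDATE needs
# SQLite 3.24 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE stats (
           branch_id    TEXT NOT NULL,
           product_id   TEXT NOT NULL,
           total_qty    INTEGER NOT NULL,
           total_dollar REAL NOT NULL,
           PRIMARY KEY (branch_id, product_id)
       )"""
)

# Insert a new key, or add the batch's deltas to the existing row.
UPSERT = """
INSERT INTO stats (branch_id, product_id, total_qty, total_dollar)
VALUES (?, ?, ?, ?)
ON CONFLICT (branch_id, product_id) DO UPDATE SET
    total_qty    = total_qty + excluded.total_qty,
    total_dollar = total_dollar + excluded.total_dollar
"""

def apply_batch(conn, deltas):
    """Add one micro-batch of per-key deltas to Stats in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, deltas)

apply_batch(conn, [("B1", "P1", 3, 15.0), ("B2", "P1", 3, 12.0)])
apply_batch(conn, [("B1", "P1", 4, 20.0)])
rows = conn.execute("SELECT * FROM stats ORDER BY branch_id, product_id").fetchall()
print(rows)
```

Note that additive upserts are not idempotent: if a failed batch is replayed, its deltas would be counted twice unless the job tracks which batches it has already applied.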
What exactly are you trying to do? Zeppelin is for interactive analysis of
a dataset. What do you mean by "realtime analytics"? Do you mean building a
report or dashboard that automatically updates as new data comes in?


--
Chris Miller

On Sat, Mar 12, 2016 at 3:13 PM, trung kien <kientt86@gmail.com> wrote:

> Hi all,
>
> I've just watched some of Zeppelin's videos. The integration between
> Zeppelin and Spark is really amazing, and I want to use it for my
> application.
>
> In my app, I will have a Spark Streaming app to do some basic real-time
> aggregation (intermediate data). Then I want to use Zeppelin to do some
> real-time analytics on the intermediate data.
>
> My question is: what's the most efficient storage engine for real-time
> intermediate data? Is a Parquet file somewhere suitable?
>
