flex-users mailing list archives

From Alex Harui <aha...@adobe.com>
Subject Re: Multithreading
Date Mon, 15 Aug 2016 14:20:48 GMT
IMO, there are still too many unknowns to give detailed advice.

First, whatever computer you use may or may not take 1 minute to do the
computation.  If you are purchasing Azure or AWS time, it could be faster
or even much slower.  You get what you pay for.  The cheapest Azure
single-core instance takes 12 times longer to build Flex than my multicore
Mac, and not because it has more cores, probably because it has more

Second, if only 10 cities' data changes on a particular day, do you need to
recalculate 10 cities or maybe 100 cities that are near those 10 cities, or
all 1400?  Your code could have "good day/bad day" thresholds where it
knows only a few things changed but once 100 things change then it is time
to do a full recompute again.
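
A minimal sketch of that threshold idea, in Python for illustration only.
The cutoff of 100, the function name plan_recompute and the neighbors
mapping are assumptions made up for this example, not part of the actual
code being discussed:

```python
# Assumed cutoff, taken from the "once 100 things change" example above.
FULL_RECOMPUTE_THRESHOLD = 100


def plan_recompute(changed, all_cities, neighbors):
    """Return the set of cities to reprocess for one day's updates.

    changed    -- cities whose data changed today
    all_cities -- every city in the dataset
    neighbors  -- hypothetical mapping: city -> cities affected by it
    """
    if len(changed) >= FULL_RECOMPUTE_THRESHOLD:
        # "Bad day": so much changed that a full recompute is simpler.
        return set(all_cities)
    # "Good day": only the changed cities and the ones near them.
    affected = set(changed)
    for city in changed:
        affected.update(neighbors.get(city, ()))
    return affected


demo_cities = [f"city{i}" for i in range(1400)]
demo_neighbors = {"city0": ["city1", "city2"]}
print(len(plan_recompute({"city0"}, demo_cities, demo_neighbors)))
print(len(plan_recompute(set(demo_cities[:100]), demo_cities, demo_neighbors)))
```

The right threshold would have to be measured against real data; 100 is
only the number used in the example above.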

Third, if the set of 1400 cities "never" changes, it may be advantageous
to pre-compute and cache the "nearest neighbors" calculation that seems to
be part of your loops.  Then, if one city changes, you know exactly which
other things to re-compute.  It might trigger other recomputations like a
spreadsheet does, but on average, that might be less processing than the
full loop.  IOW, having the right database and data structures can make a
huge difference.
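
As a rough sketch of the caching idea, assuming each city is just an
(x, y) point and "near" means within a fixed radius (both are
simplifications; the real distance calculation and data model are not
shown in this thread), the neighbor lists can be built once and then
consulted whenever a city changes:

```python
import math


def distance(a, b):
    """Straight-line distance; a stand-in for the real distance calculation."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def build_neighbor_cache(cities, radius):
    """Precompute, once, which cities lie within `radius` of each city."""
    cache = {name: set() for name in cities}
    names = list(cities)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance(cities[a], cities[b]) <= radius:
                cache[a].add(b)
                cache[b].add(a)
    return cache


def cities_to_recompute(changed, cache):
    """Changed cities plus their cached neighbors (one hop only)."""
    dirty = set(changed)
    for name in changed:
        dirty |= cache[name]
    return dirty


# Hypothetical coordinates, purely for illustration.
cities = {"A": (0, 0), "B": (3, 4), "C": (100, 100), "D": (103, 104)}
cache = build_neighbor_cache(cities, radius=10)
print(sorted(cities_to_recompute({"A"}, cache)))  # ['A', 'B']
```

This only follows one hop. If recomputing a neighbor can in turn
invalidate its own neighbors, the spreadsheet-style cascade would need a
loop that keeps expanding the set until nothing new is added.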

Fourth, why pick 3 am?  If city data on the east coast cannot change after
midnight on the east coast, you can start crunching the numbers for those
cities then.
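
One way to picture the staggered start, as a sketch only: the region
names and fixed standard-time offsets below are assumptions, and
daylight saving time is ignored to keep the example short.

```python
from datetime import date, datetime, timedelta, timezone

# Assumed fixed standard-time UTC offsets (no daylight saving handling).
REGION_UTC_OFFSET_HOURS = {
    "eastern": -5,
    "central": -6,
    "mountain": -7,
    "pacific": -8,
}


def utc_start_for_region(region, local_date):
    """UTC time of local midnight at the start of `local_date` in `region`."""
    tz = timezone(timedelta(hours=REGION_UTC_OFFSET_HOURS[region]))
    local_midnight = datetime(
        local_date.year, local_date.month, local_date.day, tzinfo=tz
    )
    return local_midnight.astimezone(timezone.utc)


for region in REGION_UTC_OFFSET_HOURS:
    print(region, utc_start_for_region(region, date(2016, 1, 15)).isoformat())
```

With these offsets the east coast batch can start three hours before the
west coast batch, which spreads the work out instead of bunching it all
at one time.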

Fifth, simple sums and averages don't need to visit all historical data
when new data arrives.  A sum can cache the last sum and simply add the
new data. A simple average caches the number of entries and last average.
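
A small sketch of what that caching looks like (the class name and the
sample values are made up for illustration):

```python
class RunningStats:
    """Keeps a sum and an average current without revisiting old data."""

    def __init__(self):
        self.count = 0    # number of entries seen so far
        self.total = 0.0  # cached sum
        self.mean = 0.0   # cached average

    def add(self, value):
        self.count += 1
        self.total += value
        # New average from the cached average and the entry count only.
        self.mean += (value - self.mean) / self.count


stats = RunningStats()
for value in [100.0, 200.0, 300.0]:
    stats.add(value)
print(stats.total, stats.mean)  # 600.0 200.0
```

Each new record costs a constant amount of work, no matter how much
history has accumulated.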

The last time we discussed your code, it was running about 1.6 billion tests (40K
x 40K) on 200-300 lines of code.  If 1000 lines of code results in only
running 40K tests, that could save you quite a bit of processing time.
Heck, even 10,000 lines of code x 40,000 tests is potentially a huge


On 8/15/16, 12:25 AM, "bilbosax" <waspence41@comcast.net> wrote:

>Thanks for the reply Alex.  The app that I would write would not be that
>complex.  It would simply read about 40k records from a database with a
>single "GET * FROM main" , run the conditionals and distance calculations
>and sums/averages that we have discussed, and then write about 12k records
>back to a database table.  Probably no more than 200-300 lines of code.  A
>user would then be able to download the processed data from the database
>their desktop or mobile device AIR app very efficiently and do no
>on their device so that it is nice and speedy.  But this data has to be
>processed once a day for 1400 cities in the US.  If I was using a single
>computer to do the work linearly at 1 minute per run, it would take almost
>24 hours.  I need the data much faster than that.  I want to update every
>market in the US at 3am local time, so about 3-4 hours to process all of
>the data from the East to West Coast. So if I was able to run 4 processes in
>parallel on a single machine without too much of a performance loss, it
>would finish in about 6 hours.  So two machines could do it in 3 hours.  I
>could use a scheduler to run the processes, and all would be good in the
>world. (By the way, I would only download updates from a service for each
>of the 1400 cities to update my database, but once I have the updates, the
>entire dataset has to be reprocessed.)
>So if it is not possible to run 4 processes in parallel efficiently using
>I would either have to get an array of machines to do the work, or see how
>well something like Apache Spark could handle the workload.  So it
>boils down to this:  the speed of C plus the ability to run several
>processes in parallel VS the scalability of Spark to process big
>Does that help at all?
>Sent from the Apache Flex Users mailing list archive at Nabble.com.
