spark-dev mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Largest input data set observed for Spark.
Date Thu, 20 Mar 2014 18:31:04 GMT
Understood, of course.

Did the data fit comfortably in memory or did you experience memory
pressure?  I've had to do a fair amount of tuning when under memory
pressure in the past (0.7.x) and was hoping that the handling of this
scenario is improved in later Spark versions.


On Thu, Mar 20, 2014 at 11:28 AM, Reynold Xin <rxin@databricks.com> wrote:

> I'm not really at liberty to discuss details of the job. It involves some
> expensive aggregated statistics, and took 10 hours to complete (mostly
> bottlenecked by network & io).
>
>
>
>
>
> On Thu, Mar 20, 2014 at 11:12 AM, Surendranauth Hiraman <
> suren.hiraman@velos.io> wrote:
>
>> Reynold,
>>
>> How complex was that job (I guess in terms of number of transforms and
>> actions) and how long did that take to process?
>>
>> -Suren
>>
>>
>>
>> On Thu, Mar 20, 2014 at 2:08 PM, Reynold Xin <rxin@databricks.com> wrote:
>>
>> > Actually we just ran a job with 70TB+ compressed data on 28 worker nodes -
>> > I didn't count the size of the uncompressed data, but I am guessing it is
>> > somewhere between 200TB and 700TB.
>> >
>> >
>> >
>> > On Thu, Mar 20, 2014 at 12:23 AM, Usman Ghani <usman@platfora.com> wrote:
>> >
>> > > All,
>> > > What is the largest input data set y'all have come across that has been
>> > > successfully processed in production using Spark? Ballpark?
>> > >
>> >
>>
>>
>>
>> --
>>
>> SUREN HIRAMAN, VP TECHNOLOGY
>> Velos
>> Accelerating Machine Learning
>>
>> 440 NINTH AVENUE, 11TH FLOOR
>> NEW YORK, NY 10001
>> O: (917) 525-2466 ext. 105
>> F: 646.349.4063
>> E: suren.hiraman@velos.io
>> W: www.velos.io
>>
>
>
