Date: Sun, 6 Mar 2016 10:17:31 -0700 (MST)
From: Laumegui Deaulobi
To: user@spark.apache.org
Subject: Is Spark right for us?

Our problem space is survey analytics. Each survey comprises a set of questions, with each question having a set of possible answers. Survey fill-out tasks are sent to users, who have until a certain date to complete them. Based on these survey fill-outs, reports need to be generated. Each report deals with a subset of the survey fill-outs, and comprises a set of data points (average rating for question 1, min/max for question 2, etc.)

We are dealing with rather large data sets - although reading the internet we get the impression that everyone is analyzing petabytes of data...

Users: up to 100,000
Surveys: up to 100,000
Questions per survey: up to 100
Possible answers per question: up to 10
Survey fill-outs / user: up to 10
Reports: up to 100,000
Data points per report: up to 100

Data is currently stored in a relational database, but a migration to a different kind of store is possible.

The naive algorithm for report generation can be summed up as follows:

for each report to be generated {
  for each report data point to be calculated {
    calculate data point
    add data point to report
  }
  publish report
}

In order to deal with the upper limits of these values, we will need to distribute this algorithm across a compute / data cluster as much as possible. I've read about frameworks such as Apache Spark, but also Hadoop, GridGain, Hazelcast and several others, and am still confused as to how each of these can help us and how they fit together.

Is Spark the right framework for us?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-right-for-us-tp26412.html
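
For illustration only, the naive report-generation loop described in the post might look like the following single-machine sketch in plain Python. It is not the poster's implementation: the record layout, the names `fill_outs`, `report_specs`, `AGGREGATES` and `generate_report`, and the sample values are all hypothetical, and the size estimates at the end assume every fill-out answers every question of its survey.

```python
from statistics import mean

# Hypothetical in-memory stand-in for rows read from the relational store:
# one record per survey fill-out, mapping question id -> numeric answer.
fill_outs = [
    {"survey": "s1", "user": "u1", "answers": {"q1": 4, "q2": 7}},
    {"survey": "s1", "user": "u2", "answers": {"q1": 2, "q2": 9}},
    {"survey": "s1", "user": "u3", "answers": {"q1": 3, "q2": 5}},
]

# Hypothetical aggregate functions a data point may use.
AGGREGATES = {"avg": mean, "min": min, "max": max}

# Hypothetical report definitions: each data point is (question, aggregate).
report_specs = [
    {
        "name": "r1",
        "survey": "s1",
        "data_points": [("q1", "avg"), ("q2", "min"), ("q2", "max")],
    },
]


def generate_report(spec, fill_outs):
    """Compute every data point of one report (the inner loop of the post)."""
    # Select the subset of fill-outs this report deals with.
    rows = [f for f in fill_outs if f["survey"] == spec["survey"]]
    report = {}
    for question, agg in spec["data_points"]:
        values = [r["answers"][question] for r in rows if question in r["answers"]]
        if values:  # skip data points with no answers instead of failing
            report[f"{agg}({question})"] = AGGREGATES[agg](values)
    return report


# Outer loop of the post: one pass per report to be generated.
reports = {spec["name"]: generate_report(spec, fill_outs) for spec in report_specs}

# Back-of-envelope upper bounds from the limits listed in the post.
max_answers = 100_000 * 10 * 100      # users x fill-outs/user x questions/survey
max_data_points = 100_000 * 100       # reports x data points/report

print(reports)
print(max_answers, max_data_points)
```

Under these assumptions the upper bounds come to about 100 million stored answers and 10 million data points. Each report is computed independently of the others, so the outer loop is the natural unit to distribute across a cluster.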