From: tdas
To: reviews@spark.apache.org
Subject: [GitHub] spark pull request: [SPARK-4999][Streaming] Change storeInBlockMan...
Date: Tue, 6 Jan 2015 09:24:33 +0000 (UTC)

Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/3906#issuecomment-68843161

If you are using window operations, then data from previous batches may need to be accessed multiple times. If we don't put the data written to the write-ahead log (WAL) back in memory, the system will have to read that data from the WAL multiple times. That's going to be very slow, isn't it? A smarter approach is to figure out (based on the transformations) whether the data is going to be required multiple times, and store it in the block manager (BM) accordingly. Just blindly setting it to false will cause a performance regression.
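To make the "figure out based on the transformations" idea concrete, here is a rough sketch of the decision. It is not Spark code: `WindowSpec`, `max_reads_per_batch`, and `should_store_in_block_manager` are hypothetical names, and it assumes the only source of repeated reads is windowing, where a batch can fall into up to ceil(window / slide) windows.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WindowSpec:
    """Hypothetical description of one windowed operation on the stream."""
    window_seconds: int
    slide_seconds: int


def max_reads_per_batch(windows: Iterable[WindowSpec]) -> int:
    """Upper bound on how many times a single batch is read.

    Assumption: with no windowing a batch is read once; with a window of
    length W sliding by S, a batch can appear in up to ceil(W / S) windows.
    """
    reads = 1
    for w in windows:
        # Integer ceiling division, avoids floating-point rounding.
        in_windows = -(-w.window_seconds // w.slide_seconds)
        reads = max(reads, in_windows)
    return reads


def should_store_in_block_manager(windows: Iterable[WindowSpec]) -> bool:
    """Keep the data in memory only if it will be read more than once."""
    return max_reads_per_batch(windows) > 1
```

Under these assumptions, a stream with no window operations, or with a tumbling window (window equal to slide), would skip the in-memory copy, while a 30-second window sliding every 10 seconds would keep it because each batch is read up to three times.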