spark-reviews mailing list archives

From tdas <>
Subject [GitHub] spark pull request: [SPARK-4999][Streaming] Change storeInBlockMan...
Date Tue, 06 Jan 2015 09:24:33 GMT
Github user tdas commented on the pull request:
    If you are using window operations, then data from previous batches may need to be accessed
multiple times. If we don't put the data from the WAL back into memory, the system will have to
read the same data from the WAL multiple times. That is going to be very slow, isn't it?
    A smarter approach is to figure out (based on the transformations) whether the data is
going to be required multiple times or not, and store the data in the BlockManager (BM)
accordingly. Just blindly setting it to false will cause a performance regression.
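To make the window argument concrete, here is a toy simulation (plain Python, no Spark) that counts how many windowed computations read each batch. It assumes window length and slide are measured in batch intervals and that a window is evaluated after every `slide` batches. The names `window_reads` and `should_keep_in_memory` are hypothetical illustrations, not Spark APIs, and the policy below is a deliberate simplification: a real decision would have to inspect the whole DStream graph (for example, several output operations on the same stream also reuse data).

```python
from collections import Counter


def window_reads(num_batches: int, window: int, slide: int) -> Counter:
    """Count how many windowed computations read each batch.

    `window` and `slide` are in batch intervals (hypothetical model).
    A window is evaluated after every `slide` batches and covers the most
    recent `window` batches (fewer at the very start of the stream).
    """
    reads = Counter()
    for end in range(slide - 1, num_batches, slide):
        start = max(0, end - window + 1)
        for batch in range(start, end + 1):
            reads[batch] += 1
    return reads


def should_keep_in_memory(window: int, slide: int) -> bool:
    """Hypothetical policy: keep received blocks in memory only when
    overlapping windows (window > slide) read the same batch more than once."""
    return window > slide


# Sliding window of 3 batches, sliding every batch: batches are re-read.
sliding = window_reads(num_batches=6, window=3, slide=1)
print(dict(sliding), should_keep_in_memory(3, 1))

# Tumbling window (window == slide): every batch is read exactly once.
tumbling = window_reads(num_batches=6, window=2, slide=2)
print(dict(tumbling), should_keep_in_memory(2, 2))
```

In the sliding case each interior batch is read three times, so without an in-memory copy each of those reads would go back to the WAL; in the tumbling case each batch is read once, so skipping the in-memory copy costs nothing extra.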
