Date: Wed, 4 Dec 2013 23:19:39 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-2527) Add ability to snapshot data as input to hadoop jobs

    [ https://issues.apache.org/jira/browse/CASSANDRA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839472#comment-13839472 ]

Jonathan Ellis commented on CASSANDRA-2527:
-------------------------------------------

bq. Not really feasible; Hadoop is a special case since we can seq scan sstables without having to fully "open" them (sample indexes, populate key cache, etc.)

... so I'm not sure how interesting that leaves this, given that we're trying to do predicate pushdown for Hadoop queries that could be indexed, for instance.
> Add ability to snapshot data as input to hadoop jobs
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2527
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2527
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jeremy Hanna
>            Assignee: Tyler Hobbs
>            Priority: Minor
>              Labels: hadoop
>             Fix For: 2.1
>
>
> It is desirable to have immutable inputs to hadoop jobs for the duration of the job. That way, re-execution of individual tasks does not alter the output. One way to accomplish this would be to snapshot the data that is used as input to a job.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
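To make the "snapshot the input" idea from the description concrete, here is a minimal, hypothetical Java sketch on a single local filesystem. It assumes the job input is a directory of immutable data files (as SSTables are) and freezes it with hard links, the same mechanism Cassandra's own snapshots use. `InputSnapshot` and `snapshot` are illustrative names only, not Cassandra or Hadoop APIs, and the sketch ignores distribution across nodes entirely.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a Cassandra/Hadoop API): freezes the current set of
// data files for a job by hard-linking them into a snapshot directory. The links
// share the underlying file contents, so creating them copies no data, and later
// deletions in the live directory (e.g. files replaced by compaction) leave the
// snapshot intact.
public class InputSnapshot {

    /** Hard-links every regular file in liveDir into snapshotDir and returns the linked paths. */
    public static List<Path> snapshot(Path liveDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        List<Path> sources;
        try (Stream<Path> files = Files.list(liveDir)) {
            sources = files.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> linked = new ArrayList<>();
        for (Path src : sources) {
            Path dst = snapshotDir.resolve(src.getFileName());
            Files.createLink(dst, src); // new name, same underlying data
            linked.add(dst);
        }
        return linked;
    }

    public static void main(String[] args) throws IOException {
        Path live = Files.createTempDirectory("live");
        Path snap = live.resolveSibling(live.getFileName() + "-snapshot");
        Files.writeString(live.resolve("data-1.db"), "row1\n");

        // Freeze the job's input before any task runs.
        List<Path> input = snapshot(live, snap);

        // Simulate the live data changing after the job has started.
        Files.delete(live.resolve("data-1.db"));
        Files.writeString(live.resolve("data-2.db"), "row2\n");

        // A re-executed task reading from the snapshot still sees the original input.
        for (Path p : input) {
            System.out.println(p.getFileName() + ": " + Files.readString(p).trim()); // prints "data-1.db: row1"
        }
    }
}
```

Hard links make such a snapshot nearly free to create; the trade-off is that disk space held by snapshotted files is not reclaimed until the snapshot itself is removed.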