spark-issues mailing list archives

From Emre Sevinç (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-3276) Provide a API to specify MIN_REMEMBER_DURATION for files to consider as input in streaming
Date Tue, 07 Apr 2015 14:37:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483174#comment-14483174 ]

Emre Sevinç edited comment on SPARK-3276 at 4/7/15 2:36 PM:
------------------------------------------------------------

[~srowen] would it be fine if I added a public API method to the FileInputDStream class that takes
a single parameter (duration) and sets the value of {{MIN_REMEMBER_DURATION}} to that value?
That would, of course, also mean changing MIN_REMEMBER_DURATION from a constant into a variable,
with a default value of 1 minute (the currently hard-coded value).

Or, as an alternative that achieves a similar effect: create another Spark configuration property
(with a default value of 1 minute) and refactor the code so that the new {{minRememberDuration}}
variable takes its value from that property.

Right now, I have no idea which of the above two approaches is more meaningful / idiomatic.
Any comments?
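
To make the two options concrete, here is a minimal sketch in plain Scala. It is not Spark's
actual FileInputDStream: the class FileStreamSketch, the property key
"example.streaming.minRememberDuration", and the choice to read the property as a number of
seconds are all assumptions made for illustration only.

```scala
import scala.concurrent.duration._

// Hypothetical stand-in for FileInputDStream; NOT Spark's actual class.
// `conf` is a plain Map standing in for a configuration object.
class FileStreamSketch(conf: Map[String, String] = Map.empty) {

  // Second approach: take the value from a configuration property
  // (key name and seconds unit are assumptions), defaulting to 1 minute.
  private var minRememberDuration: FiniteDuration =
    conf.get(FileStreamSketch.MinRememberKey)
      .map(s => s.toLong.seconds)
      .getOrElse(FileStreamSketch.DefaultMinRemember)

  // First approach: a public setter that replaces the hard-coded constant.
  def setMinRememberDuration(d: FiniteDuration): this.type = {
    require(d > Duration.Zero, "duration must be positive")
    minRememberDuration = d
    this
  }

  def currentMinRememberDuration: FiniteDuration = minRememberDuration
}

object FileStreamSketch {
  // Hypothetical property key, used only in this sketch.
  val MinRememberKey = "example.streaming.minRememberDuration"
  // The default mentioned in the comment above: 1 minute.
  val DefaultMinRemember: FiniteDuration = 1.minute
}
```

In this sketch the setter changes the value per stream instance, while the property applies to
every stream built from the same configuration; which of the two fits Spark better is exactly
the open question above.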


> Provide a API to specify MIN_REMEMBER_DURATION for files to consider as input in streaming
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3276
>                 URL: https://issues.apache.org/jira/browse/SPARK-3276
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.2.0
>            Reporter: Jack Hu
>            Priority: Minor
>
> Currently, StreamingContext offers only one API, textFileStream, for creating a text file
> DStream, and it always ignores old files. Sometimes the old files are still useful.
> An API is needed to let the user choose whether old files should be ignored.
> The API currently in StreamingContext:
> def textFileStream(directory: String): DStream[String] = {
>   fileStream[LongWritable, Text, TextInputFormat](directory).map(_._2.toString)
> }



