streams-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (STREAMS-293) allow for missing metadata fields in streams-persist-hdfs
Date Thu, 19 Mar 2015 22:23:38 GMT

    [ https://issues.apache.org/jira/browse/STREAMS-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370242#comment-14370242 ]

ASF GitHub Bot commented on STREAMS-293:
----------------------------------------

Github user jfrazee commented on the pull request:

    https://github.com/apache/incubator-streams/pull/195#issuecomment-83785169
  
    :+1: This would be/is really helpful for processing document-only streams on disk.
    
    I would add, though, that it'd be cool if it were a little more flexible, maybe with the option
of a user-provided function or lambda to define how to process the files. That's a problem
for another day, though.
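The lambda-based option suggested in the comment could look roughly like the following minimal sketch. `LineMappingReader` and `readAll` are hypothetical names for illustration, not part of streams-persist-hdfs; the caller supplies a standard `java.util.function.Function<String, T>` that turns each line into whatever record type it needs, and the example mapper assumes tab-separated columns with the JSON document last.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical reader that delegates per-line parsing to a caller-supplied function. */
public class LineMappingReader<T> {

    private final Function<String, T> lineMapper;

    public LineMappingReader(Function<String, T> lineMapper) {
        this.lineMapper = lineMapper;
    }

    /** Applies the mapper to every non-blank line; I/O errors surface as UncheckedIOException. */
    public List<T> readAll(BufferedReader reader) {
        return reader.lines()
                .filter(line -> !line.trim().isEmpty())
                .map(lineMapper)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example mapper: keep only the last tab-separated column (assumed to hold the JSON document).
        LineMappingReader<String> reader =
                new LineMappingReader<>(line -> line.substring(line.lastIndexOf('\t') + 1));
        String input = "id1\t2015-03-19\t{\"a\":1}\n{\"b\":2}\n";
        // prints [{"a":1}, {"b":2}]
        System.out.println(reader.readAll(new BufferedReader(new StringReader(input))));
    }
}
```

Keeping the mapper as a plain `Function` means the file-reading loop never needs to know which column layout a given path uses.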


> allow for missing metadata fields in streams-persist-hdfs
> ---------------------------------------------------------
>
>                 Key: STREAMS-293
>                 URL: https://issues.apache.org/jira/browse/STREAMS-293
>             Project: Streams
>          Issue Type: Improvement
>            Reporter: Steve Blackmon
>            Assignee: Steve Blackmon
>
> Currently the streams-persist-hdfs writer creates (and the reader expects) exactly four columns.
> This could be made much more flexible without too much effort.
> Update the reader to support additional use cases:
> a) file paths containing one JSON document per line
> b) file paths containing just an id and JSON document on each line
> c) file paths containing an id, timestamp, and JSON document on each line
> Update the writer to support:
> a) ids only
> b) ids and timestamp only
> c) ids, timestamp, and JSON only
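A minimal sketch of the reader and writer changes described in the issue, under stated assumptions: columns are tab-separated, the full four-column layout is id, timestamp, metadata, document in that order, and documents are serialized without literal tab characters. `FlexibleLineFormat`, `Field`, `parse`, and `format` are hypothetical names, not the module's actual API. The reader infers the layout from the column count, which works because every reader case listed in the issue ends with the JSON document.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class FlexibleLineFormat {

    /** Hypothetical field names; assumed full-layout column order is ID, TS, META, DOC. */
    public enum Field { ID, TS, META, DOC }

    private static final String DELIMITER = "\t"; // assumed column separator

    /**
     * Reader side: infer which fields are present from the number of columns.
     * Assumes the JSON document is always the last column and contains no literal tabs.
     */
    public static Map<Field, String> parse(String line) {
        String[] cols = line.split(DELIMITER, -1); // -1 keeps trailing empty columns
        List<Field> layout;
        switch (cols.length) {
            case 1: layout = Arrays.asList(Field.DOC); break;
            case 2: layout = Arrays.asList(Field.ID, Field.DOC); break;
            case 3: layout = Arrays.asList(Field.ID, Field.TS, Field.DOC); break;
            case 4: layout = Arrays.asList(Field.ID, Field.TS, Field.META, Field.DOC); break;
            default: throw new IllegalArgumentException("unexpected column count: " + cols.length);
        }
        Map<Field, String> parsed = new EnumMap<>(Field.class);
        for (int i = 0; i < cols.length; i++) {
            parsed.put(layout.get(i), cols[i]);
        }
        return parsed;
    }

    /** Writer side: emit only the configured fields, in declaration order of the enum. */
    public static String format(Map<Field, String> values, EnumSet<Field> fields) {
        StringJoiner joiner = new StringJoiner(DELIMITER);
        for (Field f : fields) { // EnumSet iterates in enum declaration order
            joiner.add(values.getOrDefault(f, ""));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<Field, String> rec = parse("id1\t2015-03-19\t{\"a\":1}");
        System.out.println(rec);
        // "ids and timestamp only" writer mode
        System.out.println(format(rec, EnumSet.of(Field.ID, Field.TS)));
    }
}
```

Because a one-column line is always read as a document, output from the writer's ids-only or ids-and-timestamp modes would be misread by this `parse`; a real implementation would probably let the caller declare the layout explicitly rather than rely on column count alone.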



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
