streams-dev mailing list archives

From "Steve Blackmon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (STREAMS-300) processor to fix handling of non-string fields from mongoexport
Date Mon, 23 Mar 2015 18:59:52 GMT
Steve Blackmon created STREAMS-300:
--------------------------------------

             Summary: processor to fix handling of non-string fields from mongoexport
                 Key: STREAMS-300
                 URL: https://issues.apache.org/jira/browse/STREAMS-300
             Project: Streams
          Issue Type: Improvement
            Reporter: Steve Blackmon


mongoexport is useful for producing files full of JSON documents, which streams can read
in lieu of paging through documents in mongo.  However, the export leaves some artifacts
which must be cleaned up to reconstruct the original document.

Specifically, dates and numbers show up as single-key dictionaries instead of plain field values. For example:

    "created_at": {
        "$date": "2015-02-11T04:24:48.101+0000"
    },
    "id": {
        "$numberLong": "2405068880"
    }

Write a processor that can sit downstream of WebHdfsPersistReader and clean this up, such that
mongoexport -> WebHdfsPersistReader -> MongoExportCleanup -> downstream works equivalently to
MongoPersistReader -> downstream.
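
A minimal sketch of the cleanup logic, written in Python only to illustrate the transformation; it is not the Streams processor itself. The function name clean_mongoexport is hypothetical. It assumes each wrapper is a dictionary with exactly one key, that "$numberLong" should become an integer, and that "$date" should be unwrapped to its inner value (the ISO string here); the representation downstream actually needs may differ.

```python
import json


def clean_mongoexport(value):
    """Recursively replace mongoexport wrapper dictionaries with plain values.

    Assumption: wrappers are single-key dictionaries keyed by "$numberLong"
    or "$date". Any other dictionary is treated as a normal sub-document.
    """
    if isinstance(value, dict):
        if len(value) == 1:
            if "$numberLong" in value:
                # The number is exported as a string; restore the integer.
                return int(value["$numberLong"])
            if "$date" in value:
                # Unwrap the date; the inner value may itself be wrapped.
                return clean_mongoexport(value["$date"])
        # Ordinary sub-document: clean each field.
        return {key: clean_mongoexport(item) for key, item in value.items()}
    if isinstance(value, list):
        return [clean_mongoexport(item) for item in value]
    return value


# One line of export output, using the two fields from the example above.
line = (
    '{"created_at": {"$date": "2015-02-11T04:24:48.101+0000"}, '
    '"id": {"$numberLong": "2405068880"}}'
)
document = clean_mongoexport(json.loads(line))
print(document)
```

The recursion means wrappers nested inside sub-documents or arrays are unwrapped as well. Fields that are already plain strings, numbers, or booleans pass through unchanged.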



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
