apex-users mailing list archives

From <dashi...@yahoo.com>
Subject [Malhar] @since 3.5.0 - S3InputModule is broken or does not function as documented.
Date Thu, 24 Nov 2016 03:56:32 GMT
When using the s3:// scheme on an Amazon-managed S3 bucket, the module attempts to retrieve the prefix
with a leading "/" character, which AWS does not recognize. If I remove the leading "/" under a
debugger, the module proceeds, but it then errors out downstream where a leading "/" is expected.

Additionally, if the secret key generated by AWS contains a "/" character, authentication
breaks. Replacing the "/" with the URI escape sequence %2F does not help: authentication still breaks.
The only way I could authenticate was to keep regenerating the keys until the secret key
contained no "/" or ":" characters. I'm pretty sure this is not going to cut it in production.
Does anyone know how to properly escape the offending characters in
the s3(n) URI in the format proposed by the module developers?
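
To make both URI problems concrete, here is a minimal sketch using only java.net.URI from the JDK. It is not the module's code: the class name S3UriSketch, the helper toKeyPrefix, the bucket name, and the credentials are all made-up placeholders. It shows that the path component of an s3:// URI always starts with "/", while S3 key prefixes do not, and that an unescaped "/" in the secret key cuts the authority short so the user-info and host are lost.

```java
import java.net.URI;

public class S3UriSketch {

    // Hypothetical helper: derive an S3 key prefix from the URI path.
    // URI.getPath() returns "/input/dir"; S3 keys carry no leading "/".
    public static String toKeyPrefix(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            return "";
        }
        int i = 0;
        while (i < path.length() && path.charAt(i) == '/') {
            i++;
        }
        return path.substring(i);
    }

    // Returns {userInfo, host, path} as java.net.URI parses them.
    public static String[] parts(String s) {
        URI uri = URI.create(s);
        return new String[] { uri.getUserInfo(), uri.getHost(), uri.getPath() };
    }

    public static void main(String[] args) {
        URI plain = URI.create("s3://my-bucket/input/dir");
        System.out.println(plain.getPath());     // /input/dir
        System.out.println(toKeyPrefix(plain));  // input/dir

        // Unescaped "/" in the secret: the authority ends at the first "/",
        // so the rest of the secret and the bucket land in the path.
        String[] raw = parts("s3n://AKIAEXAMPLE:abc/def@my-bucket/input");
        System.out.println(raw[0] + " | " + raw[1] + " | " + raw[2]);

        // Escaped "/" (%2F): the URI parses, and getUserInfo() decodes it back.
        String[] esc = parts("s3n://AKIAEXAMPLE:abc%2Fdef@my-bucket/input");
        System.out.println(esc[0] + " | " + esc[1] + " | " + esc[2]);
    }
}
```

Even when the escaped form parses cleanly, whatever consumes the URI still has to split and decode the user-info consistently, which may be where authentication fails. Hadoop's s3n filesystem can also read credentials from the configuration properties fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey instead of the URI; whether S3InputModule honors those is an assumption that would need checking.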

Has anyone had any success using S3InputModule? I haven't deployed the app to the EMR cluster
yet; all tests are local, accessing S3 buckets from outside the Amazon cloud. It seems the
authors set up their unit tests with the s3n:// scheme. Is there a way to replicate the original
unit tests?

Side note: the current implementation does not persist the list of processed files anywhere outside
of the running process, nor does the logic allow for moving processed files into another bucket
or marking them as complete. Does anyone know what the original design thinking was in terms
of protection against duplicate processing?
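
To illustrate the kind of protection I mean, here is a rough sketch of a processed-file record that survives a restart. It is purely hypothetical and not part of Malhar: the class ProcessedFileLedger and its method markIfNew are invented names, and a local file stands in for whatever durable store would really be used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: remembers processed S3 keys in a local file,
// one key per line, so a restarted process can skip them.
public class ProcessedFileLedger {
    private final Path ledger;
    private final Set<String> seen = new HashSet<>();

    public ProcessedFileLedger(Path ledger) throws IOException {
        this.ledger = ledger;
        if (Files.exists(ledger)) {
            // Reload keys recorded by an earlier run.
            seen.addAll(Files.readAllLines(ledger, StandardCharsets.UTF_8));
        }
    }

    // Returns true if the key was not seen before and records it durably;
    // returns false if it was already processed.
    public boolean markIfNew(String key) throws IOException {
        if (!seen.add(key)) {
            return false;
        }
        Files.write(ledger, Collections.singletonList(key), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }
}
```

A caller would check markIfNew(key) before emitting a file and skip it when the result is false. This sketch ignores concurrency and partial failures, which a real operator would have to handle.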
Any insights would be greatly appreciated!
-- David.