nifi-users mailing list archives

From Joe Skora <jsk...@gmail.com>
Subject Re: How to reject S3 Writes if folder does not exist?
Date Mon, 20 Mar 2017 13:53:18 GMT
Just to clarify, PutS3Object does not force the creation of a directory; it
just uploads to the requested "Object Key".  Technically, there are no
"directories" in S3; it is a flat object store where buckets hold
objects.  This is why PutS3Object has properties for "Bucket" and "Object
Key" but not for a path or directory.

The notion of a hierarchical directory structure is superimposed by the S3
web GUI (and possibly other tools) such that a directory "projectX/" will
be shown if there is any "Object Key" stored that equals or begins with
"projectX/", such as "projectX/file1.txt".  In fact, if
"projectX/file1.txt" exists as a text file it is still possible to upload
another document as "projectX/file1.txt/logo.png" even though that violates
the rules of most hierarchical file systems since that implies a directory
and file with the same path.
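
To illustrate, here is a minimal Python sketch of how a tool could derive
"folders" purely from object keys. It is not how the S3 web GUI is actually
implemented; the function name top_level_folders and the sample keys are
made up for this example.

```python
def top_level_folders(keys, delimiter="/"):
    """Return the "folders" a browser-style tool would show at the top level.

    A folder here is nothing more than the part of a key before the first
    delimiter; no directory object has to exist in the store.
    """
    folders = set()
    for key in keys:
        prefix, sep, _rest = key.partition(delimiter)
        if sep:  # the key contains a delimiter, so it implies a folder
            folders.add(prefix + delimiter)
    return sorted(folders)


# Hypothetical flat key listing for one bucket.
keys = [
    "projectX/file1.txt",
    "projectX/file1.txt/logo.png",  # allowed: keys are just strings
    "readme.txt",
]
print(top_level_folders(keys))
```

Both "projectX/..." keys coexist without conflict because nothing in the
listing is a real directory; the folder is inferred from the shared prefix.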

Adding logic to confirm the existence of the directory structure would
create artificial constraints not required by S3, add complexity, and
require S3 requests and possibly state storage that are otherwise not
needed to store the object.
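
As a rough idea of what such a check would involve, the sketch below asks
whether any key already starts with a given prefix. It assumes a client that
behaves like a boto3 S3 client (list_objects_v2 with Bucket, Prefix, and
MaxKeys, returning a KeyCount). The helper name prefix_exists and the
FakeS3Client stand-in are hypothetical and only there so the example runs
without AWS access.

```python
def prefix_exists(s3_client, bucket, prefix):
    """Return True if at least one object key in `bucket` starts with `prefix`.

    Costs one extra list request per check, on top of the upload itself.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


class FakeS3Client:
    """In-memory stand-in (assumption) that mimics the one call used above."""

    def __init__(self, keys):
        self._keys = list(keys)

    def list_objects_v2(self, Bucket, Prefix, MaxKeys):
        matches = [k for k in self._keys if k.startswith(Prefix)][:MaxKeys]
        return {"KeyCount": len(matches), "Contents": [{"Key": k} for k in matches]}


client = FakeS3Client(["projectX/file1.txt"])
print(prefix_exists(client, "my-bucket", "projectX/"))
print(prefix_exists(client, "my-bucket", "projectY/"))
```

Even this simple version adds a request before every upload, and the answer
can be stale by the time the object is written.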

I hope that helps.

Regards,
Joe

On Fri, Mar 17, 2017 at 9:55 PM, James McMahon <jsmcmahon3@gmail.com> wrote:

> Thank you Adam and James. This has been very helpful, and gives me a
> number of options to explore. I am all set, thanks again for your help! -Jim
>
> On Fri, Mar 17, 2017 at 5:33 PM, Adam Lamar <adamonduty@gmail.com> wrote:
>
>> Jim,
>>
>> Absolutely that's one way. Depending on how many directories you have,
>> you can also do it directly with RouteOnAttribute and the expression
>> language:
>>
>> Property name: s3exists
>> Property value: ${outputTarget:equals('foo'):or(${outputTarget:equals('bar')})}
>>
>> Then route the s3exists relationship to PutS3Object.
>>
>> The python script strategy you mentioned may be good for a small to
>> medium number of directories.
>>
>> The ListS3 strategy mentioned by James might be a better fit if the list
>> is too large to easily maintain by hand.
>>
>> Hope that helps,
>> Adam
>>
>>
>> On Fri, Mar 17, 2017 at 3:07 PM, James McMahon <jsmcmahon3@gmail.com>
>> wrote:
>>
>>> So keep my list in a python script dictionary called by an ExecuteScript
>>> processor, and toss my outputTarget value against that. Set a new attribute
>>> s3exists to true or false in my script based on that result, and then use
>>> RouteOnAttribute to direct the output. Is that what you have in mind? -Jim
>>>
>>> On Fri, Mar 17, 2017 at 4:59 PM, Adam Lamar <adamonduty@gmail.com>
>>> wrote:
>>>
>>>> Jim,
>>>>
>>>> Also keep in mind that as an object store, S3 uses "directories" only
>>>> as a grouping concept, and not as a hierarchical storage mechanism. That's
>>>> why the initial PutS3Object doesn't fail with a new "directory". See
>>>> http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
>>>>
>>>> I think James' advice is spot on - to accomplish what you need, you'll
>>>> likely want to keep a list of known outputTargets in NiFi.
>>>>
>>>> Cheers,
>>>> Adam
>>>>
>>>
>>>
>>
>
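
The allow-list idea discussed in the quoted thread can be sketched in plain
Python as below. This shows only the lookup logic; the ExecuteScript wiring
(reading the flow file and setting the attribute) is omitted. The names
KNOWN_TARGETS and s3exists_value are hypothetical, and 'foo' and 'bar' are
the placeholder targets from Adam's example.

```python
# Known "directories" (key prefixes) that uploads are allowed to target.
KNOWN_TARGETS = {"foo", "bar"}


def s3exists_value(output_target, known_targets=KNOWN_TARGETS):
    """Return the string to store in the s3exists attribute.

    Attribute values are strings, so "true"/"false" is returned rather
    than a bool.
    """
    return "true" if output_target in known_targets else "false"


for target in ("foo", "baz"):
    print(target, s3exists_value(target))
```

A routing step can then send only flow files with s3exists equal to "true"
on to PutS3Object.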
