camel-issues mailing list archives

From "Ben O'Day (JIRA)" <>
Subject [jira] [Commented] (CAMEL-6867) camel-hdfs - HdfsProducer filename collisions when Producer instance recreated
Date Thu, 17 Oct 2013 21:16:44 GMT


Ben O'Day commented on CAMEL-6867:

The issue with using the messageId is that connectOnStartup mode creates the initial file
stream on startup, when there is no messageId to use. How about using the UUID generator
from the CamelContext instead, like this: getEndpoint().getCamelContext().getUuidGenerator().generateUuid()?
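
A minimal sketch of that idea, not the actual HdfsProducer code: the class SegmentNameSketch and the helper segmentName are hypothetical, and java.util.UUID stands in for the CamelContext UUID generator so the example runs without Camel on the classpath. The id is produced per file rather than per message, so it also works when the stream is opened at startup.

```java
import java.util.UUID;
import java.util.function.Supplier;

class SegmentNameSketch {
    // Hypothetical helper: builds a file name from an optional prefix and a
    // freshly generated id. In Camel the supplier could delegate to
    // getEndpoint().getCamelContext().getUuidGenerator().generateUuid().
    static String segmentName(String prefix, Supplier<String> idGenerator) {
        String id = idGenerator.get();
        return (prefix == null || prefix.isEmpty()) ? id : prefix + "-" + id;
    }

    public static void main(String[] args) {
        // Stand-in generator (assumption): plain JDK random UUIDs.
        Supplier<String> ids = () -> UUID.randomUUID().toString();
        System.out.println(segmentName("seg", ids)); // prefixed name
        System.out.println(segmentName("", ids));    // no prefix at all
    }
}
```

Passing the generator in as a Supplier keeps the naming logic independent of any message, which is the property connectOnStartup needs.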

Also, is there any reason to keep prepending DEFAULT_SEGMENT_PREFIX with this new approach? The
prefix "seg" seems pretty arbitrary and should probably be configurable if we need to keep it.

> camel-hdfs - HdfsProducer filename collisions when Producer instance recreated
> ------------------------------------------------------------------------------
>                 Key: CAMEL-6867
>                 URL:
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-hdfs
>            Reporter: Ben O'Day
>            Assignee: Ben O'Day
>             Fix For: Future
> The HdfsProducer uses an instance variable (long splitNum) that is incremented to create
> unique output filenames in a given directory (seg0, seg1, etc).
> If the Producer instance is recreated (producer cache limit exceeded, server restart,
> etc), the splitNum variable is reset to 0.  This results in files being overwritten when using
> overwrite=true mode or throwing "The file already exists" errors when using overwrite=false.
> We should switch to using a timestamp or some other unique generator to prevent filename
> collisions regardless of the Producer instance lifecycle for the same hdfs directory URL...
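
The collision described in the issue can be reproduced with a small simulation. This is a simplified stand-in, not the real HdfsProducer: the class SplitNumCollisionSketch and its methods are hypothetical, and only the splitNum counter and the "seg" prefix come from the description above. Each call to the factory models a recreated Producer instance.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.function.Supplier;

class SplitNumCollisionSketch {
    // Models the current behaviour: a per-instance counter that starts at 0.
    static Supplier<String> counterNamer() {
        long[] splitNum = {0};
        return () -> "seg" + splitNum[0]++;
    }

    // Models the proposed behaviour: names do not depend on instance state.
    static Supplier<String> uuidNamer() {
        return () -> "seg" + UUID.randomUUID();
    }

    // Creates two successive "producer instances" and counts how many of the
    // generated file names were already used in the same directory.
    static int countCollisions(Supplier<Supplier<String>> namerFactory, int filesPerInstance) {
        Set<String> seen = new HashSet<>();
        int collisions = 0;
        for (int instance = 0; instance < 2; instance++) {
            Supplier<String> namer = namerFactory.get(); // producer recreated here
            for (int i = 0; i < filesPerInstance; i++) {
                if (!seen.add(namer.get())) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Counter-based names repeat after recreation: seg0, seg1, seg2 twice.
        System.out.println(countCollisions(SplitNumCollisionSketch::counterNamer, 3)); // prints 3
        // UUID-based names are not expected to repeat.
        System.out.println(countCollisions(SplitNumCollisionSketch::uuidNamer, 3));
    }
}
```

With overwrite=true each counted collision would correspond to an overwritten file, and with overwrite=false to a "The file already exists" error.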

This message was sent by Atlassian JIRA
