camel-dev mailing list archives

From "Claus Ibsen" <claus.ib...@gmail.com>
Subject Re: Deprecation of file consumer timestamp
Date Sat, 29 Nov 2008 16:04:15 GMT
Hi

I am reworking the file component as the code needs to be polished to
be ready for new feature requests by end users.

Having my fingers on the keyboard and reworking the code, I think we
should consider giving the idempotent consumer EIP a first-class
interface that consumers can implement to support idempotency
right out of the box. This is convenient for both the file and ftp
consumers, as it avoids re-consuming already processed files.

Then we could allow very easy URI configuration for the file consumer
to enable idempotent consumption:
from("file://inbox?idempotent=true").to("bean:processOrder");

So I am proposing to either
a) add a new interface in org.apache.camel to cater for this
b) move the existing interface MessageIdRepository to org.apache.camel
c) option b, but renaming the interface to a better name, e.g. IdempotentRepository

Using the existing MessageIdRepository allows us to leverage existing
implementations such as the JpaMessageIdRepository so we can support a
persistent solution right out-of-the-box.
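
A minimal sketch of what option c) could look like, assuming a
repository keyed by file name. The method names add/contains, the
MemoryIdempotentRepository class, and the file names are hypothetical
illustrations only; they are not the actual MessageIdRepository
signature and not a decided API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface named after option c); method names are assumptions.
interface IdempotentRepository {
    /** Records the key; returns true only if it had not been seen before. */
    boolean add(String key);

    /** Returns true if the key has already been recorded. */
    boolean contains(String key);
}

// In-memory variant: fine for tests, but its contents are lost on restart.
class MemoryIdempotentRepository implements IdempotentRepository {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    @Override
    public boolean add(String key) {
        return seen.add(key); // Set.add returns false for duplicates
    }

    @Override
    public boolean contains(String key) {
        return seen.contains(key);
    }
}

class IdempotentSketch {
    public static void main(String[] args) {
        IdempotentRepository repo = new MemoryIdempotentRepository();
        // A consumer would call add() per polled file and skip known ones.
        for (String name : new String[] {"order-1.xml", "order-2.xml", "order-1.xml"}) {
            if (repo.add(name)) {
                System.out.println("consume " + name);
            } else {
                System.out.println("skip " + name);
            }
        }
    }
}
```

A persistent variant (JPA, file or database backed) would implement the
same two methods, so the consumer code would not need to change.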


/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Tue, Nov 25, 2008 at 7:24 AM,  <james.strachan@gmail.com> wrote:
> Btw unit testing - where you want to process all files on startup
> and never want to edit/delete them - was the main motivation & use case
> for noop.
>
> We definitely need to support different strategies as there are many
> different use cases. Eg sometimes keeping a cache of all files
> processed won't scale due to huge number of files. Sometimes you want
> to process a file again if it is touched.
>
> I understand that sometimes timestamps are dodgy; but I would rather
> us support all use cases cleanly using different pluggable strategies
> than disable useful functionality (like testing! :-)
>
>
> On 19/11/2008, Gert Vanthienen <gert.vanthienen@skynet.be> wrote:
>> L.S.,
>>
>> It almost sounds as if we need two separate strategies that
>> can be configured on the file endpoint:
>> - one to determine which files need to be processed (the basic one just
>> takes all the files in a directory but we can build additional ones that
>> use a storage mechanism)
>> - another one (like we already have now) that determines what to do with
>> the file after a successful or failed exchange
>>
>> FWIW, I actually like the simple noop one for creating unit tests
>> because it allows you to just refer to the /src/test/resources folder in
>> your project instead of having to copy them to a work folder first.
>>
>> Regards,
>>
>> Gert
>>
>> Claus Ibsen wrote:
>>> Hi
>>>
>>> Oh, I have thought that some end-users want FileConsumer to keep
>>> retrying to consume the same file over and over again if it could not be
>>> processed, so the postAction could have a 3rd option or we could have
>>> an option to set this feature (kinda like noop but only for when the
>>> file could not be processed)
>>>
>>>
>>>
>>> /Claus Ibsen
>>> Apache Camel Committer
>>> Blog: http://davsclaus.blogspot.com/
>>>
>>>
>>>
>>> On Wed, Nov 19, 2008 at 10:35 AM, Claus Ibsen <claus.ibsen@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> The store idea is good as it can be used for the idempotent consumer
>>>> as well, so we can use it to persist the list and survive
>>>> restarts. We need to allow it to be pluggable so users can use a
>>>> shared DB if they use a grid, or maybe some of that fancy Terracotta
>>>> thing that distributes memory caches.
>>>>
>>>> But turning back to the file consumer. I really think the noop=true
>>>> option should be deprecated as well. The file is like an inbox where
>>>> if a file is dropped it is consumed once. After processing the file is
>>>> deleted or moved to another destination. Now with this "remember list"
>>>> we have a serious issue if the inbox receives files with the same name
>>>> but the content of the file is different. What if someone uploads a
>>>> file to an FTP server and the filename is always fixed (= the same)?
>>>> Now we have a complex situation as we need to hash the file content to
>>>> be able to determine if the file is different, or not support it at
>>>> all.
>>>>
>>>> I am mostly keen to keep it simpler and as Hadrian said "keep it lean".
>>>>
>>>> So I am voting for:
>>>> a) to remove noop as well
>>>> b) to always delete or move file after processing (we should support
>>>> moving files to a different folder if exchange failed)
>>>>
>>>> Ad b)
>>>> We should support moving files using different pattern depending on
>>>> - exchange OK
>>>> - exchange Failed
>>>> I have thought about introducing some better URI options to express this
>>>>
>>>> Something along the lines of (think of better uri option names)
>>>> postAction=delete
>>>>
>>>> postAction=move
>>>> moveCompleteExpression=./done/${file:name}.bak
>>>> moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error
>>>>
>>>> And we should have defaults as well, so if moveErrorExpression is
>>>> omitted it defaults to the completed move.
>>>>
>>>>
>>>> And then we could consider @deprecating all the other prefix and postfix
>>>> URI options we have in favor of the power of the expression instead.
>>>>
>>>>
>>>>
>>>> But the list store is not wasted as we can use it for the idempotent
>>>> as well and for other areas.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> /Claus Ibsen
>>>> Apache Camel Committer
>>>> Blog: http://davsclaus.blogspot.com/
>>>>
>>>>
>>>>
>>>> On Wed, Nov 19, 2008 at 4:04 AM, Jon Anstey <janstey@gmail.com> wrote:
>>>>
>>>>> Hmmm... yeah, I like this suggestion. It may be just what we need here!
>>>>> Thanks!
>>>>>
>>>>> On Tue, Nov 18, 2008 at 4:11 PM, Gert Vanthienen
>>>>> <gert.vanthienen@skynet.be> wrote:
>>>>>
>>>>>
>>>>>> Jon,
>>>>>>
>>>>>> How about if we enhance the file consumer to keep track of files that
>>>>>> have already been processed instead of using a timestamp?  The timestamp
>>>>>> approach is a bit error-prone (just touching the file by accident can set
>>>>>> it off again).
>>>>>> If we provide multiple implementations for the storage mechanism to keep
>>>>>> this information, we can cover a lot of use cases (similar to the message
>>>>>> id store for an idempotent consumer):
>>>>>> - an in-memory store for testing purposes
>>>>>> - a file-based implementation for basic production environments
>>>>>> - a database- or ldap-backed implementation for clustered environments,
>>>>>> where a file can arrive through multiple directories
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Gert
>>>>>>
>>>>>> Jon Anstey wrote:
>>>>>>
>>>>>>> The algorithm that checks whether a file should be consumed based on
>>>>>>> timestamp has been deprecated for a while now (see
>>>>>>> http://activemq.apache.org/camel/file.html). I've removed this on my
>>>>>>> local branch only to realize that it introduces a bit of an ugly problem...
>>>>>>> essentially since files will be processed always (modified or not) in
>>>>>>> the case of noop=true or if a fault has been set, the same file will be
>>>>>>> processed over and over again... not good!
>>>>>>>
>>>>>>> The original intent of removing the timestamp checking was to simplify
>>>>>>> the consumer. I think that in trying to get around this new issue we may
>>>>>>> make it even more complicated!
>>>>>>>
>>>>>>> I'm wondering if there is a simple solution to this that I'm just not
>>>>>>> seeing yet or if maybe this issue was discussed before...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Jon
>>>>>
>>>>> http://janstey.blogspot.com/
>>>>>
>>>>>
>>>
>>>
>>
>>
>
>
> --
> James
> -------
> http://macstrac.blogspot.com/
>
> Open Source Integration
> http://fusesource.com/
>
