Date: Fri, 9 May 2008 07:55:58 +0100
From: "James Strachan"
To: users@activemq.apache.org
Subject: Re: SMTP Server (Apache James) spooling hints

2008/5/8 Stefano Bagnara:
> Hi all,
>
> I'm an Apache JAMES committer and I'm "almost" new to ActiveMQ.

Welcome :)

> I'm starting analysis on how to replace our default spool with ActiveMQ and
> I hope you can give me some hints :-)
> It would be better to use ActiveMQ via JMS (more flexibility) but if there
> is any better solution to our problems by using specific ActiveMQ APIs then
> why not!!

I'd be tempted to use the JMS API as (i) you can switch JMS providers if
you ever need to and (ii) lots of the internal APIs to things like data
stores & transaction logs and the like do change over time. Though maybe
Camel is even easier (more on this later...)

> Our scenario is an SMTP Server so we have something like this:
>
> 1) SMTP Server receives messages and puts them to the spool. The spool has
> to be persistent because once the message has been posted via SMTP we cannot
> lose it.
> Most of the time the message will be consumed very fast, so in the past I
> looked at using Kaha directly for this, but maybe the 5.0 AMQ Message Store
> already handles this one in a performant way?

Yeah - I'd use the default persistence engine in ActiveMQ 5.x, the AMQ
Store, which is very fast...
http://activemq.apache.org/amq-message-store.html

basically just use the out-of-the-box config :)

> 2) Our current spooling has this architecture:
> we have a single "spool" that contains messages with a "state". We read a
> random message from the spool, look at its state and then start the
> processing depending on the state itself. At the end of the processing we can
> alter the state and leave the message in the spool, or we can remove it from
> the spool. In the processing we could even push more messages into the spool
> (e.g. to split the message to 2 different paths). ATM there is no
> transaction management.
> The processing from a state to another (or to delete) is a sequence of
> micro-processings (named matchers/mailets in james), so the actual status
> depends also on what matchers/mailets have been processed so far, but we
> currently keep this in memory and never store this. So if something goes
> wrong (given that we don't have transactions) we simply start from the
> beginning of that "state processor" (I'd like to improve this issue, too,
> with the new ActiveMQ based spool).

Using transactions is a good idea; then you can atomically process a number
of messages and they are either processed or not in an ACID way. To improve
performance you might wanna use batches; say processing 1000 messages in a
single transaction, which means that most of the operations are
asynchronous & fast other than the transaction commit, which does a
sync-to-disk.
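
A rough sketch of the batching idea, in plain Java rather than real ActiveMQ code. `TxSession` is a hypothetical stand-in for a transacted JMS session (receive, commit, rollback), and `InMemoryTxSession` and `BatchConsumer` are made-up names that exist only so the sketch runs on its own. The point is just that the commit happens once per batch, so a failure only redelivers the uncommitted batch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a transacted session; not an ActiveMQ or JMS type. */
interface TxSession {
    String receiveNoWait(); // next message, or null when the queue is empty
    void commit();          // acknowledge everything received since the last commit
    void rollback();        // hand everything received since the last commit back
}

/** In-memory implementation, only here to make the sketch runnable. */
class InMemoryTxSession implements TxSession {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> inFlight = new ArrayList<>();
    int commits = 0;

    InMemoryTxSession(List<String> messages) { queue.addAll(messages); }

    public String receiveNoWait() {
        String m = queue.poll();
        if (m != null) inFlight.add(m);
        return m;
    }

    public void commit() { inFlight.clear(); commits++; }

    public void rollback() {
        // Put uncommitted messages back at the head, preserving their order.
        for (int i = inFlight.size() - 1; i >= 0; i--) queue.addFirst(inFlight.get(i));
        inFlight.clear();
    }

    int pending() { return queue.size(); }
}

class BatchConsumer {
    /** Drains the queue, committing once per batchSize messages; returns how many were committed. */
    static int drain(TxSession session, int batchSize, Consumer<String> handler) {
        int committed = 0;
        int inBatch = 0;
        try {
            String msg;
            while ((msg = session.receiveNoWait()) != null) {
                handler.accept(msg);
                if (++inBatch == batchSize) {
                    session.commit(); // the only expensive (sync-to-disk) step
                    committed += inBatch;
                    inBatch = 0;
                }
            }
            if (inBatch > 0) { // commit the final, partial batch
                session.commit();
                committed += inBatch;
            }
        } catch (RuntimeException e) {
            session.rollback(); // only the current uncommitted batch is redelivered
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> mails = new ArrayList<>();
        for (int i = 0; i < 2500; i++) mails.add("mail-" + i);
        InMemoryTxSession session = new InMemoryTxSession(mails);
        int n = drain(session, 1000, m -> { /* run the mailets for this state here */ });
        System.out.println("committed=" + n + " commits=" + session.commits);
    }
}
```

The batch size trades throughput against how much work gets redone after a failure, so 1000 is only a starting point.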
http://activemq.apache.org/should-i-use-transactions.html

> Sometimes the message is simply moved from one state to another a few
> times and then it is removed from the spool because of 2 causes:
> a) it has been moved to the "outgoing spool" (the spool for the messages to
> be sent to other smtp servers)
> b) it has been posted to a user inbox.
> Other times the message is altered in its content.
> So you see in James we currently have a single "message store" and we can
> "lock on a message" (so no other thread will take it), "retrieve it", "update
> and unlock it" (alter its state or state+content) or "remove it". How would
> you manage this with ActiveMQ?

With ActiveMQ you'd use a queue per state/mailet: take the message off its
queue, do something with it, then put it on some other queue(s) (either
changed or the same message). The simple JMS/MOM model of sending to a
queue or consuming from a queue turns out to be very fast; a SEDA based
asynchronous model performs well since there's no locking or leasing
required, and messages can flow very asynchronously to boost throughput.

If you do find you wanna do the grab - edit - put back type thing a lot you
could look at using JavaSpaces (or Entity Beans :). But I think for JAMES
messaging could work well, as it sounds to me (as a newbie JAMES person)
like what you're doing when processing mail is kinda a pipes and filters
type model...
http://activemq.apache.org/camel/pipes-and-filters.html

which maps very well to messaging and queues. For more background see:
http://activemq.apache.org/camel/enterprise-integration-patterns.html

btw you could maybe use Camel to describe how mail is routed from JAMES to
different mailets & queues? Then you wouldn't have to worry about learning
the JMS API (and we could switch to different spool implementations later
on if need be). It'd also then make it easier to decide when to use queues.
e.g.
you might have 5 mailets; you could put each one of them on a queue; or
rather than 5 writes to a queue you could invoke all 5 mailets in one go
(in the same transaction) - or something in between.

> 3) Outgoing spool:
> The outgoing spool in JAMES is a spool like the main spool, with the
> difference that a message delivery could fail and there is a retry schedule.
> So we try to send a message, on failure we try again 10 minutes later, then
> 30 minutes later, then 2 hours later (it is configurable) and so on. ATM we
> store the "next-attempt-date" and then each "deliverer" simply takes the
> message with the earliest next-attempt-date and if it is due for delivery it
> starts its work, otherwise it will simply wait the needed time (one
> deliverer is notified when a *new* message enters this spool / They all "wait"
> on the spool and the spool is notified once at each store).
> The most common case is:
> a) the message we received at #1 entered the spool #2 and is processed very
> fast and it ends in the outgoing spool #3 where it is delivered on the first
> attempt. In this case it would be cool if the message was in memory and
> simply written once for safety because the processing should be fast and it
> would be slow to read it again from the disk.
> b) we fail our first attempt, then it does not make sense to keep it in
> memory because we know we won't need it in the next X minutes/hours.
> Any suggestions on how to do this with ActiveMQ?

It sounds like you could use the delayer pattern...
http://activemq.apache.org/camel/delayer.html

Then have separate queues for '30 mins later', '1 hour later', '2 hours
later'. If delivery fails you send the message to the next queue, where
messages are attempted in order, but only X mins after the time they were
added to that queue. Something kinda like this in pseudo camel code...

from("activemq:output.dispatch.attempt.1").bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.2").delay(thirtyMins).bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.3").delay(oneHour).bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.4").delay(twoHours).bean(MyDispatchThingy.class);

Then we'd just need to use the try/catch mechanism or a custom ErrorHandler
http://activemq.apache.org/camel/error-handler.html

so that if MyDispatchThingy fails to dispatch the message we dispatch it to
the next queue in the list (or delete it if we're on attempt 4 etc).

> As a last point we have to take care of 2 different use-cases:
> I) most traffic is done by fastmoving small messages but

The nice thing about the above is that you can then control concurrency on
each one of the attempt queues. So you could have, say, 1000 threads doing
attempt 1, 10 threads doing attempt 2 and just one thread doing attempt 3
or 4 etc.

> II) many messages are 1-10MB in size, and a few messages could be even 100MB
> or even more: how should we handle these messages in ActiveMQ given that we
> can't take them in memory but we simply want to stream them in and out from
> the server?

JMS/MOM is designed for relatively modest messages, as JMS clients and
brokers try and keep messages around in RAM for maximum caching,
performance and throughput.

So you might wanna implement some kinda mechanism where messages over a
certain size, say over 10MB, use BlobMessages - that is to say out of band
payloads...
http://activemq.apache.org/blob-messages.html

so you use JMS/ActiveMQ for the high performance reliable load balancing
across a cluster of boxes, but keep the message payloads on some file
system/JCR etc.

Or maybe you try a middle ground where you keep the message headers in the
JMS message but leave the body as a separate out of band entity, so you
could use smart JMS routing using message headers.
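
To make the out of band idea a bit more concrete, here's a small plain Java sketch. It doesn't use the ActiveMQ BlobMessage API; `SpoolEnvelope`, `PayloadOffloader` and the temp-directory store are all invented for illustration. Bodies up to a threshold travel inline, while bigger ones are written to a store and only a reference plus the headers would go on the queue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Map;
import java.util.UUID;

/** Hypothetical envelope: what would actually flow through the queues. */
final class SpoolEnvelope {
    final Map<String, String> headers; // always inline, so routing can use them
    final byte[] inlineBody;           // null when the body was offloaded
    final URI bodyRef;                 // null when the body is inline

    SpoolEnvelope(Map<String, String> headers, byte[] inlineBody, URI bodyRef) {
        this.headers = headers;
        this.inlineBody = inlineBody;
        this.bodyRef = bodyRef;
    }

    boolean isInline() { return inlineBody != null; }
}

/** Hypothetical helper deciding between inline and out of band bodies. */
final class PayloadOffloader {
    private final Path storeDir;
    private final long thresholdBytes;

    PayloadOffloader(Path storeDir, long thresholdBytes) {
        this.storeDir = storeDir;
        this.thresholdBytes = thresholdBytes;
    }

    /** Convenience factory using a temp directory as the payload store (an assumption). */
    static PayloadOffloader inTempDir(long thresholdBytes) {
        try {
            return new PayloadOffloader(Files.createTempDirectory("spool-bodies"), thresholdBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Keeps small bodies inline; writes large ones to the store and returns a reference. */
    SpoolEnvelope wrap(Map<String, String> headers, byte[] body) {
        if (body.length <= thresholdBytes) {
            return new SpoolEnvelope(headers, body, null);
        }
        try {
            // A real spool would stream from the SMTP connection instead of holding a byte[].
            Path file = storeDir.resolve(UUID.randomUUID() + ".eml");
            Files.write(file, body);
            return new SpoolEnvelope(headers, null, file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Resolves the body wherever it lives; only a step that needs the content calls this. */
    static byte[] readBody(SpoolEnvelope env) {
        if (env.isInline()) return env.inlineBody;
        try {
            return Files.readAllBytes(Paths.get(env.bodyRef));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PayloadOffloader spool = inTempDir(10L * 1024 * 1024); // 10MB cut-off, as suggested above
        SpoolEnvelope env = spool.wrap(Collections.singletonMap("state", "root"),
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("inline=" + env.isInline());
    }
}
```

Steps that only look at headers or state never touch the stored body, which is the whole point for the 100MB case.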

> I understand this is a lot of questions, but I would really appreciate any
> hint, even partial. I'm collecting ideas :-)

:)

> Stefano
>
> PS: we are also evaluating using JCR for inboxes if you were wondering, but
> this is another story, for another list ;-)

You could store the mail in JCR and use messaging for the process flow.
e.g. the JMS messages could just contain a reference (URL?) to the message
payload.

How often is the payload of the message mutated as it goes through mailets?
If it remains kinda static and it's more the headers, states & mailets that
change mostly, it could be worth putting the payload in some file system /
REST resource / JCR and just referring to the payload for large messages
(say over 1-10MB)? If a message has to go through, say, 5 different steps
that you might wanna load balance and cluster using different queues, it'd
be painful to read/write a 100MB email body for each of the 5 steps if the
payload never changes through them.

--
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com