incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Couch as a mail store?
Date Thu, 12 Feb 2009 16:04:02 GMT

On 12 Feb 2009, at 16:51, Kenneth Kalmer wrote:
>
>> A deletion is effectively a set-deleted-flag operation. Compaction then
>> takes care of getting rid of the file.

> So taken from other threads, you're effectively tasked with running
> compaction outside of your peak time. This is a no-brainer if the other
> benefits are within reach.

Right.


> While I'm here, can the docs still be recovered before compaction? Why I ask
> is that it would be a bonus to be able to access the mails and do some
> statistical reporting before compacting the database, if not, no issues.
> Mail server admins (and those footing the bills) love excessive reporting...

They can, but you're still not advised to do it. If you need any data, it should be
in the latest version of a document. Reports could be run on the side and stored
into a secondary database.

>> Hmm, not too much information. Let's see, if you have any more specific
>> questions, just send a follow up :)
>>
>
> Well, let's try and keep this as close to couch as we can and not wander off
> into the nasty world of email systems (except for effectively CRUD-ing
> messages).
>
> So mail arrives at our SMTP server. What would give us the best performance
> for ingesting mail, directly writing each doc as it arrives, or having small
> queues that empty out every X messages / Y seconds (whichever comes first)?
> Considering one of our mail clouds does about 15GB an hour during office
> hours. I know this size isn't anything when you consider larger providers,
> but we're growing constantly and some time in the future we're gonna have to
> become creative in how we store mail.

Single doc inserts get batch-written to disk each second. This might work for your
data and memory requirements. But you can fine-tune that with custom queueing
and bulk doc inserts.
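
A minimal sketch of the custom queueing plus bulk insert idea, in Python. The only CouchDB-specific part is the `_bulk_docs` endpoint; the names `MailBuffer` and `post_bulk_docs`, the thresholds, and a database URL such as http://localhost:5984/mail are assumptions made for illustration.

```python
import json
import time
import urllib.request


def post_bulk_docs(db_url, docs):
    """POST a list of docs to the database's _bulk_docs endpoint.

    db_url is assumed to look like http://localhost:5984/mail.
    Returns the parsed JSON reply (one result entry per doc).
    """
    body = json.dumps({"docs": docs}).encode("utf-8")
    req = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class MailBuffer:
    """Hypothetical queue: flush after max_docs docs or max_age seconds.

    The age check only runs when a doc is added; a real ingester would
    also want a timer so a quiet queue still drains.
    """

    def __init__(self, flush, max_docs=100, max_age=1.0, clock=time.monotonic):
        self.flush_fn = flush
        self.max_docs = max_docs
        self.max_age = max_age
        self.clock = clock
        self.docs = []
        self.first_at = None

    def add(self, doc):
        if not self.docs:
            self.first_at = self.clock()  # age is measured from the oldest queued doc
        self.docs.append(doc)
        too_many = len(self.docs) >= self.max_docs
        too_old = self.clock() - self.first_at >= self.max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self.docs:
            batch, self.docs = self.docs, []
            self.flush_fn(batch)
```

Wiring the two together could look like `MailBuffer(lambda docs: post_bulk_docs("http://localhost:5984/mail", docs))`. The right values for X messages and Y seconds depend on message size and how much unwritten mail you can afford to hold in memory.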


> Retrieving mail also becomes interesting, we can use one view to get the
> total number of messages for the mailbox, and then another (with parameters)
> to batch them from couchdb as we deliver them to the client. Would bulk
> updates here be the cheapest way of "mark all as read" or "delete", or would
> you again handle documents individually?

You can set the _deleted member in a bulk-update operation to delete a bunch
of docs in one go.
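
As a rough sketch in Python, the request body for such a bulk delete could be built like this. The helper names and the `read` field are hypothetical; `_id`, `_rev` and `_deleted` are the CouchDB members involved, and the resulting dict is what would be POSTed as JSON to the database's `_bulk_docs` endpoint.

```python
import json


def bulk_delete_payload(id_rev_pairs):
    """Build a _bulk_docs body that deletes each (doc_id, rev) pair.

    Each entry carries the current _rev so the server can detect conflicts.
    """
    return {
        "docs": [
            {"_id": doc_id, "_rev": rev, "_deleted": True}
            for doc_id, rev in id_rev_pairs
        ]
    }


def bulk_mark_read_payload(docs):
    """Build a _bulk_docs body for "mark all as read".

    A bulk update replaces whole documents, so each full doc (including its
    _id and _rev) is sent back with the hypothetical "read" field set.
    """
    return {"docs": [dict(doc, read=True) for doc in docs]}


if __name__ == "__main__":
    payload = bulk_delete_payload([("msg-1", "1-abc"), ("msg-2", "3-def")])
    print(json.dumps(payload, indent=2))
```

The `(doc_id, rev)` pairs would typically come from the view that lists a mailbox, so no extra fetch per message is needed for deletes.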

Cheers
Jan
--

