jackrabbit-users mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Datastore and garbage collection
Date Mon, 09 Mar 2009 18:55:26 GMT

On Mon, Mar 9, 2009 at 6:38 PM, Paco Avila <monkiki@gmail.com> wrote:
> Do you mean that GC only makes sense if I delete documents from the
> repository?

Yes. I would even say that GC only makes sense if 1) you delete
a significant number of documents from the repository and 2) you add
documents at an *exponential* rate that exceeds the growth in storage
capacity.

> I don't think that never running GC and keeping all the documents
> (deleted ones included) is a good alternative in repositories that are
> several GB in size and hold big documents.

It depends... For example, I currently shoot about 10GB of digital
photos per month. Roughly 20% of the shots are so bad (blurry, poor
composition, overexposed, etc.) that I discard them immediately. It
would take just a few mouse clicks or a simple cron script to free up
the disk space that those discarded images take. But the extra effort
simply isn't worth it, since I will most likely have at least doubled
my storage capacity before my current 500GB hard drive is even close
to being filled up. Even the fact that I will probably only ever
publish about 10% of my photos doesn't make much of a difference,
since it costs so little to never delete anything. And I never need to
worry about accidentally removing something.
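
The arithmetic behind that example can be sketched roughly as follows. The
10GB per month, 20% discard rate and 500GB drive come from the message; the
function name and the assumption that the drive starts out empty are
illustrative only, since the current usage is not stated.

```python
def months_until_full(capacity_gb, added_gb_per_month, discard_fraction=0.0, used_gb=0.0):
    """Months until the drive fills up if a fraction of new data is deleted.

    used_gb defaults to 0, which is an assumption: the message does not say
    how full the drive currently is.
    """
    net_growth = added_gb_per_month * (1.0 - discard_fraction)
    if net_growth <= 0:
        return float("inf")  # nothing accumulates, so the drive never fills
    return (capacity_gb - used_gb) / net_growth


# Never delete anything: 500GB at 10GB per month.
keep_everything = months_until_full(500, 10)

# Delete the roughly 20% of shots that are discarded immediately.
delete_discards = months_until_full(500, 10, discard_fraction=0.2)

print(f"keep everything: {keep_everything:.1f} months")
print(f"delete discards: {delete_discards:.1f} months")
```

Under these assumptions, deleting the discards buys about a year on a drive
that takes more than four years to fill either way, which is well within the
time it takes for affordable storage capacity to double.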

If your application is for personal use and you produce less than 10GB
of data per month, then don't worry about garbage collection.

If your application is for enterprise use and your customer produces
less than 100GB-1TB of data per month (depending on the size of the
enterprise), then don't worry about garbage collection.
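
As a small sketch, the two rules of thumb above could be written as a
check like this. The thresholds are the ones from the message; the function
and constant names are hypothetical, 1TB is taken as 1000GB, and the
enterprise threshold is left as a parameter because the message only gives
a range that depends on the size of the enterprise.

```python
# Thresholds from the message, in GB of new data per month.
PERSONAL_THRESHOLD_GB = 10
# The enterprise figure is a range (100GB-1TB); 1TB is assumed to be 1000GB.
ENTERPRISE_THRESHOLD_RANGE_GB = (100, 1000)


def should_consider_gc(gb_per_month, enterprise=False, enterprise_threshold_gb=100):
    """Return True if the monthly data volume reaches the rule-of-thumb threshold."""
    if enterprise:
        low, high = ENTERPRISE_THRESHOLD_RANGE_GB
        if not low <= enterprise_threshold_gb <= high:
            raise ValueError("enterprise threshold must be within 100GB-1TB")
        threshold = enterprise_threshold_gb
    else:
        threshold = PERSONAL_THRESHOLD_GB
    return gb_per_month >= threshold


print(should_consider_gc(5))                     # personal use, 5GB per month
print(should_consider_gc(500, enterprise=True))  # small enterprise, 500GB per month
```

This only captures the data volume side of the advice; whether a significant
share of that data is ever deleted still has to be judged separately.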


Jukka Zitting
