jackrabbit-dev mailing list archives

From Felix Meschberger <fmesc...@gmail.com>
Subject Re: Per-repository thread pool in Jackrabbit
Date Mon, 13 Jul 2009 12:57:38 GMT

Marcel Reutegger wrote:
> Hi,
> 2009/7/12 Jukka Zitting <jukka.zitting@gmail.com>:
>> Hi,
>> 2009/7/8 Marcel Reutegger <marcel.reutegger@gmx.net>:
>>> - parallel execution of some work. this is primarily to make use of
>>> multi-core processors. execution should be distributed over and
>>> executed by N threads, where N depends on the number of available processors.
>> If I recall correctly we debated this already earlier. My point was
>> that limiting the number of tasks to the number of available
>> processors may not be a good approach as the tasks may be IO-bound or
>> block for other reasons, in which case having more task threads would
>> give you better throughput. But I recall being proven wrong, did we
>> have some benchmark for that? Do you remember where this discussion
>> was?
> I don't remember either... But let's just start a new one.
> I think this very much depends on the work that needs to be distributed. there
> is no proof that one way is better than the other. for CPU intensive work we'd
> probably want to limit the number of concurrent tasks. for I/O intensive work
> the concurrency should be higher.
> my above point was rather related to CPU intensive work. e.g. creating a posting
> list while content is indexed. but of course there might be other work that may
> be parallelized more aggressively.
> I guess the actual pool shouldn't care about that. some utility on top
> of the pool should provide that functionality, i.e. execute a number of
> tasks with a given level of concurrency. the utility would then dispatch
> the tasks to the pool accordingly.
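
The "utility on top of the pool" idea can be sketched roughly as follows. This is only an illustration: BoundedDispatcher and its invokeAll method are hypothetical names, not Jackrabbit code, and the sketch assumes java.util.concurrent and lambda syntax are available. The shared pool stays unbounded, and a semaphore caps how many tasks of one batch run at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical helper, not part of Jackrabbit: runs a batch of tasks on a
// shared pool while keeping at most `concurrency` of them running at once.
final class BoundedDispatcher {
    private final ExecutorService pool;
    private final Semaphore permits;

    BoundedDispatcher(ExecutorService pool, int concurrency) {
        this.pool = pool;
        this.permits = new Semaphore(concurrency);
    }

    <T> List<T> invokeAll(List<Callable<T>> tasks) throws Exception {
        List<Future<T>> futures = new ArrayList<>();
        for (Callable<T> task : tasks) {
            permits.acquire(); // the caller waits here until a slot is free
            try {
                futures.add(pool.submit(() -> {
                    try {
                        return task.call();
                    } finally {
                        permits.release(); // free the slot when the task ends
                    }
                }));
            } catch (RuntimeException e) {
                permits.release(); // submission was rejected, give the slot back
                throw e;
            }
        }
        List<T> results = new ArrayList<>();
        for (Future<T> future : futures) {
            results.add(future.get()); // results come back in submission order
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Unbounded shared pool, standing in for the repository-wide pool.
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= 4; i++) {
                final int n = i;
                tasks.add(() -> n * n);
            }
            // CPU-bound work: cap the batch at the number of processors.
            int cpus = Runtime.getRuntime().availableProcessors();
            System.out.println(new BoundedDispatcher(pool, cpus).invokeAll(tasks));
        } finally {
            pool.shutdown();
        }
    }
}
```

With this split, a CPU-bound caller would pass a concurrency close to the processor count, while an I/O-bound caller could pass a higher value against the same pool.
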
>>> - Timers used in TransactionContext and MultiIndex. This could be
>>> turned into a scheduling mechanism that could also be used by the
>>> ClusterNode sync. Other classes that use periodic checks in a
>>> background thread: DatabaseJournal (ClusterRevisionJanitor),
>>> CooperativeFileLock (watch dog).
>> Yep. Perhaps we could also reuse some of the scheduling functionality in Sling.
> I'm not sure this is needed. the java rt library already comes with
> Timer and TimerTask classes. our needs are very simple and I'm not sure
> that justifies a new dependency.

Yes, AFAICT Java also has thread pool implementations. If not, I urge us
still _not_ to reinvent the wheel and to use something existing, even if
that means adding a single dependency.
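
As a rough sketch of what reusing the JDK could look like for the periodic checks mentioned above: the java.util.concurrent package (Java 5 and later) offers ScheduledExecutorService, which can serve several periodic jobs from one small pool. BackgroundScheduler and its every method are hypothetical names chosen for illustration, not existing Jackrabbit classes, and the lambda syntax assumes a modern Java version.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper, not Jackrabbit code: one JDK scheduler shared by
// all periodic background checks instead of one Timer per component.
final class BackgroundScheduler {
    private final ScheduledExecutorService scheduler;

    BackgroundScheduler(int threads) {
        this.scheduler = Executors.newScheduledThreadPool(threads);
    }

    // Runs `check` repeatedly, waiting `periodMillis` between the end of
    // one run and the start of the next. The returned handle cancels the job.
    ScheduledFuture<?> every(long periodMillis, Runnable check) {
        return scheduler.scheduleWithFixedDelay(
                check, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        BackgroundScheduler background = new BackgroundScheduler(1);
        CountDownLatch threeRuns = new CountDownLatch(3);
        // Stand-in for a periodic job such as a cluster sync or a lock watchdog.
        ScheduledFuture<?> job = background.every(50, threeRuns::countDown);
        threeRuns.await();
        job.cancel(false);
        background.shutdown();
        System.out.println("periodic check ran three times");
    }
}
```

Each component would keep only the ScheduledFuture handle for its own job and cancel it on shutdown, so no component needs to own a thread.
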


>>> the more I think about it, the more I like your idea. but we should be
>>> careful with a maximum size for a repository wide pool. extensive use
>>> of the pool by a module should not lock up another module just because
>>> there are no more idle threads. maybe that global pool shouldn't have
>>> a maximum size...
>> That might make sense. Perhaps we should have some concept of
>> sub-pools (that borrow from the main pool) with fixed limits for tasks
>> that need them (see above).
> hmm, that doesn't sound flexible and generic. I just thought again how cool
> it would be if we could deploy jackrabbit into Google App Engine. that however
> requires that all background threads are removed. if we have that generic
> pool and client code adjusted accordingly, it could be as easy as turning
> the pool into a direct executor variant ;) well, that's very optimistic but
> it sounds promising to me...
> regards
>  marcel
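
The "direct executor variant" can be illustrated with the JDK's Executor interface. This is a minimal sketch, assuming client code is written against Executor only; ExecutorVariants and threadUsedBy are hypothetical names, not Jackrabbit code. A direct executor runs each task in the calling thread, so no background thread is created.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ExecutorVariants {
    // Direct executor: runs the task immediately in the calling thread.
    static final Executor DIRECT = Runnable::run;

    // Hypothetical client code that only knows the Executor interface.
    // Returns the name of the thread that actually ran the task.
    static String threadUsedBy(Executor executor) throws InterruptedException {
        String[] name = new String[1];
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            name[0] = Thread.currentThread().getName();
            done.countDown();
        });
        done.await(); // also makes the write to name[0] visible to this thread
        return name[0];
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            System.out.println("pooled: " + threadUsedBy(pool));
            System.out.println("direct: " + threadUsedBy(DIRECT));
        } finally {
            pool.shutdown();
        }
    }
}
```

The same client method works with both variants. The trade-off is that with the direct executor the caller blocks for the duration of the task, and periodic or delayed work would still need some other trigger.
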
