lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doron Cohen <>
Subject Re: ReadTask and its hierarchy needs some house cleaning
Date Tue, 18 May 2010 09:43:46 GMT
> Yes, such algorithms will be affected, but not necessarily deleted. So if a
> WarmReader task is required, one can write it, but it doesn't need to extend
> SearchTask, or it can, but hard-code all the other properties to false.
> Though in most cases you can run SearchTask, w/ warm set to true and after
> warm has been done, queries will be executed.

Say you want to keep allowing doing only warm and also allowing doing search
with warm. Since each opens a reader, forcing search with warm to be made by
two tasks would mean you either can only use the shared reader (in which
case the reader will only be opened once) or each of them opens its own
reader - and then you are opening two readers, which is a noise in the perf

The other search lines look like they can be rewritten w/ rounds?

Making the package weaker...

Anyway, this shows a perfect example for my argument: Search, SearchTrav and
> SearchTravRet only differ by their config options, yet an entire class had
> to be created.

Extending to modify behavior is pretty common... in regular Java code
anonymous classes would do it for you (no need to create that class in a
file) just that in bm tasks are created by reflection, so a concrete class
is needed.

If such sequence Search tasks is so crucial to be allowed, we can have
> SearchTask accept parameters, like Search(trav=false,warm=true) etc. in
> addition to the static .alg ones. I think personally it's an overkill, but
> could be a nice addition. It will definitely allow the above tasks to run in
> sequence, right?

I actually like this, but slightly different, see below.

> Point is, if you have 6 attributes, you don't need to create 64 classes in
> order to execute any combination you may need.

Good point...

On one hand it is readable and useful to have a concrete class for a
concrete task.
On the other hand it doesn't make sense to have too many combinations.
But then who needs all of them? and the ones really needed can easily be

It seems to me that the suggested change is affecting the simplicity of
writing algs..?

... so, I like the suggestion Search(trav=false,warm=true), just that I'd do
it differently - Read(search,warm).
(Read and not Search because of the above reader's consideration.)
Putting the Read/Search away for now, you could modify PerfTask so that
setParams() would set properties, where properties are delimited by ',', and
for each property, if it contains '=' it is interpreted as name=value,
otherwise it is interpreted as name=true. With modification, an algorithm
having SearchTrav(1000) is modified to SearchTrav(ntrav=1000), all the
params parsing moves to PerfTask, and each task class need only know the
name of its properties, which should differ from those of its supers. So
SearchTravTask's setParams would modify to something like this:

  public void setParams(String params) {
    traversalSize =

This would also allow to get rid of supportsParams() - unused params would
be silently ignored.

What do you think?

> Shai
> On Tue, May 18, 2010 at 10:32 AM, Doron Cohen <> wrote:
>> How would this affect for example current micro-standard.alg?
>> In particular this part of it:
>> {code}
>>         ...
>>         { "WarmNewRdr" Warm > : 50
>>         { "SrchNewRdr" Search > : 500
>>         { "SrchTrvNewRdr" SearchTrav(1000) > : 300
>>         { "SrchTrvRetNewRdr" SearchTravRet(2000) > : 100
>>         ...
>> {code}
>> Proposed change gets rid of these tasks, right?
>> Doron
>> On Tue, May 18, 2010 at 10:06 AM, Shai Erera <> wrote:
>>> Hi
>>> I wanted to run a benchmark .alg which will take a Filter into account.
>>> However, ReadTask, which is the base for a variety of search related tasks,
>>> does not support a Filter. When I reviewed the class, to understand how I
>>> can easily add such Filter support, I discovered a whole set of classes
>>> which IMO are completely unnecessary. ReadTask defines some with*() methods,
>>> such as withSearch, withTraverse etc. and many classes override ReadTask
>>> just to return true/false in those methods. WarmTask for example, returns
>>> true in withWarm() and false otherwise, while SearchTask returns true in
>>> withSearch and false otherwise.
>>> This created a whole set of extensions that you either need to run in
>>> sequence (e.g. Warm, SearchWithCollector) or create your own extension just
>>> to get the right recipe for the operations to perform.
>>> I suggest we do the following changes:
>>> * Rename ReadTask to SearchTask -- that's because RT uses IndexSearcher,
>>> QueryMaker -- all that suggests it's about Searching and not Reading. It's
>>> only semantics, I know, but I think SearchTask is clearer than ReadTask
>>> * Get rid of all the with*() methods, and instead move to use properties:
>>> search.with.warm, search.with.traverse, search.with.collector etc.
>>> * Introduce protected createCollector, createFilter, createSort, for
>>> custom extensions
>>> * Create a completely new hierarchy for this task, throwing away
>>> everything that can be handled through properties only (like SearchTask,
>>> WarmTask etc.)
>>> If we do this, then extensions of the new SearchTask will need to ask
>>> themselves "do I want to search w/ a Collector/Filter/custom Sort?" and not
>>> "do I Warm to be executed?" The core operation behind this task is
>>> The rest are just settings, or configuration, as well
>>> as some added ops like warm, and traverse. If it makes sense, I can factor
>>> warm() and traverse() into their own protected methods, for extensions to
>>> override as well. It might make sense for warm because custom warms is
>>> something I'm sure will be needed.
>>> This will also allow running algorithms with rounds - different
>>> properties for different rounds.
>>> This approach does not prevent one from creating MySearchTask with
>>> pre-defined and hard-coded settings. But for many others, the question of
>>> which task to execute will go away - you execute SearchTask for the basic
>>> search operations, or w/ the default Collector/Sort, and you control it via
>>> properties. To create your own *SearchTask extension which hard-codes a
>>> recipe, you'll need access to all the do<OP> members, so I'll make them
>>> protected. But that's IMO is a rare requirement, than say running a search
>>> with warm + traverse, and you shouldn't be forced to create a ReadTask
>>> extension for that.
>>> What do you think?
>>> Shai

View raw message