lucene-dev mailing list archives

From "Chris Male (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (LUCENE-3271) Move 'good' contrib/queries classes to Queries module
Date Fri, 01 Jul 2011 02:40:28 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058187#comment-13058187 ]

Chris Male edited comment on LUCENE-3271 at 7/1/11 2:39 AM:
------------------------------------------------------------

{quote}
bq. similar.* -> suggest module

Seems a little funky? I guess if we had a query-expansion module, I would think it belonged
there.
{quote}

MoreLikeThis makes suggestions :D But okay.  Do you have other thoughts for what could go
into a query-expansion module?  If so, then I'll go with it.  I just know that MLT doesn't
belong with the queries anymore.

{quote}
{quote}
FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module. I
think we should push it to contrib/misc for the time being. It seems to have quite a few constraints
on when it's useful. If indeed CONSTANT_SCORE_AUTO rewrite is better, then I don't see a purpose
for it.
{quote}

My vote would actually be to move this to src/test!  Yeah, there are some scenarios where this
thing could be faster, but really I thought it was just a good way to add seek to the DocTermsIndex
TermsEnum. I do think it and its test would be a nice addition to src/test; if someone wants
to use it, they can always snag it from there... it's that expert.
{quote}

src/test it is.

{quote}
In my opinion, as a rewrite method (I think it would require 2, one for the variant that ignores
TF), we could get better performance out of this with cleaner code... in other words, you would
just use ordinary FuzzyQuery and set this rewrite method for its scoring heuristic, or a BQ
of FuzzyQueries if you are doing the expansion thing.
{quote}

So what are you suggesting? We could sandbox it for the time being (see my comments about
sandbox below).

{quote}
Finally, I wanted to say that it's my opinion that we shouldn't put garbage in modules. Modules
should be treated like core, I think... yet at the same time I totally support efforts to
clean up contrib, either removing sandy stuff or refactoring it where it belongs in a module.
{quote}

+1 to all this.  I'm going to do a code cleanup on each of the classes that goes into the module.
I'll look into test coverage as well. At this stage I don't think any of the classes
I've suggested moving would be deemed garbage.

{quote}
One option could be to create a sandbox directory either under lucene (it contains src/java and
src/test but is totally an unorganized sandbox), or itself as a contrib temporarily (contrib/sandbox),
and take a look at contrib and move stuff that's good into modules, but toss all the 'odd things'
into this sandbox.
{quote}

I actually really like the idea of a sandbox.  I think, for simplicity, it's best to make it
a contrib.  That way we can easily get it up and running.  It also won't 'stain' anything
that isn't already stained.

As part of this work, I'll push the SlowCollated* stuff to the sandbox, along with FuzzyLikeThis.

  
> Move 'good' contrib/queries classes to Queries module
> -----------------------------------------------------
>
>                 Key: LUCENE-3271
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3271
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Chris Male
>
> With the Queries module now filled with the FunctionQuery stuff, we should look at closing
> down contrib/queries.  While not a huge contrib, it contains a number of pretty useful classes
> and some that should go elsewhere.
> Here's my proposed plan:
> - similar.* -> suggest module
> - regex.* -> queries module
> - BooleanFilter -> queries module under .filters package
> - BoostingQuery -> queries module
> - ChainedFilter -> queries module under .filters package
> - DuplicateFilter -> queries module under .filters package
> - FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module.
>   I think we should push it to contrib/misc for the time being.  It seems to have quite a few
>   constraints on when it's useful.  If indeed CONSTANT_SCORE_AUTO rewrite is better, then I don't
>   see a purpose for it.
> - FilterClause -> class inside BooleanFilter
> - FuzzyLikeThisQuery -> suggest module. This class seems a mess with its Similarity
>   hardcoded.  With all that said, it does seem to do what it claims and with some cleanup, it
>   could be good.
> - TermsFilter -> queries module under .filters package
> - SlowCollated* -> They can stay in the module till we have a better place to nuke
>   them.
> One of the implications of the above moves is that the xml-query-parser, which supports
> many of the queries, will need to have a dependency on the queries module.  But that seems
> unavoidable at this stage.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

