From: "Chris Male (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Date: Fri, 1 Jul 2011 02:40:28 +0000 (UTC)
Message-ID: <1909649133.7636.1309488028529.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1061334343.7576.1309482388854.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Issue Comment Edited] (LUCENE-3271) Move 'good' contrib/queries classes to Queries module

[
https://issues.apache.org/jira/browse/LUCENE-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058187#comment-13058187 ]

Chris Male edited comment on LUCENE-3271 at 7/1/11 2:39 AM:
------------------------------------------------------------

{quote}
bq. similar.* -> suggest module
Seems a little funky? I guess if we had a query-expansion module, I would think it belonged there.
{quote}

MoreLikeThis makes suggestions :D But okay. Do you have other thoughts for what could go into a query-expansion module? If so, then I'll go with it. I just know that MLT doesn't belong with the queries anymore.

{quote}
{quote}
FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module. I think we should push it to contrib/misc for the time being. It seems to have quite a few constraints on when it's useful. If indeed CONSTANT_SCORE_AUTO rewrite is better, then I don't see a purpose for it.
{quote}
My vote would actually be to move this to src/test! Yeah, there are some scenarios where this thing could be faster, but really I thought it was just a good way to add seek to the doctermsindex termsenum. I do think it and its test would be a nice addition to src/test; if someone wants to use it they can always snag it from there... it's that expert.
{quote}

src/test it is.

{quote}
In my opinion, as a rewrite method (I think it would require 2, one for the variant that ignores TF), we could get better performance out of this with cleaner code... in other words you would just use ordinary FuzzyQuery and set this rewrite method for its scoring heuristic, or a BQ of FuzzyQueries if you are doing the expansion thing
{quote}

So what are you suggesting? We could sandbox it for the time being (see my comments about sandbox below).

{quote}
Finally, I wanted to say that it's my opinion that we shouldn't put garbage in modules. Modules should be treated like core I think.... yet at the same time I totally support efforts to clean up contrib, either removing sandy stuff or refactoring it where it belongs in a module.
{quote}

+1 to all this. I'm going to do a code cleanup on each of the classes that goes into the module. Test coverage will be looked into as well. At this stage I don't think any of the classes I've suggested moving would be deemed garbage.

{quote}
One option could be to create a sandbox directory either under lucene (it contains src/java and src/test but is totally an unorganized sandbox), or itself as a contrib temporarily (contrib/sandbox) and take a look at contrib and move stuff that's good into modules, but toss all the 'odd things' into this sandbox.
{quote}

I actually really like the idea of a sandbox. I think for simplicity, it's best to make it a contrib. That way we can easily get it up and running. It also won't 'stain' anything that isn't already stained. As part of this work, I'll push the SlowCollated* stuff to the sandbox, along with FuzzyLikeThis.

> Move 'good' contrib/queries classes to Queries module
> -----------------------------------------------------
>
>                 Key: LUCENE-3271
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3271
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Chris Male
>
> With the Queries module now filled with the FunctionQuery stuff, we should look at closing down contrib/queries. While not a huge contrib, it contains a number of pretty useful classes and some that should go elsewhere.
> Here's my proposed plan:
> - similar.* -> suggest module
> - regex.* -> queries module
> - BooleanFilter -> queries module under .filters package
> - BoostingQuery -> queries module
> - ChainedFilter -> queries module under .filters package
> - DuplicateFilter -> queries module under .filters package
> - FieldCacheRewriteMethod -> This doesn't belong in this contrib or the queries module. I think we should push it to contrib/misc for the time being. It seems to have quite a few constraints on when it's useful. If indeed CONSTANT_SCORE_AUTO rewrite is better, then I don't see a purpose for it.
> - FilterClause -> class inside BooleanFilter
> - FuzzyLikeThisQuery -> suggest module. This class seems a mess with its Similarity hardcoded. With all that said, it does seem to do what it claims and with some cleanup, it could be good.
> - TermsFilter -> queries module under .filters package
> - SlowCollated* -> They can stay in the module till we have a better place to nuke them.
> One of the implications of the above moves is that the xml-query-parser, which supports many of the queries, will need to have a dependency on the queries module. But that seems unavoidable at this stage.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira