lucene-dev mailing list archives

From "Jamie Johnson (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (SOLR-2368) Improve extended dismax (edismax) parser
Date Thu, 17 Nov 2011 12:46:51 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152028#comment-13152028 ]

Jamie Johnson edited comment on SOLR-2368 at 11/17/11 12:46 PM:
----------------------------------------------------------------

I've found an issue with trailing && and || operators, and with mixed && and || operators.

I've put together a fix to SolrPluginUtils that addresses this issue for me (it borrows pieces
of SOLR-874), but I don't have time to patch the trunk with it.  I'm no regex guru, so there
is probably a better way to handle some of this, but it worked for my use case.

{code:title=SolrPluginUtils.java|borderStyle=solid}
	// Pattern to detect a + or - operator padded with whitespace on the right;
	// the replacement keeps the operator and drops the padding:
	private final static Pattern PADDED_OP_PATTERN = Pattern.compile( "(-|\\+)\\p{Z}" );
	private final static String PADDED_OR_DANGLING_OP_PATTERN_REPL_STR = "$1";
	
	// Pattern to detect dangling operator(s) at end of query. Uses an alternation
	// rather than a character class, so literal parentheses are not stripped:
	private final static Pattern DANGLING_OP_PATTERN = Pattern.compile("\\s+(?:[-+\\s]|\\&\\&|\\|\\|)+$");
	private final static String DANGLING_OP_REPL_STR = "";
	
	// Pattern to detect consecutive + and/or - operators
	static final Pattern CONSECUTIVE_OP_PATTERN = Pattern.compile( "(-)-+|(\\+)\\++" );
	static final  String CONSECUTIVE_OP_PATTERN_REPL_STR = "$1$2";
	
	// Pattern to detect consecutive && and/or || operators
	static final Pattern CONSECUTIVE_OP_PATTERN2 = Pattern.compile( "(\\&\\&)\\&+|(\\|\\|)\\|+" );
	
	// Pattern to detect mixed consecutive + and - operators:
	static final Pattern MIXED_OP_PATTERN = Pattern.compile( "[-+]*(?:-\\+|\\+-)[-+]*" );
	static final String MIXED_OP_PATTERN_REPL_STR = " ";
	
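	// Pattern to detect mixed consecutive && and || operators ("||&&" or "&&||").
	// The second alternative is written as a sequence so that a lone && or || is kept:
	static final Pattern MIXED_OP_PATTERN2 = Pattern.compile( "(\\|\\|\\&\\&)|(\\&\\&\\|\\|)" );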
	
	public static CharSequence stripIllegalOperators(CharSequence s) {
		// Apply each cleanup in turn; the order is the same as in the nested form.
		String q = DANGLING_OP_PATTERN.matcher(s).replaceAll(DANGLING_OP_REPL_STR);
		q = PADDED_OP_PATTERN.matcher(q).replaceAll(PADDED_OR_DANGLING_OP_PATTERN_REPL_STR);
		q = CONSECUTIVE_OP_PATTERN.matcher(q).replaceAll(CONSECUTIVE_OP_PATTERN_REPL_STR);
		q = CONSECUTIVE_OP_PATTERN2.matcher(q).replaceAll(CONSECUTIVE_OP_PATTERN_REPL_STR);
		q = MIXED_OP_PATTERN.matcher(q).replaceAll(MIXED_OP_PATTERN_REPL_STR);
		return MIXED_OP_PATTERN2.matcher(q).replaceAll(MIXED_OP_PATTERN_REPL_STR);
	}
{code}
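
To try the chain outside of Solr, here is a minimal standalone sketch. The class name StripOperatorsSketch and the shortened constant names are hypothetical and are not part of SolrPluginUtils. The dangling-operator pattern is written with an alternation so that parentheses are never part of the match, and the mixed && / || pattern matches only an adjacent "&&||" or "||&&" pair; both choices are assumptions about the intended behavior.

```java
import java.util.regex.Pattern;

// Hypothetical standalone harness; not the actual SolrPluginUtils class.
class StripOperatorsSketch {
    // Whitespace followed only by operators and whitespace up to end of input.
    static final Pattern DANGLING_OP = Pattern.compile("\\s+(?:[-+\\s]|&&|\\|\\|)+$");
    // A + or - followed by a Unicode separator; the replacement keeps the operator.
    static final Pattern PADDED_OP = Pattern.compile("(-|\\+)\\p{Z}");
    // Runs of the same single-character operator, e.g. "--" or "+++".
    static final Pattern CONSECUTIVE_OP = Pattern.compile("(-)-+|(\\+)\\++");
    // "&&" or "||" followed by extra copies of the same character.
    static final Pattern CONSECUTIVE_OP2 = Pattern.compile("(&&)&+|(\\|\\|)\\|+");
    // Any run of + and - that contains both kinds.
    static final Pattern MIXED_OP = Pattern.compile("[-+]*(?:-\\+|\\+-)[-+]*");
    // Assumption: only adjacent "||&&" or "&&||" counts as mixed.
    static final Pattern MIXED_OP2 = Pattern.compile("\\|\\|&&|&&\\|\\|");

    static String stripIllegalOperators(CharSequence s) {
        String q = DANGLING_OP.matcher(s).replaceAll("");
        q = PADDED_OP.matcher(q).replaceAll("$1");
        // Java appends nothing for a group that did not participate in the match.
        q = CONSECUTIVE_OP.matcher(q).replaceAll("$1$2");
        q = CONSECUTIVE_OP2.matcher(q).replaceAll("$1$2");
        q = MIXED_OP.matcher(q).replaceAll(" ");
        return MIXED_OP2.matcher(q).replaceAll(" ");
    }

    public static void main(String[] args) {
        String[] samples = {"foo &&", "foo || bar ||", "foo --bar", "foo && bar"};
        for (String q : samples) {
            System.out.println("[" + q + "] -> [" + stripIllegalOperators(q) + "]");
        }
    }
}
```

With these patterns, "foo &&" becomes "foo", "foo || bar ||" becomes "foo || bar", "foo --bar" becomes "foo -bar", and "foo && bar" is left unchanged.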
                
      was (Author: jej2003): the same comment, with the code wrapped in [CODE] and [/CODE] tags instead of a {code} block.
                  
> Improve extended dismax (edismax) parser
> ----------------------------------------
>
>                 Key: SOLR-2368
>                 URL: https://issues.apache.org/jira/browse/SOLR-2368
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Yonik Seeley
>
> Improve edismax and replace dismax once it has all of the needed features.

