lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-2800) RemoveDuplicatesTokenFilterFactory can not remove the duplicated term
Date Fri, 07 Sep 2012 22:24:07 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2800:
---------------------------

    Fix Version/s:     (was: 4.0)
         Assignee: Robert Muir

Removing fixVersion=4.0 since there is no evidence that anyone is currently working on this
issue.

Also assigning to [~rcmuir]: if I'm understanding his comments correctly, he thinks
there is an easy win here and we just need a test case.
                
> RemoveDuplicatesTokenFilterFactory can not remove the duplicated term
> ---------------------------------------------------------------------
>
>                 Key: SOLR-2800
>                 URL: https://issues.apache.org/jira/browse/SOLR-2800
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.4
>         Environment: Windows
>            Reporter: Han Hui Wen 
>            Assignee: Robert Muir
>              Labels: RemoveDuplicatesTokenFilterFactory, Solr
>
> RemoveDuplicatesTokenFilterFactory does not remove duplicated terms.
> The current code in http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_4/solr/core/src/java/org/apache/solr/analysis/RemoveDuplicatesTokenFilter.java?view=markup is:
> @Override
> public boolean incrementToken() throws IOException {
> 	while (input.incrementToken()) {
> 		final char term[] = termAttribute.buffer();
> 		final int length = termAttribute.length();
> 		final int posIncrement = posIncAttribute.getPositionIncrement();
>
> 		if (posIncrement > 0) {
> 			previous.clear();
> 		}
>
> 		boolean duplicate = (posIncrement == 0 && previous.contains(term, 0, length));
>
> 		// clone the term, and add to the set of seen terms.
> 		char saved[] = new char[length];
> 		System.arraycopy(term, 0, saved, 0, length);
> 		previous.add(saved);
>
> 		if (!duplicate) {
> 			return true;
> 		}
> 	}
> 	return false;
> }
> It should be like the following:
> @Override
> public boolean incrementToken() throws IOException {
> 	while (input.incrementToken()) {
> 		final char term[] = termAttribute.buffer();
> 		final int length = termAttribute.length();
> 		final int posIncrement = posIncAttribute.getPositionIncrement();
> 		if (posIncrement > 0) {
> 			previous.clear();
> 		}
> 		boolean duplicate = (posIncrement == 0 && previous.contains(term, 0, length));
>
> 		if (duplicate) {
> 			return false;
> 		} else {
> 			// clone the term, and add to the set of seen terms.
> 			char saved[] = new char[length];
> 			System.arraycopy(term, 0, saved, 0, length);
> 			previous.add(saved);
> 		}
> 	}
> 	return true;
> }
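
Since the comment above says a test case is what is needed, here is a minimal sketch of one. It does not use the Lucene classes: it models the loop quoted from RemoveDuplicatesTokenFilter over a plain list, and the names RemoveDuplicatesSketch, Token, and removeDuplicates are hypothetical, introduced only for illustration. The term buffer and position increment attributes are replaced by two fields on Token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, Lucene-free model of the loop quoted in the issue.
final class RemoveDuplicatesSketch {

    // Stand-in for a token: its text plus its position increment.
    static final class Token {
        final String term;
        final int posIncrement;

        Token(String term, int posIncrement) {
            this.term = term;
            this.posIncrement = posIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + posIncrement + ")";
        }
    }

    // Mirrors the quoted logic: a token is dropped only if it has
    // posIncrement == 0 and its term was already seen at the current position.
    static List<Token> removeDuplicates(List<Token> input) {
        Set<String> previous = new HashSet<>();
        List<Token> output = new ArrayList<>();
        for (Token token : input) {
            if (token.posIncrement > 0) {
                previous.clear(); // new position: forget terms seen so far
            }
            boolean duplicate = token.posIncrement == 0 && previous.contains(token.term);
            previous.add(token.term);
            if (!duplicate) {
                output.add(token); // corresponds to "return true" in the filter
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Token> input = Arrays.asList(
                new Token("quick", 1),
                new Token("quick", 0), // same position, same term: dropped
                new Token("fast", 0),  // same position, different term: kept
                new Token("fox", 1),
                new Token("fox", 1));  // same term at a new position: kept
        System.out.println(removeDuplicates(input));
    }
}
```

In this model the existing loop drops a repeated term only when it shares a position with an earlier occurrence (position increment 0); a repeated term at a later position passes through. Separately, under the TokenStream contract a false return from incrementToken() signals end of stream, so a variant that returns false when it sees a duplicate would end the stream at that point rather than skip the token.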

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org