lucene-lucene-net-dev mailing list archives

From "Digy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENENET-183) SegmentTermVector IndexOf method always fails
Date Tue, 04 May 2010 17:08:56 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863890#action_12863890 ]

Digy commented on LUCENENET-183:
--------------------------------

Below is the mail from *Bernie Solomon*
{code}
Having hit the same problem I am puzzled why the fix for LUCENENET-183 seems to have got reverted.
Lucene.NET and Lucene do not seem to be consistent as they currently are, as the following short test programs do different things.
Java correctly has 1 for index and C# incorrectly prints -1.
The proposed fix does address this. Am I missing something?

Thanks

Bernie

--- Java ---
import java.lang.*;
import java.io.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;

class Test
{
    public static void main(String [] args)
    {
        try
        {
            RAMDirectory directory = new RAMDirectory();
            Analyzer analyzer = new WhitespaceAnalyzer();
            IndexWriter writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.LIMITED);
            Document document = new Document();
            document.add(new Field("contents", new StringReader("a_ a0"), Field.TermVector.WITH_OFFSETS));
            writer.addDocument(document);
            IndexReader reader = writer.getReader();
            TermPositionVector tpv = (TermPositionVector)reader.getTermFreqVector(0, "contents");
            System.out.println("tpv: " + tpv);
            int index = tpv.indexOf("a_");
            System.out.println("index: " + index);
        }
        catch (Exception ex)
        {
            // Report failures instead of silently swallowing them.
            ex.printStackTrace();
        }
    }
}

--- C# ---
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public class Test
{
    public static void Main(string [] args)
    {
        RAMDirectory directory = new RAMDirectory();
        Analyzer analyzer = new WhitespaceAnalyzer();
        IndexWriter writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.LIMITED);
        Document document = new Document();
        document.Add(new Field("contents", new StreamReader(new MemoryStream(Encoding.ASCII.GetBytes("a_ a0"))), Field.TermVector.WITH_OFFSETS));
        writer.AddDocument(document);
        IndexReader reader = writer.GetReader();
        TermPositionVector tpv = reader.GetTermFreqVector(0, "contents") as TermPositionVector;
        Console.WriteLine("tpv: " + tpv);
        int index = tpv.IndexOf("a_");
        Console.WriteLine("index: " + index);
    }
}
{code}

Thanks Bernie,

This patch was lost while porting 2.9.0.
I recommitted the patch (for 2.9.1 & 2.9.2 in trunk)
and also added a test case (your C# code) to avoid such losses.

DIGY




> SegmentTermVector IndexOf method always fails
> ---------------------------------------------
>
>                 Key: LUCENENET-183
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-183
>             Project: Lucene.Net
>          Issue Type: Bug
>            Reporter: Franklin Simmons
>         Attachments: SegmentTermVector-2.patch, SegmentTermVector.patch
>
>
> At index time term vectors are sorted using String.CompareOrdinal. However, method IndexOf of class SegmentTermVector invokes System.Array.BinarySearch, which is using String.Compare.
> {noformat}public virtual int IndexOf(System.String termText)
> {
>     if (terms == null)
>         return -1;
>     int res = System.Array.BinarySearch(terms, termText);
>     return res >= 0 ? res : -1;
> }
> {noformat}
> The effect is that the IndexOf method always returns a negative number (no match) because the sort order is incompatible with the default comparer.
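
The mismatch described above can be reproduced outside Lucene with a small sketch. It is written in Java only so that it is self-contained. `PUNCTUATION_FIRST` is a hypothetical comparator, not part of Lucene or the JDK. It stands in for a culture-sensitive comparer by assuming that punctuation sorts before digits and letters, and the class and method names are made up for illustration. The array is sorted in ordinal order, as the index does, and then searched with each comparator.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortOrderMismatch {
    // Hypothetical stand-in for a culture-sensitive comparer (an assumption,
    // not the real .NET ordering): non-alphanumeric characters sort before
    // digits and letters; otherwise characters compare by code unit.
    public static final Comparator<String> PUNCTUATION_FIRST = (a, b) -> {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            int rankX = Character.isLetterOrDigit(x) ? 1 : 0;
            int rankY = Character.isLetterOrDigit(y) ? 1 : 0;
            if (rankX != rankY) return rankX - rankY;
            if (x != y) return x - y;
        }
        return a.length() - b.length();
    };

    // Same shape as the IndexOf method quoted above: binary search, -1 on a miss.
    public static int indexOf(String[] terms, String term, Comparator<String> cmp) {
        if (terms == null) return -1;
        int res = Arrays.binarySearch(terms, term, cmp);
        return res >= 0 ? res : -1;
    }

    public static void main(String[] args) {
        String[] terms = {"a_", "a0"};
        // Natural String order is ordinal: '0' (0x30) sorts before '_' (0x5F).
        Arrays.sort(terms);
        System.out.println(Arrays.toString(terms));
        // Searching with a comparator that disagrees with the sort order misses.
        System.out.println(indexOf(terms, "a_", PUNCTUATION_FIRST));
        // Searching with the comparator used for sorting finds the term.
        System.out.println(indexOf(terms, "a_", Comparator.naturalOrder()));
    }
}
```

In .NET the analogous change would be to pass an ordinal comparer, for example `StringComparer.Ordinal`, to `System.Array.BinarySearch`. Whether the attached patches do exactly this is not shown in this message.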

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

