lucenenet-user mailing list archives

From Franklin Simmons <fsimm...@sccmediaserver.com>
Subject RE: Alternative to looping through Hits
Date Fri, 02 Oct 2009 16:08:15 GMT
You could try using TopFieldDocCollector and TopDocs together with a custom FieldSelector
that loads only the DocumentId field. String.Join is fairly quick, I think. This might be
overkill though ;-)

...

Lucene.Net.Search.TopFieldDocCollector collector =
      new TopFieldDocCollector(reader, Sort.RELEVANCE, max_hits);

search.Search(query, null, collector);

Lucene.Net.Search.TopDocs top_docs = collector.TopDocs();
string [] values = new string[top_docs.scoreDocs.Length];
MyFieldSelector field_selector = new MyFieldSelector("DocumentId");

for(int i = 0; i < values.Length; i++) 
{
      Lucene.Net.Search.ScoreDoc score_document = top_docs.scoreDocs[i];
      // Load only the DocumentId field; "search" is the searcher used for the query above.
      Lucene.Net.Documents.Document document = search.Doc(score_document.doc, field_selector);
      values[i] = document.GetFieldable("DocumentId").StringValue();
}

string csv = String.Join(", ", values);


...
class MyFieldSelector : Lucene.Net.Documents.FieldSelector
{
    string field_name;

    public MyFieldSelector(string field_name)
    {
        this.field_name = field_name;
    }

    public Lucene.Net.Documents.FieldSelectorResult Accept(string field_name)
    {
        if (this.field_name == field_name) return Lucene.Net.Documents.FieldSelectorResult.LOAD;
        return Lucene.Net.Documents.FieldSelectorResult.NO_LOAD;
    }
}
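
The String.Join step can also be looked at in isolation. The following is only a minimal sketch, not part of Lucene.Net: the class IdListBuilder and its methods JoinIds and AppendIds are hypothetical names made up for illustration. It assumes the IDs are already available as strings, and shows String.Join next to a StringBuilder loop that gives the same result when the IDs are not in an array yet.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class IdListBuilder
{
    // Joins the IDs with ", " between them; no trailing separator is added.
    public static string JoinIds(string[] ids)
    {
        return String.Join(", ", ids);
    }

    // Builds the same string incrementally with a StringBuilder, which avoids
    // allocating a new string on every append the way "docIds += ..." does.
    public static string AppendIds(IEnumerable<string> ids)
    {
        StringBuilder builder = new StringBuilder();
        bool first = true;
        foreach (string id in ids)
        {
            if (!first) builder.Append(", ");
            builder.Append(id);
            first = false;
        }
        return builder.ToString();
    }
}
```

For example, IdListBuilder.JoinIds(new string[] { "12", "7", "93" }) returns "12, 7, 93", and AppendIds returns the same string for the same IDs.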

-----Original Message-----
From: Trevor Watson [mailto:twatson@datassimilate.com] 
Sent: Friday, October 02, 2009 10:40 AM
To: lucene-net-user@incubator.apache.org
Subject: Alternative to looping through Hits

I am currently attempting to create a comma-separated list of IDs from a 
given Hits collection.

However, when we end up processing 6,000 or more hits, it takes 25-30 
seconds per collection.  I've been trying to find a faster way to convert 
the search results into the comma-separated list.  Do any of you have any 
advice?  Thanks in advance.

Trevor Watson


My current code looks like

            Lucene.Net.Search.Searcher search = new Lucene.Net.Search.IndexSearcher(
                string.Format("c:\\sv_index\\" + jobId.ToString()));
            Lucene.Net.Search.Hits hits = search.Search(query);

            string docIds = "";
            totalDocuments = hits.Length();

           
            // Test #1
            Lucene.Net.Search.HitIterator hi =
                (Lucene.Net.Search.HitIterator)hits.Iterator();
            while (hi.MoveNext())
                docIds += ((Lucene.Net.Search.Hit)hi.Current).GetDocument()
                    .GetField("DocumentId").StringValue() + ", ";

            // Test #2
            for (int iCount = 0; iCount < totalDocuments; iCount++)
            {
                Lucene.Net.Documents.Document docHit = hits.Doc(iCount);

                docIds += docHit.GetField("DocumentId").StringValue() + ", ";
            }
