lucenenet-user mailing list archives

From Leonardo Azize Martins <laz...@gmail.com>
Subject Highlighting
Date Thu, 13 May 2010 22:10:14 GMT
Hello,

I am trying to use the Highlighter contrib, but my code does not work well. How should I handle this?

My code is below.

1   QueryParsers.QueryParser parser = new QueryParsers.QueryParser(Util.Version.LUCENE_29, field, analyzer);
2   Search.Query query = parser.Parse(queryToSearch);
3   SimpleHTMLFormatter formatter = new SimpleHTMLFormatter();
4   Search.TopDocs hits = searcher.Search(query, 10);
5   for (int i = 0; i < hits.scoreDocs.Length; i++)
6   {
7       Search.ScoreDoc scoreDoc = hits.scoreDocs[i];
8       Documents.Document document = searcher.Doc(scoreDoc.doc);
9       string fullName = document.Get("id");
10      string key = IO.Path.GetExtension(fullName).ToLower();
11      if (fileExtensions.ContainsKey(key))
12      {
13          string text = fileExtensions[key].Invoke(new IO.FileInfo(fullName));
14          string best = string.Empty;
15          Lucene.Net.Analysis.TokenStream tokenStream = analyzer.TokenStream(field, new IO.StringReader(text));
16          {
17              QueryScorer scorer = new QueryScorer(query, field);
18              Highlighter highlighter = new Highlighter(formatter, scorer);
19              highlighter.SetTextFragmenter(new SimpleFragmenter());
20              best = highlighter.GetBestFragments(tokenStream, text, 3, "...");
21          }
22          tokenStream.Close();
23          result.AppendLine(string.Format("{0} - {1}", fullName, document.Get("size")));
24          result.AppendLine(best);
25          result.AppendLine();
26      }
27  }

In line 13 I use a delegate to read the original file's text content. Is this a good option? It is too slow.
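
Line 13 relies on a fileExtensions dictionary whose definition is not shown above. Purely as an illustration, here is a minimal C# sketch of what such an extension-to-delegate map might look like; the class name FileTextExtractors, the method ExtractText and the registered extensions are hypothetical. It also makes the cost of line 13 visible: every matching search hit triggers a full read of the source file from disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of an extension -> text-extractor map like the
// "fileExtensions" dictionary used in line 13 (its real definition is not shown).
public static class FileTextExtractors
{
    // Case-insensitive keys, so ".TXT" and ".txt" hit the same entry.
    public static readonly Dictionary<string, Func<FileInfo, string>> ByExtension =
        new Dictionary<string, Func<FileInfo, string>>(StringComparer.OrdinalIgnoreCase)
        {
            // Plain-text formats can simply be read as-is (assumed examples).
            { ".txt", file => File.ReadAllText(file.FullName) },
            { ".log", file => File.ReadAllText(file.FullName) },
        };

    // Returns the extracted text, or null when no extractor is registered
    // for the file's extension (or the path is null).
    public static string ExtractText(string fullName)
    {
        if (fullName == null)
        {
            return null;
        }

        string key = Path.GetExtension(fullName);
        Func<FileInfo, string> extract;
        if (ByExtension.TryGetValue(key, out extract))
        {
            // Invoking the delegate reads the whole file from disk each time,
            // i.e. once per search hit in the loop above.
            return extract(new FileInfo(fullName));
        }
        return null;
    }
}
```

Using StringComparer.OrdinalIgnoreCase for the dictionary removes the need for the ToLower() call in line 10, and TryGetValue replaces the separate ContainsKey check and indexer lookup in lines 11 and 13.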

In line 15 I use the analyzer to get a TokenStream object; I think it analyzes the text content the same way the IndexWriter does.
But my fields are defined like this:
Documents.Field id = new Documents.Field("id", item.FullName,
    Documents.Field.Store.YES, Documents.Field.Index.NOT_ANALYZED_NO_NORMS);
Documents.Field contents = new Documents.Field("contents", plainText,
    Documents.Field.Store.NO, Documents.Field.Index.ANALYZED,
    Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);
Documents.Field filename = new Documents.Field("filename", item.Name,
    Documents.Field.Store.NO, Documents.Field.Index.NOT_ANALYZED_NO_NORMS);
Documents.Field extension = new Documents.Field("extension", item.Extension,
    Documents.Field.Store.NO, Documents.Field.Index.NOT_ANALYZED_NO_NORMS);
Documents.NumericField size = new Documents.NumericField("size",
    Documents.Field.Store.YES, true).SetLongValue(item.Length);

How can I use code like this instead?

    Lucene.Net.Analysis.TokenStream tokenStream = document.GetField("contents").TokenStreamValue();

GetField returns null.

In line 19 I use SimpleFragmenter. Where is SimpleSpanFragmenter?

Regards,
Leo
