lucenenet-user mailing list archives

From Tim Haughton <timhaugh...@gmail.com>
Subject Re: [Lucene.Net] Indexing Oddity
Date Mon, 26 Sep 2011 12:40:39 GMT
        internal static void AddToContentIndex(EDMDocument document, string fullText)
        {
            lock (contentMutex)
            {
                IndexWriter writer = null;
                try
                {
                    EnsureContentIndexIsUnlocked();

                    // Add content
                    var contentIndexFolder = new FileInfo(App.ContentIndexFolder);
                    writer = new IndexWriter(contentIndexFolder, new StandardAnalyzer(), false);
                    writer.SetUseCompoundFile(true);

                    var contentDoc = new Document();
                    contentDoc.Add(new Field("content", fullText, Field.Store.NO, Field.Index.TOKENIZED));
                    contentDoc.Add(new Field("documentID", document.DocumentID, Field.Store.YES,
                                             Field.Index.UN_TOKENIZED));

                    writer.AddDocument(contentDoc);
                    writer.Optimize();
                }
                catch (Exception exception)
                {
                    log.Error("Problem adding document to content index.", exception);
                }
                finally
                {
                    if (writer != null)
                    {
                        writer.Close();
                    }
                }
            }
        }

Cheers,

Tim


On 26 September 2011 13:37, Itamar Syn-Hershko <itamar@code972.com> wrote:

> No, you are probably using KeywordAnalyzer
>
> What is your indexing code?
>
> On Mon, Sep 26, 2011 at 3:28 PM, Tim Haughton <timhaughton@gmail.com> wrote:
>
> > Hi, I'm trying to index a text file containing the following text:
> >
> > DNE,APLU,GB11/0290
> > DNE,CMDU,11-1431
> > DNE,EGLV,NO CONTRACT
> > DNE,HJSC,ANE112376
> > DNE,HLCU,NO CONTRACT
> > DNE,MAEU,547712
> > DNE,MOLU,NO CONTRACT
> > DNE,OOLU,AE115029
> >
> > It appears that each "line" is being indexed as one complete string, rather
> > than at least 3 terms. So if I search for "547712" I get no results. But if
> > I search for "DNE,MAEU,547712" I find the document. If I add a space after
> > each comma it indexes them as individual terms.
> >
> > Is this expected behaviour using the StandardAnalyzer?
> >
> > Cheers,
> >
> > Tim
> >
>
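
The original post notes that adding a space after each comma makes the values index as individual terms. Going only by that observation, one possible workaround is to normalise the text before it is handed to the analyzer. The sketch below is an assumption, not part of the posted code: CommaPreprocessor and SeparateCommaDelimitedValues are hypothetical names, and it uses only the .NET regular expression library, no Lucene.Net calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the posted code.
static class CommaPreprocessor
{
    // Inserts a space after every comma that is immediately followed by a
    // non-whitespace character, so "DNE,MAEU,547712" becomes
    // "DNE, MAEU, 547712". Commas already followed by whitespace, or at the
    // end of the text, are left alone.
    public static string SeparateCommaDelimitedValues(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        return Regex.Replace(text, @",(?=\S)", ", ");
    }
}
```

In AddToContentIndex this could be applied to fullText before the "content" field is created. Query strings that contain commas would presumably need the same treatment, since they are analyzed too. Whether changing the text is acceptable, compared with switching to a different analyzer, depends on the data.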
