lucene-dev mailing list archives

From "Ajay Lakhani" <lakhani.a...@googlemail.com>
Subject Re: Bug In IndexWriter.addDocument?
Date Tue, 08 Jul 2008 07:34:40 GMT
Dear Digy,
You cannot store the Field value when using a TokenStream, but you can store
the term vector.
To do this, create the Field instance like this:

Field Field1 = new Field("Field1", DummyTokenStream1, TermVector.YES);

Below is the code that should work.

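import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.TermVector;
import org.apache.lucene.index.IndexWriter;

// DummyTokenStream is the TokenStream subclass from your sample code; it is
// assumed to be available here as a nested or top-level class.
public class Main2Class{
  // Reuse the same Document and Field instances for every added document.
  Document Doc = new Document();

  // A Field built from a TokenStream is indexed but not stored.
  DummyTokenStream DummyTokenStream1 = new DummyTokenStream();
  Field Field1 = new Field("Field1", DummyTokenStream1, TermVector.YES);

  DummyTokenStream DummyTokenStream2 = new DummyTokenStream();
  Field Field2 = new Field("Field2", DummyTokenStream2, TermVector.YES);

  public void Index() throws Exception {
    Doc.add(Field1);
    Doc.add(Field2);

    // true = create a new index, overwriting any existing one
    IndexWriter wr = new IndexWriter("testindex", new WhitespaceAnalyzer(), true);
    try {
      for (int i = 0; i < 100; i++){
        PrepDoc();
        wr.addDocument(Doc);
      }
    } finally {
      wr.close();
    }
  }

  // Reset each token stream and hand it back to its Field before re-adding.
  void PrepDoc(){
    DummyTokenStream1.SetText("test1");
    Field1.setValue(DummyTokenStream1);
    DummyTokenStream2.SetText("test2");
    Field2.setValue(DummyTokenStream2);
  }

  public static void main(String[] args) throws Exception {
    Main2Class m = new Main2Class();
    m.Index();
  }
}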
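
For what it is worth, here is a rough sketch of why the stored value comes out
null in your original code. It does not use the real Lucene classes:
SketchField, SketchTokenStream and writeStoredValue are hypothetical stand-ins
that only model the behaviour described in this thread, namely that a field's
string value is null once its data is a TokenStream, and that the writer uses
that string without checking it. That would match your trace, where
IndexOutput.writeString is reached from FieldsWriter.writeField.

```java
// Simplified model only. These are NOT the Lucene classes; every name here is
// a hypothetical stand-in used to illustrate the failure described above.
class StoredFieldSketch {

  /** Stand-in for a field whose data may be a String or a token stream. */
  static class SketchField {
    private Object fieldsData;
    private final boolean stored;

    SketchField(Object fieldsData, boolean stored) {
      this.fieldsData = fieldsData;
      this.stored = stored;
    }

    void setValue(Object newData) {
      this.fieldsData = newData;
    }

    boolean isStored() {
      return stored;
    }

    // Only String data has a string value; any other data yields null.
    String stringValue() {
      return fieldsData instanceof String ? (String) fieldsData : null;
    }
  }

  /** Stand-in for a token stream; its contents do not matter here. */
  static class SketchTokenStream { }

  // Models a writer that stores the string value without a null check.
  // Calling length() on a null value raises the NullPointerException.
  static int writeStoredValue(SketchField field) {
    if (!field.isStored()) {
      return 0; // nothing is stored, so the string value is never read
    }
    String value = field.stringValue();
    return value.length();
  }

  public static void main(String[] args) {
    // Like the original sample: a stored field, later given a token stream.
    SketchField stored = new SketchField("", true);
    stored.setValue(new SketchTokenStream());
    try {
      writeStoredValue(stored);
      System.out.println("stored field written");
    } catch (NullPointerException e) {
      System.out.println("stored field holding a token stream: NullPointerException");
    }

    // Like the suggested fix: the field is not stored, so nothing fails.
    SketchField notStored = new SketchField(new SketchTokenStream(), false);
    System.out.println("unstored field wrote " + writeStoredValue(notStored) + " chars");
  }
}
```

In this model the failure disappears as soon as the field is not stored, which
is the same effect as building the Field from the TokenStream in the code above.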

Cheers
Ajay

2008/7/8 Ajay Lakhani <lakhani.ajay@googlemail.com>:

> Dear Digy,
>
> To add to my earlier reply: I think this may not be a glitch after all.
>
> A TokenStream is usually not stored.
> If you change your field attribute to
> org.apache.lucene.document.Field.Store.NO then there will be no issue.
>
> Developers, any thoughts on this?
>
> Cheers
> Ajay
>
> 2008/7/8 Ajay Lakhani <lakhani.ajay@googlemail.com>:
>
>> Dear Digy,
>> As of Lucene 2.3, there are new setValue(...) methods that let you change
>> the value of a Field. However, there seems to be an issue in
>> org.apache.lucene.index.FieldsWriter.writeField(...), which stores the
>> field's string value; that value is null when the Field holds a TokenStream.
>>
>> FieldsWriter.writeField(...) needs to check whether the Field data is a
>> String, a Reader or a TokenStream and then retrieve the respective value.
>> I shall patch this soon.
>>
>> Is there a particular reason you are using a TokenStream? I suggest you
>> set the text value directly on the Field: Field1.setValue("xxx");
>>
>> Moreover, it's best to create a single Document instance, then add
>> multiple Field instances to it, but hold onto these Field instances and
>> re-use them by changing their values for each added document. After the
>> document is added, you then directly change the Field values
>> (idField.setValue(...), etc), and then re-add your Document instance. You
>> cannot re-use a single Field instance within a Document, and, you should not
>> change a Field's value until the Document containing that Field has been
>> added to the index.
>>
>> 2008/7/8 Digy <digydigy@gmail.com>:
>>
>>> Hi all,
>>>
>>> I am a Lucene.Net user. Since I need fast indexing in my current
>>> project, I am trying to use Lucene 2.3.2, which I converted to .Net with
>>> IKVM (since Lucene.Net is currently at v2.1), and I use the same instances
>>> of document and fields to gain some speed improvements.
>>>
>>> I use TokenStreams to set the value of fields.
>>>
>>> My problem is that I get a NullPointerException in "addDocument".
>>>
>>> Exception in thread "main" java.lang.NullPointerException
>>>         at org.apache.lucene.store.IndexOutput.writeString(IndexOutput.java:99)
>>>         at org.apache.lucene.index.FieldsWriter.writeField(FieldsWriter.java:127)
>>>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1418)
>>>         at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:1121)
>>>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2442)
>>>         at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2424)
>>>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1464)
>>>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1442)
>>>         at MainClass.Test(MainClass.java:39)
>>>         at MainClass.main(MainClass.java:10)
>>>
>>> To show the same bug in Java I prepared a sample application (oh, that
>>> was hard, since this is my second app in Java; the first one was a
>>> "Hello World" app).
>>>
>>> Is something wrong with my application or is it a bug in Lucene?
>>>
>>> Thanks,
>>> DIGY
>>>
>>> SampleCode:
>>>
>>>     public class MainClass
>>>     {
>>>         DummyTokenStream DummyTokenStream1 = new DummyTokenStream();
>>>         DummyTokenStream DummyTokenStream2 = new DummyTokenStream();
>>>
>>>         //use the same document&field instances for Indexing
>>>         org.apache.lucene.document.Document Doc =
>>>             new org.apache.lucene.document.Document();
>>>
>>>         org.apache.lucene.document.Field Field1 =
>>>             new org.apache.lucene.document.Field("Field1", "",
>>>                 org.apache.lucene.document.Field.Store.YES,
>>>                 org.apache.lucene.document.Field.Index.TOKENIZED);
>>>         org.apache.lucene.document.Field Field2 =
>>>             new org.apache.lucene.document.Field("Field2", "",
>>>                 org.apache.lucene.document.Field.Store.YES,
>>>                 org.apache.lucene.document.Field.Index.TOKENIZED);
>>>
>>>         public MainClass()
>>>         {
>>>             Doc.add(Field1);
>>>             Doc.add(Field2);
>>>         }
>>>
>>>         public void Index() throws
>>>             org.apache.lucene.index.CorruptIndexException,
>>>             org.apache.lucene.store.LockObtainFailedException,
>>>             java.io.IOException
>>>         {
>>>             System.out.println("Index Started");
>>>             org.apache.lucene.index.IndexWriter wr =
>>>                 new org.apache.lucene.index.IndexWriter("testindex",
>>>                     new org.apache.lucene.analysis.WhitespaceAnalyzer(), true);
>>>
>>>             for (int i = 0; i < 100; i++)
>>>             {
>>>                 PrepDoc();
>>>                 wr.addDocument(Doc);
>>>             }
>>>             wr.close();
>>>             System.out.println("Index Completed");
>>>         }
>>>
>>>         void PrepDoc()
>>>         {
>>>             DummyTokenStream1.SetText("test1"); //Set a new Text to Token Stream
>>>             Field1.setValue(DummyTokenStream1); //Set TokenStream to Field Value
>>>
>>>             DummyTokenStream2.SetText("test2"); //Set a new Text to Token Stream
>>>             Field2.setValue(DummyTokenStream2); //Set TokenStream to Field Value
>>>         }
>>>
>>>         public static void main(String[] args) throws
>>>             org.apache.lucene.index.CorruptIndexException,
>>>             org.apache.lucene.store.LockObtainFailedException,
>>>             java.io.IOException
>>>         {
>>>             MainClass m = new MainClass();
>>>             m.Index();
>>>         }
>>>
>>>         public class DummyTokenStream extends
>>>             org.apache.lucene.analysis.TokenStream
>>>         {
>>>             String Text = "";
>>>             boolean EndOfStream = false;
>>>             org.apache.lucene.analysis.Token Token =
>>>                 new org.apache.lucene.analysis.Token();
>>>
>>>             //return "Text" as the first token and null as the second
>>>             public org.apache.lucene.analysis.Token next()
>>>             {
>>>                 if (EndOfStream == false)
>>>                 {
>>>                     EndOfStream = true;
>>>
>>>                     Token.setTermText(Text);
>>>                     Token.setStartOffset(0);
>>>                     Token.setEndOffset(Text.length() - 1);
>>>                     Token.setTermLength(Text.length());
>>>                     return Token;
>>>                 }
>>>                 return null;
>>>             }
>>>
>>>             public void SetText(String Text)
>>>             {
>>>                 EndOfStream = false;
>>>                 this.Text = Text;
>>>             }
>>>         }
>>>     }
>>>
>>
>>
>
