lucene-java-user mailing list archives

From "Aditi Goyal" <aditigupt...@gmail.com>
Subject Re: java.lang.NullPointerExcpetion while indexing on linux
Date Thu, 21 Aug 2008 12:27:07 GMT
On Wed, Aug 20, 2008 at 6:12 PM, Michael McCandless <mail@mikemccandless.com> wrote:

>
> Aditi Goyal wrote:
>
>> Thanks Mike. I found the problem.
>> I was not converting the field values to UTF-8, so they were being
>> stored as None when added to the doc. As a result, doc.get('fieldA')
>> returned None instead of an empty string or any other string.
>>
>
> I don't really understand why failing to pre-convert to utf-8 would result
> in None being set -- is this a PyLucene (JCC) strangeness?
>
> It seems like if the incoming PyObject is a simple str, the C++ glue code
> generated by JCC should cast it to unicode before passing it to Java (and
> you shouldn't get null added on).
>
>> To work around this, I first converted the string to UTF-8, then called
>> field.setValue() and then doc.add(field). It seems to be working fine.
>>
>> However, I have one question. When I call field.setValue() and then
>> doc.add(), will it replace the value of the field in the doc, or add a new
>> field with the same name and the new value? I ask because I am reusing the
>> doc without reinitialising it anywhere, and you said that
>> doc.removeField() is an expensive operation.
>>
>
> It replaces the value of that Field instance (it does not add a new field).

I checked this. My index was ending up with multiple duplicate values for
the same field. So calling field.setValue() alone is sufficient. Calling
doc.add() again on the reused doc adds another copy of the same field, so
the document keeps growing.
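
To illustrate the difference, here is a minimal sketch in plain Python. The `Field` and `Document` classes below are hypothetical stand-ins written for this example, not the PyLucene API; they only assume that adding a field to a document appends an entry without checking whether that field is already present, which matches the duplication described above. The helper `index_reusing` is likewise made up for illustration.

```python
class Field:
    """Hypothetical stand-in for a Lucene Field: a name plus a mutable value."""

    def __init__(self, name, value=""):
        self.name = name
        self.value = value

    def set_value(self, value):
        # Mutates this instance in place; any document holding it sees the change.
        self.value = value


class Document:
    """Hypothetical stand-in for a Lucene Document: an ordered list of fields."""

    def __init__(self):
        self.fields = []

    def add(self, field):
        # Assumption: no de-duplication by name or identity, each call appends.
        self.fields.append(field)

    def get_values(self, name):
        return [f.value for f in self.fields if f.name == name]


def index_reusing(values, readd_each_time):
    """Reuse one doc and one field for every value, as in the thread."""
    doc = Document()
    field = Field("fieldA")
    doc.add(field)  # add the field to the reused doc exactly once
    sizes = []
    for value in values:
        field.set_value(value)
        if readd_each_time:
            doc.add(field)  # the extra call that causes the growth
        # In real code the doc would be handed to the index writer here.
        sizes.append(len(doc.fields))
    return doc, sizes


if __name__ == "__main__":
    _, ok_sizes = index_reusing(["a", "b", "c"], readd_each_time=False)
    _, bad_sizes = index_reusing(["a", "b", "c"], readd_each_time=True)
    print("setValue only:      ", ok_sizes)
    print("setValue + doc.add: ", bad_sizes)
```

With the field added once up front, the document stays at one field per pass; re-adding it on every pass makes the field count grow by one each time.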


> Mike
>
