lucenenet-user mailing list archives

From Shad Storhaug <s...@shadstorhaug.com>
Subject RE: Lucene 4.8 - Reusing Document during indexing
Date Sun, 11 Jun 2017 08:00:53 GMT
Matt,

Since a field needs to keep track of both its value and its type, field values are set
using methods whose names include the type name:

luceneDoc.GetField( "text" ).SetStringValue( block.Text );
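For the loop in the original question, a minimal sketch of the reuse pattern follows. It assumes the Lucene.Net 4.8 NuGet packages (`Lucene.Net` and `Lucene.Net.Analysis.Common`) and an in-memory `RAMDirectory`. Rather than looking the field up by name on every iteration, it keeps a reference to the `TextField` and updates it with `SetStringValue` before each `AddDocument` call. The `ReuseSketch` class, the `IndexBlocks` helper and the `path`/`text` field names are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class ReuseSketch
{
    // Hypothetical helper: indexes one document per text block while reusing
    // a single Document and its Field instances. Returns the number of docs.
    public static int IndexBlocks(string path, IEnumerable<string> blocks)
    {
        var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);

        using (var writer = new IndexWriter(dir, config))
        {
            // Build the Document once; the file-specific field never changes.
            var doc = new Document();
            doc.Add(new StringField("path", path, Field.Store.YES));

            // Keep a reference to the one field that changes per block.
            var textField = new TextField("text", string.Empty, Field.Store.YES);
            doc.Add(textField);

            foreach (var text in blocks)
            {
                textField.SetStringValue(text); // typed setter; same Field object
                writer.AddDocument(doc);        // current values are indexed here
            }
            writer.Commit();
        }

        using (var reader = DirectoryReader.Open(dir))
        {
            return reader.NumDocs;
        }
    }
}
```

Because `AddDocument` consumes the field values during the call, changing the field afterwards does not affect documents that were already added.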

Setting the field value through a common SetValue method was carefully considered, but it
would mean you would have to be extremely explicit about passing the correct type. For
example:

float value1 = 5.00000001f;
string value2 = value1.ToString();

// Hypothetical overloaded SetValue: this compiles, but stores a string
luceneDoc.GetField( "number" ).SetValue(value2);

object value3 = luceneDoc.GetField( "number" ).GetNumericValue();


The above code would produce an error, because the value was set as a string when a float
was expected to be stored. That kind of bug can be hard to track down, whereas forcing the
developer to think about which type they are setting (SetSingleValue) makes the intent explicit
and less likely to go wrong, since a mismatch produces a compile-time error.
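To make the pitfall concrete without depending on Lucene.Net, here is a self-contained sketch using a hypothetical `SketchField` class that, like a Lucene field, keeps its value in a single `object` slot. The untyped `SetValue(object)` accepts a string where a float was intended, and the mistake only surfaces when the value is read back; the typed `SetSingleValue(float)` rejects a string at compile time. None of these members are the real Lucene.Net API.

```csharp
using System.Globalization;

// Hypothetical stand-in for a field: one object slot holds the value.
public sealed class SketchField
{
    private object fieldsData;

    public SketchField(float initial) { fieldsData = initial; }

    // Untyped setter: any argument compiles, whatever its runtime type.
    public void SetValue(object value) { fieldsData = value; }

    // Typed setter: passing a string here is a compile-time error.
    public void SetSingleValue(float value) { fieldsData = value; }

    // Returns the stored value only if it is a float; otherwise null.
    public object GetNumericValue() => fieldsData is float ? fieldsData : null;
}

public static class PitfallDemo
{
    public static object Untyped()
    {
        var field = new SketchField(1f);
        float value1 = 5.00000001f;
        string value2 = value1.ToString(CultureInfo.InvariantCulture);
        field.SetValue(value2);          // compiles: a string is an object
        return field.GetNumericValue();  // null: the float was silently lost
    }

    public static object Typed()
    {
        var field = new SketchField(1f);
        field.SetSingleValue(5.00000001f);
        // field.SetSingleValue("5");    // would not compile
        return field.GetNumericValue();
    }
}
```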


That said, an overloaded SetValue is more .NET-like, and in this particular case no two setters
take the same parameter type, so the overloads would not collide. We could add an overloaded
SetValue method and convert the existing methods into extension methods in the Support namespace.
I would be interested in hearing feedback on which is preferable: explicitly specifying the type
in the method name, or explicitly casting to the correct type (as was the case in 3.0.3). In
.NET, overloaded methods don't normally all store the value in the same object variable under
the covers, so explicit methods seem like a better choice to me.


On a side note, it looks like we should deprecate all of the FieldExtensions methods except
IsStored to make sure people are aware that they will not be available after Lucene.Net 4.8,
since the corresponding enumerations have been deprecated.

Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: Matt Diehl [mailto:matt@gooddiehl.net.INVALID] 
Sent: Sunday, June 11, 2017 9:57 AM
To: user@lucenenet.apache.org
Subject: Lucene 4.8 - Reusing Document during indexing

Hi,

I don't understand how to reuse a Document for indexing the way we could in 3.0.3.

For instance, in 3.0.3 I could create a Document, set several common Field values once, and
then iterate, changing a single field in the Document and adding it to the index each time:

Document luceneDoc = createDocumentAndSetFileSpecificFields( file );

foreach ( var block in blocks )
{
        luceneDoc.GetField( "text" ).SetValue( block.Text );
        indexWriter.AddDocument( luceneDoc );
}

In 4.8, SetValue no longer exists, and it seems I have to recreate my 8-field Document every
time I write to the index:

foreach ( var block in blocks )
{
    Document luceneDoc = createDocumentAndSetFileSpecificFields( file, block.Text );
    indexWriter.AddDocument( luceneDoc );
}

Can someone help me see what I am missing?

Thanks,
Matt