lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Drew Farris <alfar...@prodigy.net>
Subject Binary stored fields (was Re: suggestions for a student project)
Date Fri, 28 May 2004 01:26:04 GMT
On Thu, 2004-05-27 at 17:34, Dmitry Serebrennikov wrote:
> Drew Farris wrote:
> >
> >I send out a patch for this soon after the original discussion...
> >
>
> Drew, I'm sorry I missed this. Definetely didn't mean to ignore you 
> work! I just assumed that this is still undone. Oops....

No worries, it might have helped for me to include [PATCH] in the
subject line as the Jakarta guidelines suggest :) Thanks for the good
words and taking the time to look at it now -- I will be glad if I can
contribute something of value. Comments follow inline.

> I just took a look at your patch. Looks great! Simple, and does the trick.
> One comment:
> - I understand why getBinaryValues(String fieldName) has to return 
> byte[][]. This is to deal with multiple fields with the same name, 
> right? 

Yep.

> I think this is a relatively rare case, so perhaps it would be 
> good to have a method getBinaryValue(String fieldName) that just returns 
> the first byte[] for a given name. I think you can avoid allocating an 
> array and most applications would find it more convinient.

Sounds great, I'll implement byte[] getBinaryValue(String fieldName)
which returns the first binary field of the given name encountered in
the results from getFields(fieldName);

> As far as testing, I think unit tests are enough. It would be good to 
> have a few tests though. One test to have as with a large amount of data.

Ok. I've updated DocHelper, TestFieldInfos, TestSegmentReader in
addition to TestDocument which I had already implemented in the last
patch. I will implement a few more as well including one with a ton of
data.

> What happens if there are two fields with the same name but one binary 
> and one not? Will there be an error? If so, when will it be given?

The code in the patch doesn't address this very well at all, stuffing
nulls into arrays which requires the api user to do some extra checking
which . If binary and non binary fields with the same name can co-exist
in a document, I could either:

1) Do away with the call to getFields(fieldName) in getValues and
getBinaryValue/s entirely and walk the fields member in these methods
checking for name/binary-ness.

2) Change Field[] getFields(String fieldName) so it only returns
non-binary fields and implement Field[] getBinaryFields(String
fieldName) that only returns binary fields

Does anyone have a preference? I'm on the fence because I'm not sure if
it's ok to change getFields to not return all of the fields of a given
name.

> Is it possible to set confliciting flags on a field if one does not use 
> the new Field.Binary method? 

I don't think so. The only way to set the isBinary flag in Field is by 
using Field.Binary, or new Field(String name, byte[] value); Each of the
flag member variables are private and there are no public set methods
for any of them.

> I'm +1 for including this (provided it works and all :), I have not 
> actually tested it myself).

Excellent! I'll work on some more extensive unit tests to really
exercise this addition and hopefully that plus your suggestions will
whip it into shape.

> Again, Drew, great work! Thanks for your contribution to Lucene.
> Sorry to have missed your original message.

No problem! Thanks and for taking the time to review it and offering the
great suggestions.

Drew


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message