incubator-lucy-user mailing list archives

From Grant McLean <gr...@catalyst.net.nz>
Subject Re: [lucy-user] Index state during merges
Date Wed, 02 Nov 2011 19:08:42 GMT
On Wed, 2011-11-02 at 11:29 -0700, Marvin Humphrey wrote:
> Maybe we should consider scanning incoming fields for UTF-8 sanity after all.
> I don't like making everybody pay this penalty -- small though it is --
> because you'll only get bad UTF-8 if your indexing setup is broken somehow.
> On the other hand, I don't like that once a single bad UTF-8 sequence makes it
> through a commit, the index is irretrievably corrupt -- and you only discover
> that after the damage is done.

Perhaps the sanity checking could be controlled by an option that
defaults to 'on'.  Then people who *know* their setup is UTF-8 clean can
call something like $indexer->no_validate_utf8() to avoid the
performance penalty.
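
To make the idea concrete, here is a rough sketch of the opt-out behaviour, written in Python purely for illustration. The class and its add_doc() method are hypothetical stand-ins and not Lucy's actual API, and no_validate_utf8() is only the name floated above, not an existing method.

```python
class SketchIndexer:
    """Hypothetical stand-in for an indexer; not Lucy's real API."""

    def __init__(self):
        # Validation defaults to on, as proposed.
        self.validate_utf8 = True
        self.docs = []

    def no_validate_utf8(self):
        """Opt out of the check for setups known to be UTF-8 clean."""
        self.validate_utf8 = False
        return self

    def add_doc(self, fields):
        """Accept a dict of field name -> raw bytes."""
        if self.validate_utf8:
            for name, raw in fields.items():
                try:
                    # Strict decoding rejects malformed UTF-8 sequences.
                    raw.decode("utf-8")
                except UnicodeDecodeError as err:
                    raise ValueError(
                        f"field {name!r} is not valid UTF-8: {err}"
                    ) from err
        # Only reached if every field passed (or checking is disabled).
        self.docs.append(fields)
```

With the default settings a bad byte sequence is rejected before it can reach a commit, while SketchIndexer().no_validate_utf8() skips the scan entirely and accepts whatever it is given.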

Cheers
Grant

