lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Custom FieldTypes
Date Wed, 22 Mar 2017 20:38:33 GMT
You could provide the URP chain name (or individual URPs) when you
index a particular document type, but that requires you to send the
document types that need a signature together, in their own requests.
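
A minimal sketch of that routing on the client side, assuming the chain
is selected with the update.chain request parameter. The collection URL,
the chain name "signature" and the three document type names are all
hypothetical placeholders, and nothing here talks to Solr; it only works
out which documents would go to which update URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ChainRouting {
    // Hypothetical values: substitute your own collection, chain and types.
    static final String BASE = "http://localhost:8983/solr/mycollection/update";
    static final String SIGNATURE_CHAIN = "signature";
    static final Set<String> HASHED_TYPES = Set.of("email", "chat", "note");

    // Returns the update URL for a document type: types that need a
    // signature get the update.chain parameter, all others use the default.
    static String updateUrl(String docType) {
        if (HASHED_TYPES.contains(docType)) {
            return BASE + "?update.chain="
                    + URLEncoder.encode(SIGNATURE_CHAIN, StandardCharsets.UTF_8);
        }
        return BASE;
    }

    // Groups document ids by target URL so that each batch can be sent
    // in its own update request.
    static Map<String, List<String>> groupByUrl(Map<String, String> docIdToType) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docIdToType.entrySet()) {
            batches.computeIfAbsent(updateUrl(e.getValue()), k -> new ArrayList<>())
                   .add(e.getKey());
        }
        return batches;
    }
}
```

Each key of the returned map is a URL to post to and each value is the
list of document ids for that request.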

Or you could have a custom URP that skips other ones (they are
chained), though that's messier.

And I think you actually want overwriteDupes set to "false", otherwise
the URP will delete the previous matching document.
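
For reference, the kind of value being discussed (one SHA-256 over
several text fields, stored as a hex string) can be sketched outside
Solr with the JDK alone. This is a standalone illustration, not the
algorithm used by any of Solr's signature classes; the class and method
names are made up, and the length prefix on each value is a design
choice of this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class FieldHash {
    // Hashes the given field values, in order, into one lowercase hex
    // SHA-256 string of 64 characters.
    static String sha256Hex(List<String> fieldValues) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String value : fieldValues) {
                byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                // Prefix each value with its byte length so that
                // ("ab", "c") and ("a", "bc") produce different hashes.
                md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md.update(bytes);
            }
            byte[] digest = md.digest();
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Validation then amounts to recomputing the hash from the source data
and comparing it with the stored string.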

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 22 March 2017 at 15:46, Ronald Wood <rwood@smarsh.com> wrote:
> Thanks. I had seen that page but had passed it over since I don’t want to do de-duping
> (text fields with the exact same text are possible and not cause for de-dupe).
>
> If I want just to store the signature, it looks like I define the signatureField in the
> configuration and set overwriteDupes to true (since I don’t actually regard them as dupes).
>
> I guess the one downside to this is that the processor will run regardless of the document
> type (we have 6 types and only 3 need hashes on text). Or maybe empty values for fields stop
> the processor? No signature is needed when the text fields are not provided.
>
> -R
>
> On 3/22/17, 3:20 PM, "Alexandre Rafalovitch" <arafalov@gmail.com> wrote:
>
>     You'd use CloneField URP
>     http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
>
>     Then you do your custom algorithm. Or - as I just remembered - use one
>     of the hash ones described in dedupe section:
>     https://cwiki.apache.org/confluence/display/solr/De-Duplication (which
>     don't seem to require CloneField anyway).
>
>     Regards,
>        Alex.
>     ----
>     http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
>     On 22 March 2017 at 14:55, Ronald Wood <rwood@smarsh.com> wrote:
>     > I suppose it could be, but the flexibility of using copy directives is appealing
>     > for handling multiple fields as defined in the schema.
>     >
>     > Since I have rarely looked at the UpdateRequestProcessor, I guess I don’t
>     > know if it could take multiple fields to hash, and if so how that would be expressed.
>     >
>     > -R
>     >
>     > On 3/22/17, 2:21 PM, "Alexandre Rafalovitch" <arafalov@gmail.com> wrote:
>     >
>     >     Can this be done at the UpdateRequestProcessor stage?
>     >
>     >     Regards,
>     >         Alex
>     >
>     >
>     >     On 22 Mar 2017 1:48 PM, "Ronald Wood" <rwood@smarsh.com> wrote:
>     >
>     >     I have been mulling over the usefulness of a new Hash field type for being
>     >     able to validate data that is indexed but not stored. Basically, I’d use
>     >     copy directives to copy all fields to be hashed to the new hash field and
>     >     store a SHA-256 hash as a string. I’m still not sure how valuable it would
>     >     be for us. Maybe someone has already done something similar?
>     >
>     >     However, I was wondering in general about how one would go about
>     >     implementing and integrating a new FieldType.
>     >
>     >     Looking at UUIDField
>     >     <https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/schema/UUIDField.java> as an
>     >     example, the work seems moderate. But then the question is, how would I
>     >     integrate it? Just drop in a new jar with the class or does it have to be
>     >     integrated into Solr as a proper commit?
>     >
>     >     If it were valuable for others, I would love to contribute it, should we
>     >     go ahead with it. But I already have had trouble getting our Legal Dept. to
>     >     give the go ahead to contribute the code that worked for re-indexing
>     >     docValues in place (SOLR-9437). ☹
>     >
>     >     -Ronald S. Wood
>     >
>     >
>
>
