manifoldcf-user mailing list archives

From Bisonti Mario <Mario.Biso...@vimar.com>
Subject Re: Add field to Output Solr
Date Tue, 16 Oct 2018 15:23:39 GMT
In the job, I set up the following connections:

  1.  Repository: WinShare
  2.  Transformation: Allowed Documents
  3.  Transformation: TikaExternal
  4.  Transformation: MetadataExtractor
  5.  Output: SolrShare

Then, in Allowed Documents, I entered the allowed MIME types and file extensions.

In the field mapping I added the following:
[image: image002.png]
and I unchecked “Keep all metadata”.

In the metadata expressions I checked “Keep all incoming metadata” and “Remove empty
metadata values”.
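
As a rough illustration of what those two checkboxes amount to, here is a small sketch in plain Python. It is not ManifoldCF code: the function name, the dict-of-lists document model, and the sample values are all made up for this example.

```python
def adjust_metadata(incoming, keep_all=True, remove_empty=True):
    """Mimic the two options on a dict mapping field name -> list of values.

    Hypothetical helper for illustration only; ManifoldCF does this internally.
    """
    result = {}
    if keep_all:
        # "Keep all incoming metadata": pass every incoming field through.
        for name, values in incoming.items():
            result[name] = list(values)
    if remove_empty:
        # "Remove empty metadata values": drop blank values, and drop a
        # field entirely when nothing is left.
        cleaned = {}
        for name, values in result.items():
            non_empty = [v for v in values if v is not None and str(v).strip() != ""]
            if non_empty:
                cleaned[name] = non_empty
        result = cleaned
    return result


# Sample document with an empty last_author (values are invented).
doc = {"author": ["Mario"], "last_author": [""], "creator": ["Writer"]}
print(adjust_metadata(doc))
```

With both options on, a document whose last_author is blank simply arrives without that field, which would match fields showing up in the index only where they are not empty.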

Obviously, my Solr schema has to contain the fields last_author and author, in addition to
the fields that I specified in the Schema tab of the SolrShare output connection:
[image: image006.png]
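
If the Solr core uses a managed schema, one way to add such fields is the Solr Schema API (an "add-field" command POSTed to the core's /schema endpoint). The sketch below only builds the requests. The core URL, the "string" field type, and the stored/indexed flags are assumptions for illustration, not values taken from this thread.

```python
import json
import urllib.request


def add_field_payload(name, field_type="string"):
    """Build a Schema API 'add-field' command (field type is an assumption)."""
    return {"add-field": {"name": name, "type": field_type,
                          "stored": True, "indexed": True}}


def build_request(solr_core_url, payload):
    """Prepare (but do not send) a POST to <core>/schema."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        solr_core_url.rstrip("/") + "/schema",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    core_url = "http://localhost:8983/solr/mycore"  # assumed core URL
    for field in ("last_author", "author"):
        req = build_request(core_url, add_field_payload(field))
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
        # To actually apply the change: urllib.request.urlopen(req)
```

With a classic schema.xml setup the fields would be declared by hand in that file instead.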


It works: in the Solr index I now find the added fields last_author and author (for documents
where they aren’t empty).
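
A quick way to double-check this is to query Solr for documents that have a value in the new field. The sketch below only builds the select URL. The core URL is an assumption, and the helper name is made up for this example.

```python
from urllib.parse import urlencode


def field_check_url(solr_core_url, field, rows=5):
    """Build a /select URL returning documents that have a value in `field`."""
    params = {
        "q": f"{field}:*",      # match documents where the field exists
        "fl": f"id,{field}",    # return only the id and the field itself
        "rows": rows,
        "wt": "json",
    }
    return solr_core_url.rstrip("/") + "/select?" + urlencode(params)


# Assumed core URL; open the printed URL in a browser or with curl.
print(field_check_url("http://localhost:8983/solr/mycore", "last_author"))
```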

I hope that my approach is the right way to set up the ManifoldCF-Solr-Tika architecture.

Thanks a lot, Karl, for your patience.

Mario




From: Karl Wright <daddywri@gmail.com>
Sent: Tuesday, 16 October 2018 13:11
To: user@manifoldcf.apache.org
Subject: Re: Add field to Output Solr

If it's not in your PDFs, Tika won't extract it.
If you merely want to copy another field, you can use the Metadata Adjuster transformer to
do that.

Karl


On Tue, Oct 16, 2018 at 4:38 AM Bisonti Mario <Mario.Bisonti@vimar.com> wrote:
Hello,
I am using Tika server to process PDF, DOC, and other files.

I configured the following in my Solr output connection:
[image: image003.png]
So, when I index the documents, I see these fields:
id
last_modified
resourcename
content_type
allow_token_document
deny_token_document
allow_token_share
deny_token_share
stream_size
creator
deny_token_parent
allow_token_parent
content
_version_


In my Solr schema, I have the field last_author, which I would like to be indexed.
How can I add it?

Thanks a lot

Mario