manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Job definition metadata with multiple path attribute names
Date Fri, 05 Jun 2015 13:02:59 GMT
Hi Vigi,

You get, for free, the file name of the document as metadata, from all
repository connectors, including the jcifs connector:

>>>>>>
                  rd.setFileName(fileNameString);
<<<<<<

The problem is that this is not something you can manipulate in MCF via
regular expression with the current bevy of supplied transformation
connectors, because (a) it isn't generic metadata but a fixed property of
the document, and (b) the Metadata Transformer connector doesn't allow you
to slice and dice metadata in any case, just compose it into bigger strings.

So you're stuck with either writing a document transformation connector of
your own, which does what you want, or proposing additional functionality
for the Metadata Transformer.  If it can be done in a backwards compatible
way, this is something I would support.
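
As a rough illustration of the kind of slicing such a connector would have to do, here is a minimal sketch in plain Java. It deliberately uses no ManifoldCF classes; the PathAttributes class, its method names, and the sample path are all hypothetical, and the regular expressions are only one possible choice.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF: derives several independent
// attributes from a single file name or path using regular expressions.
final class PathAttributes {
    // A '.' followed by one or more characters that are not '.', '/' or '\',
    // anchored at the end of the string.
    private static final Pattern EXTENSION = Pattern.compile("\\.([^./\\\\]+)$");

    private PathAttributes() {}

    // Returns the lower-cased extension, or empty if the name has none.
    static Optional<String> extension(String fileName) {
        Matcher m = EXTENSION.matcher(fileName);
        return m.find()
                ? Optional.of(m.group(1).toLowerCase(Locale.ROOT))
                : Optional.empty();
    }

    // Returns the first capture group of the caller-supplied regex, or empty
    // if the regex does not match or defines no capture group.
    static Optional<String> firstGroup(String path, String regex) {
        Matcher m = Pattern.compile(regex).matcher(path);
        return (m.find() && m.groupCount() >= 1 && m.group(1) != null)
                ? Optional.of(m.group(1))
                : Optional.empty();
    }
}
```

With these two methods, one call could pull the extension out of the file name while a second call with a different expression pulls a folder name out of the full path, so each result could be stored under its own attribute name.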

I'm not thrilled with the idea of extending the JCIFS connector to build
multiple independent attributes all from the path; the UI for this
connector is already quite complex, and the functionality for generically
manipulating metadata would be useful in general anyway.

Karl


On Fri, Jun 5, 2015 at 8:37 AM, Virgiliu R <gosuvigi@hotmail.com> wrote:

> Hello guys,
>
> I have another ManifoldCF 2.0.2 question. Our process consists of indexing
> some documents from a Windows Share and sending them to Solr. I would like
> to extract some information from the documents and put it into specific
> Solr fields. For example, based on the id of the document I am currently
> extracting a specific folder name (using regular expressions on the
> metadata tab of the job definition) and storing it into Solr; this
> works fine.
>
> However, I also want to extract the file extension (using regex) and send
> it to Solr but I am not able to add more than one path attribute name on
> the Metadata tab of the job definition. I already have one that extracts a
> particular folder name from the file path and I would need a second one for
> the file extension.
>
> How would I be able to achieve this?
>
> Regards,
> vigi
>
