manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Job definition metadata with multiple path attribute names
Date Fri, 05 Jun 2015 13:33:37 GMT
Hi Vigi,

I do understand your issue, but I'd propose a general solution of adding
new functionality to the Metadata Transformer to achieve your goal.  So the
setup would be this:

- Use the JCIFS connector Metadata tab to just include the entire path in
the metadata
- Use the Metadata Transformer to generate two different pieces of
metadata, using a new regular expression modification feature that I would
write for you, if we can come up with a design for it

You can write your own completely new transformation connector, but that's
no different than what I propose, and not as useful.
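
To make the design discussion concrete, here is a rough sketch of what such a
regular expression step might compute: several independent metadata values
derived from one full path. This is plain Java, not ManifoldCF API. The class
name, the rule map, the attribute names "folder" and "extension", and the
sample path are all hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in ManifoldCF.
final class PathMetadataSketch {

    // Returns the first capture group of the first match, or null if no match.
    public static String firstGroup(Pattern pattern, String path) {
        Matcher m = pattern.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    // Applies each (attribute name -> pattern) rule to the same source path.
    // Rules that do not match produce no attribute at all.
    public static Map<String, String> extract(String path, Map<String, Pattern> rules) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            String value = firstGroup(rule.getValue(), path);
            if (value != null) {
                out.put(rule.getKey(), value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        // Assumed layout: /<share>/<folder>/...; capture the second path segment.
        rules.put("folder", Pattern.compile("^/[^/]+/([^/]+)/"));
        // Capture whatever follows the last dot of the final path segment.
        rules.put("extension", Pattern.compile("\\.([^./]+)$"));

        System.out.println(extract("/share/projects/reports/summary.pdf", rules));
    }
}
```

The point of the sketch is that each output attribute has its own pattern
against the same source value, so adding a second or third attribute does not
require any change to the repository connector.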


On Fri, Jun 5, 2015 at 9:17 AM, Virgiliu R <> wrote:

> Dear Karl,
> Maybe I misunderstood the applications for the metadata tab but in my
> scenario I need to extract two types of information from a document's path.
> Right now I am only able to extract one piece of information and put it in
> Solr; it would have been very useful to be able to perform other
> transformations to the paths but it's OK, I can probably write a
> transformation connector of my own.
> Thanks,
> vigi
> ------------------------------
> Date: Fri, 5 Jun 2015 09:02:59 -0400
> Subject: Re: Job definition metadata with multiple path attribute names
> From:
> To:
> Hi Vigi,
> You get, for free, the file name of the document as metadata, from all
> repository connectors, including the jcifs connector:
> >>>>>>
>                   rd.setFileName(fileNameString);
> <<<<<<
> The problem is that this is not something you can manipulate in MCF via
> regular expression with the current bevy of supplied transformation
> connectors, because (a) it isn't generic metadata but a fixed property of
> the document, and (b) the Metadata Transformer connector doesn't allow you
> to slice and dice metadata in any case, just compose it into bigger strings.
> So you're stuck with either writing a document transformation connector of
> your own, which does what you want, or proposing additional functionality
> for the Metadata Transformer.  If it can be done in a backwards compatible
> way, this is something I would support.
> I'm not thrilled with the idea of extending the JCIFS connector to build
> multiple independent attributes all from the path; the UI for this
> connector is already quite complex, and the functionality for generically
> manipulating metadata would be useful in general anyway.
> Karl
> On Fri, Jun 5, 2015 at 8:37 AM, Virgiliu R <> wrote:
> Hello guys,
> I have another ManifoldCF 2.0.2 question. Our process consists of indexing
> some documents from a Windows Share and sending them to Solr. I would like
> to extract some information from the documents and put it into specific
> Solr fields. For example, based on the id of the document I am currently
> extracting a specific folder name (using regular expressions on the
> metadata tab of the job definition) and storing it into Solr; this
> works fine.
> However, I also want to extract the file extension (using regex) and send
> it to Solr but I am not able to add more than one path attribute name on
> the Metadata tab of the job definition. I already have one that extracts a
> particular folder name from the file path and I would need a second one for
> the file extension.
> How would I be able to achieve this?
> Regards,
> vigi
