manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Job definition metadata with multiple path attribute names
Date Fri, 05 Jun 2015 13:33:37 GMT
Hi Vigi,

I do understand your issue, but I'd propose a general solution of adding
new functionality to the Metadata Transformer to achieve your goal.  So the
setup would be this:

- Use the JCIFS connector Metadata tab to just include the entire path in
the metadata
- Use the Metadata Transformer to generate two different pieces of
metadata, using a new regular expression modification feature that I would
write for you, if we can come up with a design for it
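
A minimal sketch of the regex-based derivation proposed above. It assumes the full path arrives as one attribute, and each hypothetical rule copies capture group 1 of its pattern into a new attribute. The `Rule` format, the attribute names `folder` and `extension`, and the `smb://server/share/...` path form are all illustrative assumptions. They are not an existing Metadata Transformer feature, and the path form is not necessarily the exact identifier the JCIFS connector produces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the rule format and names below are illustrative only and
// are NOT part of ManifoldCF's Metadata Transformer.
final class PathRegexRules {

    /** Derive targetAttribute from capture group 1 of pattern, applied to the source value. */
    record Rule(String targetAttribute, Pattern pattern) {}

    // Example rules for an assumed path form smb://server/share/<folder>/.../<name>.<ext>
    static final List<Rule> EXAMPLE_RULES = List.of(
        // First folder below the share, e.g. "projects"
        new Rule("folder", Pattern.compile("^smb://[^/]+/[^/]+/([^/]+)/")),
        // Text after the last dot in the final path segment, e.g. "pdf"
        new Rule("extension", Pattern.compile("\\.([^./]+)$"))
    );

    /** Applies every rule to one source value; rules that do not match produce no attribute. */
    static Map<String, String> apply(String sourceValue, List<Rule> rules) {
        Map<String, String> derived = new LinkedHashMap<>();
        for (Rule rule : rules) {
            Matcher m = rule.pattern().matcher(sourceValue);
            if (m.find()) {
                derived.put(rule.targetAttribute(), m.group(1));
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        String path = "smb://server/share/projects/alpha/report.final.pdf";
        System.out.println(apply(path, EXAMPLE_RULES)); // {folder=projects, extension=pdf}
    }
}
```

Because each rule reads the same source attribute independently, a third derived field would need only one more rule, not another path attribute on the connector's Metadata tab.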

You can write your own completely new transformation connector, but that's
no different than what I propose, and not as useful.
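
If a custom transformation connector were written instead, its per-document work could be as small as the sketch below. It assumes the connector can read the file name that repository connectors already supply (see `rd.setFileName(...)` in the quoted reply) and passes the existing metadata through unchanged. The class name `ExtensionEnricher`, the `extension` field name, and the plain `Map` standing in for the document's metadata are hypothetical. A real connector would be written against ManifoldCF's connector framework, which is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the per-document step a custom transformation connector
// might perform. The ManifoldCF connector framework itself is not shown.
final class ExtensionEnricher {

    /** Returns the lower-cased text after the last '.', or null when there is no extension. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Treat "readme", "report." and ".profile" as having no extension
        if (dot <= 0 || dot == fileName.length() - 1) {
            return null;
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Copies the incoming metadata unchanged and adds a hypothetical "extension" field. */
    static Map<String, List<String>> enrich(String fileName, Map<String, List<String>> metadata) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        metadata.forEach((key, values) -> out.put(key, new ArrayList<>(values)));
        String ext = extensionOf(fileName);
        if (ext != null) {
            out.put("extension", List.of(ext));
        }
        return out;
    }
}
```

Using `lastIndexOf('.')` instead of a regex keeps the extension logic independent of the path layout, but the logic is hard-coded, which is the trade-off against a generic, configurable transformer.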

Thanks,
Karl



On Fri, Jun 5, 2015 at 9:17 AM, Virgiliu R <gosuvigi@hotmail.com> wrote:

> Dear Karl,
>
> Maybe I misunderstood what the Metadata tab is for, but in my
> scenario I need to extract two types of information from a document's path.
> Right now I am only able to extract one piece of information and put it in
> Solr; it would have been very useful to be able to perform other
> transformations on the paths, but it's OK, I can probably write a
> transformation connector of my own.
>
> Thanks,
> vigi
> ------------------------------
> Date: Fri, 5 Jun 2015 09:02:59 -0400
> Subject: Re: Job definition metadata with multiple path attribute names
> From: daddywri@gmail.com
> To: user@manifoldcf.apache.org
>
>
> Hi Vigi,
>
> You get, for free, the file name of the document as metadata, from all
> repository connectors, including the jcifs connector:
>
> >>>>>>
>                   rd.setFileName(fileNameString);
> <<<<<<
>
> The problem is that this is not something you can manipulate in MCF via
> regular expression with the current bevy of supplied transformation
> connectors, because (a) it isn't generic metadata but a fixed property of
> the document, and (b) the Metadata Transformer connector doesn't allow you
> to slice and dice metadata in any case, just compose it into bigger strings.
>
> So you're stuck with either writing a document transformation connector of
> your own, which does what you want, or proposing additional functionality
> for the Metadata Transformer.  If it can be done in a backwards compatible
> way, this is something I would support.
>
> I'm not thrilled with the idea of extending the JCIFS connector to build
> multiple independent attributes all from the path; the UI for this
> connector is already quite complex, and the functionality for generically
> manipulating metadata would be useful in general anyway.
>
> Karl
>
>
> On Fri, Jun 5, 2015 at 8:37 AM, Virgiliu R <gosuvigi@hotmail.com> wrote:
>
> Hello guys,
>
> I have another ManifoldCF 2.0.2 question. Our process consists of indexing
> some documents from a Windows share and sending them to Solr. I would like
> to extract some information from the documents and put it into specific
> Solr fields. For example, based on the ID of the document I am currently
> extracting a specific folder name (using regular expressions on the
> Metadata tab of the job definition) and storing it in Solr; this works
> fine.
>
> However, I also want to extract the file extension (using a regex) and send
> it to Solr, but I am not able to add more than one path attribute name on
> the Metadata tab of the job definition. I already have one that extracts a
> particular folder name from the file path and I would need a second one for
> the file extension.
>
> How would I be able to achieve this?
>
> Regards,
> vigi
>
>
>
