From: Karl Wright
To: user@manifoldcf.apache.org
Date: Fri, 5 Jun 2015 09:02:59 -0400
Subject: Re: Job definition metadata with multiple path attribute names

Hi Vigi,

You get, for free, the file name of the document as metadata from all repository connectors, including the JCIFS connector:

>>>>>>
    rd.setFileName(fileNameString);
<<<<<<

The problem is that this is not something you can manipulate in MCF via regular expressions with the current bevy of supplied transformation connectors, because (a) it isn't generic metadata but a fixed property of the document, and (b) the Metadata Transformer connector doesn't allow you to slice and dice metadata in any case, just compose it into bigger strings.

So you're stuck with either writing a document transformation connector of your own, which does what you want, or proposing additional functionality for the Metadata Transformer. If it can be done in a backwards-compatible way, this is something I would support.
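
For illustration only, the regex step that such a custom transformation connector would need might look like the sketch below. The class and method names are hypothetical and not part of ManifoldCF, and the connector wiring is left out entirely: the sketch assumes the connector can read the file name (or path) as a string and attach the results as extra metadata fields, for instance through something like RepositoryDocument's addField, which should be checked against the API of your MCF version.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF. It only shows the regex logic;
// wiring it into a transformation connector is deliberately omitted.
final class PathMetadataSketch {

  // Text after the last dot of a file name, e.g. "PDF" in "report.final.PDF".
  // Names with no dot, a trailing dot, or only a leading dot do not match.
  private static final Pattern EXTENSION =
      Pattern.compile("^.+\\.([^./\\\\]+)$");

  private PathMetadataSketch() {}

  // Returns capture group 1 of the first match, or empty if nothing matches.
  // The pattern must define at least one capture group.
  static Optional<String> firstGroup(Pattern pattern, String input) {
    Matcher matcher = pattern.matcher(input);
    return matcher.find() ? Optional.ofNullable(matcher.group(1)) : Optional.empty();
  }

  // Lower-cased extension of a file name, or empty if it has none.
  static Optional<String> fileExtension(String fileName) {
    return firstGroup(EXTENSION, fileName).map(ext -> ext.toLowerCase(Locale.ROOT));
  }
}
```

With these helpers, PathMetadataSketch.fileExtension("report.final.PDF") gives an Optional containing "pdf", and PathMetadataSketch.firstGroup(Pattern.compile("/projects/([^/]+)/"), "/share/projects/alpha/report.pdf") gives an Optional containing "alpha". Both inputs are made-up examples. Because each attribute is just one more pattern applied to the same string, the folder name and the extension no longer compete for a single path attribute slot.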

I'm not thrilled with the idea of extending the JCIFS connector to build multiple independent attributes all from the path; the UI for this connector is already quite complex, and the functionality for generically manipulating metadata would be useful in general anyway.

Karl


On Fri, Jun 5, 2015 at 8:37 AM, Virgiliu R <gosuvigi@hotmail.com> wrote:
> Hello guys,
>
> I have another ManifoldCF 2.0.2 question. Our process consists of indexing some documents from a Windows share and sending them to Solr. I would like to extract some information from the documents and put it into specific Solr fields. For example, based on the ID of the document, I am currently extracting a specific folder name (using regular expressions on the Metadata tab of the job definition) and storing it in Solr; this works fine.
>
> However, I also want to extract the file extension (using a regex) and send it to Solr, but I am not able to add more than one path attribute name on the Metadata tab of the job definition. I already have one that extracts a particular folder name from the file path, and I would need a second one for the file extension.
>
> How would I be able to achieve this?
>
> Regards,
> vigi
