Date: Wed, 11 Jun 2014 12:21:52 -0400
Subject: Re: Call for trunk pipeline testers
From: Karl Wright
To: dev
Cc: takumi yoshida, Rafa Haro
In-Reply-To: <53987F7D.6030109@apache.org>

Hi Rafa,

We would be very interested in a contribution that addresses
CONNECTORS-954. As for changing the Solr connector so that it does not
use the extracting update handler: as long as that is only one of several
options, that contribution would be welcome too. Please consider opening
a ticket specifically for that change.

Output to multiple indexes at the same time has come up before, but it is
more of a challenge, because in theory we'd want to keep a separate record
in the ingeststatus table for each document for each individual output
index. With pipeline support, each output index would no doubt also need
its own pipeline. Nevertheless, I'm not opposed to adding this feature if
I can work out a good way to do it.

So let's start with CONNECTORS-954 and the Solr connector changes, and see
how far we get.

Karl


On Wed, Jun 11, 2014 at 12:10 PM, Rafa Haro wrote:

> Hi Karl,
>
> We (at Zaizi) also had this requirement. We initially addressed it by
> creating a sort of "Processor Connector", mainly for semantically
> enhancing the repository documents before indexing them. We would be
> very happy to give this a try and provide feedback, because our current
> approach is only temporary.
> Apart from processing the document, we also had a special requirement:
> to produce different instances of repository documents, because we
> populate more than one index at the same time. We would also need to
> check how we can do exactly the same with this processing pipeline.
>
> Apart from this, Karl, we can also take care of the Tika integration
> (actually we already did it) and then eventually take care of
> CONNECTORS-954. Because we already use Tika as a "processor connector",
> we are also going to modify the Solr connector so that it does not use
> the extracting update handler, which also presents some problems. Would
> that also be of interest to the community?
>
> Cheers,
> Rafa
>
> On 11/06/14 16:09, Karl Wright wrote:
>
>> Hi folks,
>>
>> ManifoldCF finally has a pipeline! All tests pass. Now I'm looking for
>> people to try things out by hand to see if there are any rough edges,
>> before we get too far along in the 1.7 development cycle to fix them.
>>
>> Trunk has all the necessary moving parts and documentation as well.
>> There are two transformation connectors available -- one that does
>> nothing but pass data through, and one that forces metadata (just like
>> the framework "Forced metadata" tab). But since you can have more than
>> one of each kind of connector in a pipeline, this should be enough to
>> exercise things fairly completely.
>>
>> We still need to address a couple of things in the medium and long
>> term. First, we need a Tika transformation connector that extracts
>> metadata from binary files. There's an existing ticket for that:
>> CONNECTORS-954. If anyone wants to take a crack at that, please let me
>> know. (Takumi Yoshida would be the obvious choice.) Second, we need to
>> come up with a strategy for removing obsolete tabs/features, like the
>> aforementioned general job Forced Metadata tab. We've got a fair number
>> of such features around now.
>> These kinds of things cannot be removed without either a comprehensive
>> automatic upgrade, or loss of backwards compatibility. I am thinking
>> maybe we break with backwards compatibility and work towards cleaning
>> out duplicate features for ManifoldCF 2.0.
>>
>> Thoughts?
>>
>> Karl