manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Call for trunk pipeline testers
Date Wed, 11 Jun 2014 16:21:52 GMT
Hi Rafa,

We would be very interested in a contribution that addresses
CONNECTORS-954.  As for changing the Solr connector so that it does not use
the extracting update handler: as long as that remains only one of several
options, that contribution would be welcome too.  Please consider opening a
ticket specifically for that change.

Output to multiple indexes at the same time has come up before, but this is
more of a challenge because in theory we'd want to keep a different record
in the ingeststatus table for each document for each individual output
index.  With pipeline support, each output index would no doubt also need
its own distinct pipeline.  Nevertheless, I'm not opposed to adding this
feature if I can work out a good way to do it.

So let's start with CONNECTORS-954 and Solr connector changes, and see how
far we get.

Karl



On Wed, Jun 11, 2014 at 12:10 PM, Rafa Haro <rharo@apache.org> wrote:

> Hi Karl,
>
> We (at Zaizi) also had this requirement. We initially addressed it by
> creating a sort of "Processor Connector", mainly for semantically enhancing
> the repository documents before indexing them. We would be very happy to
> give this a try and provide feedback because our current approach is
> purely temporary. Apart from processing the document, we also had a
> special requirement: producing different instances of repository
> documents, because we populate more than one index at the same time. We
> would also need to check how we can do exactly the same with this
> processing pipeline.
>
> Apart from this, Karl, we can also take care of the Tika integration
> (actually we already did it) and then eventually take care of
> CONNECTORS-954. Because we already use Tika as a "processor connector", we
> are also going to modify the Solr connector so that it does not use the
> extracting update handler, which also presents some problems. Would that
> be interesting for the community as well?
>
> Cheers,
> Rafa
>
> On 11/06/14 16:09, Karl Wright wrote:
>
>> Hi folks,
>>
>> ManifoldCF finally has a pipeline!  All tests pass.  Now I'm looking for
>> people to try things out by hand to see if there are any rough edges,
>> before we get too far along in the 1.7 development cycle to fix them.
>>
>> Trunk has all the necessary moving parts and documentation as well.  There
>> are two transformation connectors available -- one that does nothing but
>> pass data through, and one that forces metadata (just like the framework
>> "Forced metadata" tab).  But since you can have more than one of each kind
>> of connector in a pipeline, this should be enough to exercise things
>> fairly completely.
>>
>> We still need to address a couple of things in the medium and long term.
>> First, we need a Tika transformation connector that extracts metadata
>> from binary files.  There's an existing ticket for that: CONNECTORS-954.  If
>> anyone wants to take a crack at that, please let me know.  (Takumi Yoshida
>> would be the obvious choice.)  Second, we need to come up with a strategy
>> for removing obsolete tabs/features, like the aforementioned general job
>> Forced Metadata tab.  We've got a fair number of such features around now.
>> These kinds of things cannot be removed without either a comprehensive
>> automatic upgrade, or loss of backwards compatibility.  I am thinking
>> maybe we break with backwards compatibility and work towards cleaning out
>> duplicate features for ManifoldCF 2.0.
>>
>> Thoughts?
>>
>> Karl
>>
>>
>
