manifoldcf-dev mailing list archives

From Ahmet Arslan <>
Subject Re: Call for trunk pipeline testers
Date Wed, 11 Jun 2014 22:42:23 GMT

bq. we are going to also modify the solr connector for not using the extract update handler 

+1 to this. With this, we can support a wide range of Solr versions by just sending XML update
messages. Solr setups will be simpler, i.e. there is no need to have the Solr Cell jars. We can
also drop our custom Solr server implementation.
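
As a rough illustration of the XML update messages mentioned above, here is a minimal sketch in Java. It assumes the document text has already been extracted upstream (for example by a Tika transformation connector), so only plain field values are sent. The class name `SolrXmlUpdateSketch`, its methods, and the field names are hypothetical and are not part of ManifoldCF or SolrJ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not ManifoldCF or SolrJ code. */
final class SolrXmlUpdateSketch {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Builds an <add><doc>...</doc></add> message with one <field> element
     * per map entry, in the map's iteration order.
     */
    static String buildAddMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** Convenience example: a document with an id and already-extracted text. */
    static String exampleMessage() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "doc-1");               // assumed unique key field
        fields.put("content", "Extracted text"); // assumed field name
        return buildAddMessage(fields);
    }
}
```

The resulting string would be POSTed to the Solr update handler (typically `/update`) with an XML content type. Since nothing here depends on the extracting handler, the Solr side should not need the Solr Cell jars for this path.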


On Wednesday, June 11, 2014 7:10 PM, Rafa Haro <> wrote:
Hi Karl,

We (at Zaizi) also had this requirement. We initially addressed it by 
creating a sort of "Processor Connector", mainly for semantically 
enhancing the repository documents before indexing them. We would be 
very happy to give this a try and provide feedback, because our current 
approach is purely temporary. Apart from processing the document, we 
also had a special requirement: producing different instances 
of repository documents, because we populate more than one index at the 
same time. We would also need to check how we can do exactly the same 
with this processing pipeline.

Apart from this, Karl, we can also take care of the Tika integration 
(actually we have already done it) and eventually take care of CONNECTORS-954 
as well. Because we already use Tika as a "processor connector", we are also 
going to modify the Solr connector so that it does not use the extract update 
handler, which also presents some problems. Would that be of interest to 
the community as well?


On 11/06/14 16:09, Karl Wright wrote:
> Hi folks,
> ManifoldCF finally has a pipeline!  All tests pass.  Now I'm looking for
> people to try things out by hand to see if there are any rough edges,
> before we get too far along in the 1.7 development cycle to fix them.
> Trunk has all the necessary moving parts and documentation as well.  There
> are two transformation connectors available -- one that does nothing but
> pass data through, and one that forces metadata (just like the framework
> "Forced metadata" tab).  But since you can have more than one of each kind
> of connector in a pipeline, this should be enough to exercise things fairly
> completely.
> We still need to address a couple of things in the medium and long term.
> First, we need a Tika transformation connector, that extracts metadata from
> binary files.  There's an existing ticket for that: CONNECTORS-954.  If
> anyone wants to take a crack at that, please let me know.  (Takumi Yoshida
> would be the obvious choice.)  Second, we need to come up with a strategy
> for removing obsolete tabs/features, like the aforementioned general job
> Forced Metadata tab.  We've got a fair number of such features around now.
> These kinds of things cannot be removed without either a comprehensive
> automatic upgrade, or loss of backwards compatibility.  I am thinking maybe
> we break with backwards compatibility and work towards cleaning out
> duplicate features for ManifoldCF 2.0.
> Thoughts?
> Karl
