manifoldcf-dev mailing list archives

From Matteo Grolla <m.gro...@sourcesense.com>
Subject Re: Solr Extracting request handler
Date Mon, 16 Jun 2014 15:14:38 GMT
Thanks Alessandro,
that explains the situation clearly.
I also agree that sending all the metadata as GET parameters can be problematic.
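To make the concern concrete: in the update/extract call described in this thread, each metadata field travels in the query string (Solr Cell's `literal.<field>` parameters) while the raw file is the request body. The Python sketch below only builds such a request and does not send it. The core URL `http://localhost:8983/solr/collection1`, the field names and the helper name are assumptions, and ManifoldCF's own Solr output connector may assemble the request differently.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust to your deployment.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/collection1/update/extract"


def build_extract_request(doc_id, metadata, content):
    """Build (url, headers, body) for an update/extract POST (hypothetical helper).

    Every metadata field becomes a literal.<field> query parameter;
    the raw binary goes in the request body for Solr (Tika) to parse.
    """
    params = [("literal.id", doc_id)]
    for field, values in metadata.items():
        # Multi-valued fields repeat the same parameter name.
        for value in values:
            params.append(("literal." + field, value))
    url = SOLR_EXTRACT_URL + "?" + urlencode(params)
    # Generic binary type; Tika detects the real format from the bytes.
    headers = {"Content-Type": "application/octet-stream"}
    return url, headers, content


if __name__ == "__main__":
    meta = {"author": ["Matteo Grolla"], "tag": ["solr", "tika"]}
    url, headers, body = build_extract_request("doc-1", meta, b"<doc/>")
    print(url)
    print(len(url), "URL characters for", len(meta), "metadata fields")
```

Every extra field or long value lengthens the URL, and many HTTP servers and proxies cap the request-line length, which is why pushing large amounts of metadata through query parameters is fragile.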

Cheers 

-- 
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com

On 16 Jun 2014, at 17:09, Alessandro Benedetti wrote:

> Hmm, the point is that right now ManifoldCF has no extractors.
> The repository connectors fetch the binary content directly, and there is
> no "Extractor Processor" yet.
> However, a pipeline processor architecture has recently been proposed
> (https://issues.apache.org/jira/browse/CONNECTORS-959), and extraction
> could fit there.
> 
> Cheers
> 
> 
> 2014-06-16 15:59 GMT+01:00 Matteo Grolla <m.grolla@sourcesense.com>:
> 
>> Since the Solr extracting request handler takes the binary and extracts the
>> text, what is the point of not using the Manifold extractor and sending text
>> and binaries to Solr?
>> I mean, the end result is the same: Solr indexes text and stores text.
>> So if Manifold supports text extraction, it seems to me this is the place
>> where it should be done.
>> 
>> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
>> 
>>> Hi Matteo,
>>>
>>> Manifold already handles the extraction, but the only way to send binary
>>> content and document metadata to Solr is through the update/extract
>>> handler, where the metadata is sent as query parameters and the binary
>>> content is sent in the request body, allowing Solr to use Tika to obtain
>>> the raw content to be stored in Solr.
>>> 
>>> Regards
>>> 
>>> 
>>> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <m.grolla@sourcesense.com> wrote:
>>> 
>>>> Hi, during my first indexing run I noticed that Manifold uses the Solr
>>>> extracting request handler to extract the content of an XML file.
>>>> For performance reasons it would be better if Manifold handled the
>>>> extraction, leaving Solr to act purely as the search engine.
>>>> Is this because of the connector design, the framework design, or is it
>>>> simply not implemented yet?
>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> --------------------------
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England

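The alternative Matteo suggests in this thread (extract the text before Solr, then index text plus metadata) could look roughly like the minimal Python sketch below. It uses the standard library's ElementTree as a stand-in extractor that handles well-formed XML only; it is not Tika and not anything ManifoldCF ships. It builds a body for Solr's JSON update format; the URL, the `content` field name and the helper names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical Solr core URL; the JSON body would be POSTed here
# with Content-Type: application/json.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def extract_xml_text(xml_bytes):
    """Stand-in extractor: join all text nodes of a well-formed XML document."""
    root = ET.fromstring(xml_bytes)
    # itertext() yields every text and tail segment in document order.
    parts = (t.strip() for t in root.itertext())
    return " ".join(p for p in parts if p)


def build_update_body(doc_id, metadata, text):
    """Build a JSON array holding one document: metadata plus extracted text.

    The "content" field name is an assumption about the Solr schema.
    """
    doc = {"id": doc_id, "content": text}
    doc.update(metadata)
    return json.dumps([doc])


if __name__ == "__main__":
    xml = b"<note><to>Solr</to><body>Index me</body></note>"
    text = extract_xml_text(xml)
    print(text)  # Solr Index me
    print(build_update_body("doc-1", {"author": "Matteo Grolla"}, text))
```

Because the metadata sits in the JSON body rather than the query string, the URL-length issue does not arise, and Solr no longer spends CPU on extraction; the trade-off is that the extraction work moves to the crawler side.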

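CONNECTORS-959, mentioned in the thread, proposes a pipeline between repository and output connectors. The conceptual Python sketch below shows where an extraction step could sit, modeled as plain functions applied in order. Every name is hypothetical, this is not ManifoldCF's Java API, and `extract_stage` merely decodes UTF-8 bytes as a placeholder for a real extractor such as Tika.

```python
from typing import Any, Callable, Dict, List

# Hypothetical document representation: field name -> value.
Doc = Dict[str, Any]
# A stage receives a document and returns a new (transformed) document.
Stage = Callable[[Doc], Doc]


def run_pipeline(doc: Doc, stages: List[Stage]) -> Doc:
    """Apply stages in order, as in repository -> transformations -> output."""
    for stage in stages:
        doc = stage(doc)
    return doc


def extract_stage(doc: Doc) -> Doc:
    """Placeholder extractor: replace the raw bytes with decoded text."""
    out = dict(doc)  # copy so the input document is left untouched
    raw = out.pop("binary")
    out["content"] = raw.decode("utf-8", errors="replace")
    return out


def drop_empty_fields_stage(doc: Doc) -> Doc:
    """Remove fields whose value is empty before the output stage sees them."""
    return {k: v for k, v in doc.items() if v not in ("", None, [])}


if __name__ == "__main__":
    fetched = {"id": "doc-1", "binary": b"Index me", "author": ""}
    result = run_pipeline(fetched, [extract_stage, drop_empty_fields_stage])
    print(result)  # {'id': 'doc-1', 'content': 'Index me'}
```

Keeping each stage a pure function of the document makes the order explicit and lets an extraction stage be added or removed without touching the repository or output side.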