manifoldcf-user mailing list archives

From Chalitha Perera <cper...@zaizi.com>
Subject Re: ManifoldCF + Solr. No content field showing up in Solr
Date Thu, 17 Dec 2015 05:02:22 GMT
Hi Stephen,

As Karl pointed out, the update/extract endpoint works with Solr Cell and uses
Tika to extract metadata. So if your MCF connector chain also includes the
Tika transformation connector, update/extract may not work properly, because
the metadata has already been extracted. Therefore, if you are using the Tika
connector, consider using the normal /update endpoint in Solr. I am using the
following configuration (I have attached the screenshots), and it worked fine
for me.
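
To illustrate the difference between the two endpoints from a plain Solr client's point of view, here is a minimal sketch in Python. It only builds the HTTP requests and does not send them. The base URL, the collection name "mycollection", and the helper names are assumptions made for illustration; this is not how the MCF Solr output connector is implemented. The "content" field name is the one discussed in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Solr base URL and collection name (assumptions, not from the thread).
SOLR_BASE = "http://localhost:8983/solr/mycollection"


def choose_update_handler(tika_in_pipeline: bool) -> str:
    """Follow the advice above: if Tika already ran upstream, use the plain
    /update handler; otherwise let Solr Cell extract via /update/extract."""
    return "/update" if tika_in_pipeline else "/update/extract"


def build_update_request(doc_id: str, text: str) -> Request:
    """Text was already extracted, so send it as an ordinary field
    in a JSON document to the /update handler."""
    body = json.dumps([{"id": doc_id, "content": text}]).encode("utf-8")
    url = SOLR_BASE + "/update?" + urlencode({"commit": "true"})
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def build_extract_request(doc_id: str, raw_bytes: bytes, mime_type: str) -> Request:
    """Send the raw binary to /update/extract and let Solr Cell (Tika) parse it.
    The unique key is passed as a literal.* request parameter."""
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = SOLR_BASE + "/update/extract?" + params
    return Request(url, data=raw_bytes,
                   headers={"Content-Type": mime_type},
                   method="POST")
```

Passing either request to urllib.request.urlopen would submit it to a running Solr instance. The point of the sketch is that /update expects fields that are already extracted, while /update/extract expects the original document bytes.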

Thanks,
Chalitha

On Wed, Dec 16, 2015 at 9:32 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Stephen,
>
> That endpoint is meant to work with Solr Cell, which includes Tika.  My
> guess is that you don't have Solr Cell configured or properly installed,
> which is why the content field isn't getting populated.  The Solr logs
> should give you some feedback if that's the case.
>
> Karl
>
>
> On Wed, Dec 16, 2015 at 10:59 AM, Corey, Stephen <COREYS@ecu.edu> wrote:
>
>> I'm using the default for that connector, which is '/update/extract'.
>>
>>
>> Stephen Corey
>> Technology Consultant
>> East Carolina University
>> 252-737-2541
>> coreys@ecu.edu
>>
>> ________________________________________
>> From: Karl Wright [daddywri@gmail.com]
>> Sent: Wednesday, December 16, 2015 8:40 AM
>> To: user@manifoldcf.apache.org
>> Subject: Re: ManifoldCF + Solr. No content field showing up in Solr
>>
>> Hi Stephen,
>>
>> Which update handler do you have your solr output connector configured to
>> use?
>>
>> Karl
>>
>>
>> On Wed, Dec 16, 2015 at 8:37 AM, Corey, Stephen <COREYS@ecu.edu> wrote:
>> I'm using MCF 2.1, and Solr 5.3.1, running in cloud mode. I'm using the
>> web connector in MCF to crawl a website, and output using the Solr
>> connector. Both applications are running on the same (RHEL) machine. The
>> crawling seems to run fine, and I get all the documents showing up in Solr,
>> except that the "content" field never gets added to Solr. I'm using the
>> schemaless mode in Solr, so it'll add any fields that MCF sends to it. I'm
>> not sure what is going wrong that keeps the content field from showing up.
>> I've added the field manually to Solr, and it still never gets populated. I've
>> also tried adding a Tika transformation connector, and specified "extract
>> everything" with the boilerplate setting, and still no luck.
>>
>> I think I'm missing something very simple, but what is it?
>>
>> Thanks, all
>>
>
>

