oodt-dev mailing list archives

From Chris Mattmann <chris.mattm...@gmail.com>
Subject Re: Crawling / Archiving binary data with Solr backend
Date Mon, 23 Nov 2015 15:33:56 GMT
Yeah, check the metadata. Any weird UTF-8 encoding?

(I.e., run Tika on the file outside of OODT: what do you see?)

—
Chris Mattmann
chris.mattmann@gmail.com






-----Original Message-----
From: Tom Barber <tom.barber@meteorite.bi>
Reply-To: <dev@oodt.apache.org>
Date: Monday, November 23, 2015 at 7:23 AM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Re: Crawling / Archiving binary data with Solr backend

>./crawler/bin/crawler_launcher     --filemgrUrl http://localhost:9000
>--operation --launchMetCrawler     --clientTransferer
>org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
>--productPath $OODT_HOME/data/staging     --metExtractor
>org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
>--metExtractorConfig /home/bugg/Projects/surrey100/oodt/data/met/tika.conf
>
>I'm running that. It runs fine with the default Lucene setup and with a
>txt file, but it fails on a random picture I took and on an mp3 I tested.
>
>
>On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980) <
>chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Encoding issues with the extracted metadata? What are you getting
>> just running Tika on the files?
>>
>> The actual data shouldn’t matter since it’s not being ingested
>> (are you doing it in place, or what data transferer are you using)?
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattmann@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Tom Barber <tom.barber@meteorite.bi>
>> Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> Date: Monday, November 23, 2015 at 6:36 AM
>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> Subject: Crawling / Archiving binary data with Solr backend
>>
>> >Hello,
>> >
>> >Looks like I've never tried it before with binary data. If I swap the
>> >filemgr defaults to use solr then try and crawl my staging directory using
>> >the Tika extractor I get a lot of
>> >
>> >org.apache.xmlrpc.XmlRpcException: java.lang.Exception:
>> >org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error
>> >ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] : null
>> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
>> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
>> >at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
>> >
>> >
>> >Type things.
>> >
>> >Any ideas?
>> >
>> >Tom
>>
>>
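
The check suggested at the top of the thread (run Tika on the file outside of OODT and look for odd encoding in the extracted metadata) can be sketched roughly as below. This is only an illustration, not something posted in the thread: it assumes the metadata has already been obtained as a key/value mapping, and it assumes that characters XML 1.0 forbids are the kind of "weird encoding" worth looking for. The thread does not confirm that this is the cause of the CatalogException. The function name find_suspect_values is hypothetical.

```python
import re

# Assumption: "weird encoding" here means characters that XML 1.0 does not
# allow (C0 controls other than tab, LF and CR, lone surrogates, and the
# non-characters U+FFFE and U+FFFF).
_INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def find_suspect_values(metadata):
    """Return {key: [code points]} for values holding XML-invalid characters.

    `metadata` is a dict of extracted metadata; a value may be a single
    string or a list of strings when a key occurs more than once.
    """
    suspects = {}
    for key, value in metadata.items():
        values = value if isinstance(value, list) else [value]
        # Collect each offending character once, formatted as U+XXXX so it
        # can be printed safely even when the character itself cannot.
        bad = sorted({
            f"U+{ord(ch):04X}"
            for v in values
            for ch in _INVALID_XML_CHARS.findall(str(v))
        })
        if bad:
            suspects[key] = bad
    return suspects
```

One way to get a mapping to feed it, assuming a tika-app jar is available locally, is to run the Tika command-line app with its JSON metadata output option on the picture or mp3 and parse the result with json.loads. Any keys the function reports would be the first ones to inspect.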


