oodt-dev mailing list archives

From Tom Barber <tom.bar...@meteorite.bi>
Subject Re: Crawling / Archiving binary data with Solr backend
Date Mon, 23 Nov 2015 15:23:47 GMT
./crawler/bin/crawler_launcher \
    --filemgrUrl http://localhost:9000 \
    --operation --launchMetCrawler \
    --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
    --productPath $OODT_HOME/data/staging \
    --metExtractor org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor \
    --metExtractorConfig /home/bugg/Projects/surrey100/oodt/data/met/tika.conf

That's the command I'm running. It runs fine with the default Lucene
catalog and also runs fine on a txt file, but it fails on a random
picture I took and on an mp3 I tested it on.


On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Encoding issues with the extracted metadata? What are you getting
> just running Tika on the files?
>
> The actual data shouldn’t matter since it’s not being ingested
> (are you doing it in place, or what data transferer are you using)?
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Tom Barber <tom.barber@meteorite.bi>
> Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> Date: Monday, November 23, 2015 at 6:36 AM
> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> Subject: Crawling / Archiving binary data with Solr backend
>
> >Hello,
> >
> >Looks like I've never tried this with binary data before. If I switch the
> >filemgr defaults to use Solr and then try to crawl my staging directory
> >using the Tika extractor, I get a lot of
> >
> >org.apache.xmlrpc.XmlRpcException: java.lang.Exception:
> >org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error
> >ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] : null
> >    at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
> >    at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
> >    at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
> >
> >
> >and similar errors.
> >
> >Any ideas?
> >
> >Tom
>
>
