lucene-solr-user mailing list archives

From PeteBleackley <bleackl...@zooey.co.uk>
Subject Re: Problems using DataImportHandler and TikaEntityProcessor
Date Fri, 11 Oct 2013 15:32:36 GMT
kamaci wrote
> There may be a problem with your schema. Could you send your Solr logs?
> 
> 
> 2013/10/11 Peter Bleackley <bleackleyp@.co>
> 
>> Starting Solr with the command line
>>
>>
>> java -Dsolr.solr.home=example-DIH/solr -jar start.jar
>>
>>
>> and then trying to import some data with
>>
>> java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf
>> -jar post.jar *.pdf
>>
>> fails with error
>>
>> SimplePostTool: WARNING: Solr returned an error #400 Bad Request
>> SimplePostTool: WARNING: IOException while reading response:
>> java.io.IOException: Server returned HTTP response code: 400 for URL:
>> http://localhost:8983/solr/tika/update
>>
>> These are all valid PDFs that I have previously been able to import with
>> Solr Cell.
>>
>> What am I doing wrong?
>>
>> Dr Peter J Bleackley
>> Computational Linguistics Contractor
>> Playful Technology Ltd
>>
>>
>>

11228 [qtp1831924725-17] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [tika] webapp=/solr
path=/update params={} {} 0 0
11229 [qtp1831924725-17] ERROR org.apache.solr.core.SolrCore  –
org.apache.solr.common.SolrException: Unsupported ContentType:
application/pdf  Not in: [application/xml, text/csv, text/json,
application/csv, application/javabin, text/xml, application/json]
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:724)
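
For what it's worth, the log message reads as a plain membership check: the /update handler was handed application/pdf, which is not among the content types it lists. The sketch below only mirrors that check in Python to make the failure explicit. The set is copied from the "Not in: [...]" part of the log; the function name is hypothetical and this is not Solr's actual code.

```python
# Content types copied from the "Not in: [...]" list in the log above.
ACCEPTED_BY_UPDATE = {
    "application/xml", "text/csv", "text/json",
    "application/csv", "application/javabin", "text/xml", "application/json",
}


def is_accepted_by_update(content_type: str) -> bool:
    """Hypothetical helper: True if the type appears in the logged list."""
    return content_type.strip().lower() in ACCEPTED_BY_UPDATE


# -Dtype=application/pdf from the failing command is not in the list,
# which matches the 400 Bad Request that post.jar reported.
pdf_ok = is_accepted_by_update("application/pdf")
xml_ok = is_accepted_by_update("text/xml")
```

Under this reading, a raw PDF posted to /update is rejected before any Tika parsing would take place.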


I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404
error, apparently caused by post.jar appending /extract to the end of the URL.
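
If post.jar does append /extract in this mode, as it appears to, the request would go to the URL built below. A 404 would then be consistent with the tika core having no handler registered at that path; that part is an assumption and has not been checked against the core's solrconfig.xml. The helper name is hypothetical and is not taken from post.jar.

```python
from urllib.parse import urlsplit, urlunsplit


def append_url_path(url: str, suffix: str) -> str:
    """Hypothetical helper: append a path suffix, keeping any query string."""
    parts = urlsplit(url)
    new_path = parts.path.rstrip("/") + suffix
    return urlunsplit(parts._replace(path=new_path))


# Base URL taken from the -Durl option in the original command.
base_url = "http://localhost:8983/solr/tika/update"
extract_url = append_url_path(base_url, "/extract")
print(extract_url)
```

Comparing this URL against the request handlers defined for the tika core should show whether the 404 comes from a missing /update/extract handler.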





--
View this message in context: http://lucene.472066.n3.nabble.com/Problems-using-DataImportHandler-and-TikaEntityProcessor-tp4094983p4094987.html
Sent from the Solr - User mailing list archive at Nabble.com.
