lucene-solr-user mailing list archives

From Kamuela Lau <kamuela....@gmail.com>
Subject Re: Indexing PDF file in Apache SOLR via Apache TIKA
Date Tue, 30 Oct 2018 07:21:13 GMT
Hi there,

Here are a couple of ways I'm aware of:

1. Extract-handler / post tool
You can use the curl command with the extract handler or bin/post to upload
a single document.
Reference:
https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html
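
For illustration, here is a rough Python equivalent of the curl approach. It is only a sketch under a few assumptions: Solr is running at http://localhost:8983/solr, the core is called "mycore", and the extract handler is registered at /update/extract. The core name, document id and helper function names are made up for the example; literal.id and commit are the request parameters described in the guide linked above.

```python
import urllib.parse
import urllib.request


def build_extract_url(solr_base, core, doc_id, commit=True):
    """Build the URL for Solr's extract handler (Solr Cell).

    literal.id sets the unique key of the indexed document;
    commit=true makes it searchable right after the upload.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    query = urllib.parse.urlencode(params)
    return f"{solr_base.rstrip('/')}/{core}/update/extract?{query}"


def post_pdf(pdf_path, solr_base="http://localhost:8983/solr",
             core="mycore", doc_id="doc1"):
    """Send the raw PDF bytes to the extract handler; return the HTTP status.

    The base URL, core name and document id are assumptions for this sketch.
    """
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        build_extract_url(solr_base, core, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_pdf("/path/to/file.pdf") against a running Solr with the extract handler enabled should index that one file; Tika does the text and metadata extraction on the Solr side.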

2. DataImportHandler
This could be used for, say, uploading multiple documents with Tika.
Reference:
https://lucene.apache.org/solr/guide/7_5/uploading-structured-data-store-data-with-the-data-import-handler.html#the-tikaentityprocessor
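
Once a DataImportHandler with a TikaEntityProcessor entity is configured, the import can be started with a plain HTTP request. A minimal Python sketch, assuming the handler is registered at /dataimport on a core named "mycore" (both are assumptions, as are the helper names; the actual path is whatever you set in solrconfig.xml):

```python
import urllib.parse
import urllib.request


def build_dih_url(solr_base, core, command="full-import", handler="dataimport"):
    """Build the URL that sends a command to the DataImportHandler.

    "full-import" starts an import; "status" reports on its progress.
    The handler path is an assumption and must match solrconfig.xml.
    """
    query = urllib.parse.urlencode({"command": command, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{core}/{handler}?{query}"


def run_dih_command(command="full-import",
                    solr_base="http://localhost:8983/solr", core="mycore"):
    """Send the command to Solr and return the raw JSON response text."""
    with urllib.request.urlopen(build_dih_url(solr_base, core, command)) as response:
        return response.read().decode("utf-8")
```

The import runs in the background on the Solr side, so run_dih_command("status") can be called afterwards to check on it.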

You should also be able to do it via the admin page (the "Documents" screen),
as long as the extract handler is defined in solrconfig.xml.
Reference:
https://lucene.apache.org/solr/guide/7_5/documents-screen.html#file-upload

Hope this helps!

On Tue, Oct 30, 2018 at 3:40 PM adiyaksa kevin <adiyaksakevin@gmail.com>
wrote:

> Hello there, let me introduce myself. My name is Mohammad Kevin Putra (you
> can call me Kevin), from Indonesia. I am a beginner backend developer. I
> use Linux Mint, Apache SOLR 7.5.0, and Apache TIKA 1.91.0.
>
> I have a small problem: how do I index a PDF file via Apache TIKA? I
> understand how SOLR and TIKA each work, but I don't know how the two are
> integrated.
> As far as I know, TIKA can extract the PDF file I upload and parse it
> into data/metadata automatically, and then I just have to copy and paste it
> into the "Documents" tab of the Solr core.
> My questions are:
> 1. Can I upload a PDF file to SOLR via TIKA in GUI mode, or is it only
> possible in CLI mode? If it is CLI only, could you please explain how?
> 2. Is it possible to show a text result in the "Query" tab?
>
> The background to my question: I want to index PDFs on my local
> system by uploading them via "drag & drop" in SOLR (is that possible?).
> Then, when I type something in the search box, the result should look like this:
> (Title of doc)
> blablablabla (yellow-highlighted result) blablabla.
> where the blablabla text is a couple of sentences. That's all I need.
> Sorry for my bad English.
> Thanks for reading and replying; it will be very helpful to me.
> Thanks a lot
>
