lucene-solr-user mailing list archives

From kostali hassan <med.has.kost...@gmail.com>
Subject Re: indexing rich data from directory using solarium
Date Wed, 02 Dec 2015 16:29:50 GMT
The problem with posting from the command line is:

I started working with Solr 5.3.1 by extracting it to D:\solr and starting the
server with:

D:\solr\solr-5.3.1\bin>solr start ;

Then I created a core in standalone mode:

D:\solr\solr-5.3.1\bin>solr create -c mycore

I need to index files from the file system (Word and PDF), and the schema has
no "name" field for the document name, so I added this field using curl:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field":{
     "name":"name",
     "type":"text_general",
     "stored":true,
     "indexed":true }
}' http://localhost:8983/solr/mycore/schema
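
As a side note, here is a minimal sketch (my assumption, not something tested in this thread) of building the same add-field body with Python's json module, which rules out quoting mistakes such as typographic quotes in the JSON. The helper names add_field_payload and post_schema_command are hypothetical; the URL is the one from the curl command above.

```python
import json
import urllib.request


def add_field_payload(name, field_type="text_general", stored=True, indexed=True):
    """Return the JSON body for a Schema API add-field command."""
    command = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return json.dumps(command)


def post_schema_command(body, url="http://localhost:8983/solr/mycore/schema"):
    """POST the body to the schema endpoint (needs a running Solr; URL assumed)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the payload; call post_schema_command(...) to actually send it.
    print(add_field_payload("name"))
```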



Then I re-indexed all the documents with SimplePostTool on Windows:

D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
-Dc=mycore -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool
D:\Lucene\document ;



But even though the "name" field was added successfully, it is empty. The
"title" field gets the name only for PDF documents, not for MS Word documents
(.doc and .docx).



Then I chose to index with the techproducts example, because it uses
schema.xml rather than the Schema API, so I can modify the schema by hand:



D:\solr\solr-5.3.1>solr -e techproducts



Techproducts returns the name for all the indexed .xml files.



Then I created a new core based on the Solr home example/techproducts/solr, and
I used schema.xml (which contains the "name" field) and solrconfig.xml from
techproducts in this new core, called demo.

When I indexed all the documents, the "name" field exists but is still empty
for every indexed document.



My question is: how can I get just the name of each document (MS Word and PDF),
not the path as in the "id" field or the "ressource_name" field? Do I have to
create a new field type, or is there another way?
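
To make the question concrete, here is a rough sketch of one way it might be done, assuming the file is posted to the /update/extract handler and the file name is passed as a literal field value (literal.name) next to literal.id. The function name build_extract_url and the default core and base URL are my own assumptions for illustration; I have not verified this against the cores described above.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlencode


def build_extract_url(file_path, core="mycore", base="http://localhost:8983/solr"):
    """Build an /update/extract URL that sends the bare file name as a literal.

    literal.id keeps the full path; literal.name carries only the file name.
    urlencode percent-encodes the values, so backslashes, spaces and
    non-ASCII characters in the path stay valid in the query string.
    """
    path = PureWindowsPath(file_path)
    params = {
        "literal.id": str(path),
        "literal.name": path.name,  # e.g. "report.docx", without the directory
        "commit": "true",
    }
    return f"{base}/{core}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical file under the directory used in the post command above.
    print(build_extract_url(r"D:\Lucene\document\report.docx"))
```

Each file would then be POSTed to the URL built for it, so the "name" field is filled per document instead of relying on the metadata extracted from the file.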

2015-12-02 16:25 GMT+00:00 kostali hassan <med.has.kostali@gmail.com>:

> Yes, there is an error in my Solr logs:
> SolrException URLDecoder: Invalid character encoding detected after
> position 79 of query string / form data (while parsing as UTF-8)
> <http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79>
> This is my post on Stack Overflow:
>
> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>
> 2015-12-02 16:18 GMT+00:00 Gora Mohanty <gora@mimirtech.com>:
>
>> On 2 December 2015 at 17:16, kostali hassan <med.has.kostali@gmail.com>
>> wrote:
>> > Yes, that is logical, thank you, but I want to understand why the same
>> > data is indexed fine from the shell using the Windows SimplePostTool:
>> >>
>> >> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar
>> -Dauto=yes
>> >> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
>> >> org.apache.solr.util.SimplePostTool D:\Lucene\document ;
>>
>> That seems strange. Are you sure that you are posting the same PDF?
>> With SimplePostTool, you should be POSTing to the URL
>> /solr/update/extract?literal.id=myid , i.e., you need an option of
>> something like:
>> -Durl=http://localhost:8983/solr/update/extract?literal.id=myid in the
>> command line for SimplePostTool.
>>
>> Likewise, I am not that familiar with Solarium. Are you sure that the
>> file is being POSTed to /solr/update/extract? Are you seeing any
>> errors in your Solr logs?
>>
>> Regards,
>> Gora
>>
>
>
