lucene-solr-user mailing list archives

From Alisa Z. <prol...@mail.ru>
Subject Re: Solr-5.5.0 doesn't recognize different types of _childDocuments_ any more --degrading since 5.3.1?
Date Fri, 25 Mar 2016 22:48:08 GMT
 Further experiments:

-- updated the schema to account for multiple values: 

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-dynamic-field":{
     "name":"*type_s",
     "type":"string",
     "indexed":true, 
     "multiValued":true
 }
}' http://localhost:8985/solr/my_collection/schema
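For anyone who prefers to script this step, here is a minimal sketch of the same Schema API call in Python, using only the standard library. The helper name build_schema_request is made up for illustration; the URL, port, collection name, and field definition are the ones from the curl command above. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_schema_request(base_url, collection, field):
    """Build (but do not send) a POST that adds a dynamic field via the Schema API."""
    url = f"{base_url}/solr/{collection}/schema"
    body = json.dumps({"add-dynamic-field": field}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


# Same field definition as in the curl command above.
req = build_schema_request(
    "http://localhost:8985",
    "my_collection",
    {"name": "*type_s", "type": "string", "indexed": True, "multiValued": True},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would actually send it, assuming a Solr instance is listening at that address.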

-- Re-ran indexing:
solr-5.5.0$ bin/post -c my_collection ../../data/data-solr.json -p 8985
java -classpath /Users/<omitted>/solr-5.5.0/dist/solr-core-5.5.0.jar -Dauto=yes -Dport=8985 -Dc=enron_path_w_ts -Ddata=files org.apache.solr.util.SimplePostTool ../../data/data-solr.json
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8985/solr/my_collection/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file data-solr-path-w-ts-suffix.json (application/json) to [base]/json/docs
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8985/solr/my_collection/update/json/docs
SimplePostTool: WARNING: Response: {"responseHeader":{"status":400,"QTime":12},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"ERROR:
[doc=AVNzOoBsX6g-H6sC3dgo] multiple values encountered for non multiValued field  _childDocuments_._childDocuments_._childDocuments_.relevance_tf:
[0.918377, 0.737646, 0.700964, 0.659539, 0.657294, 0.62809, 0.612241, 0.609963, 0.873428,
0.764, 0.763825, 0.552016, 0.472819, 0.30331, 0.292935, 0.285799, 0.278851, 0.936158, 0.790093,
0.722639, 0.649841, 0.576905, 0.570454, 0.445547, 0.429439, 0.410347, 0.391091, 0.293075,
0.253883, 0.252494, 0.250084, 0.242866, 0.24142, 0.239883, 0.239827, 0.239563, 0.239507, 0.238434,
0.238193, 0.237804, 0.237769, 0.237022, 0.236955, 0.2364, 0.236164, 0.236129, 0.236025, 0.235973]","code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8985/solr/my_collection/update/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8985/solr/my_collection/update...
Time spent: 0:00:05.137

So now it dumps all the values of relevance_tf into one array, regardless of which nested
document each value belonged to. It really does not seem to handle hierarchies whose
branches have different types.  :(
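To make the symptom concrete, here is a toy model in Python. It is only a sketch of what the error messages suggest is happening, not Solr's actual code: every leaf value is collected under a dotted path that ignores which sibling it came from. The names flatten_by_path and sample are made up for illustration, and the sample is a cut-down version of my data.

```python
from collections import defaultdict


def flatten_by_path(doc, prefix="", out=None):
    """Collect leaf values under dotted paths that ignore sibling position."""
    if out is None:
        out = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # A list of nested documents: recurse, reusing one path for all siblings.
            for child in value:
                flatten_by_path(child, prefix=f"{path}.", out=out)
        else:
            out[path].append(value)
    return out


# Cut-down version of the nested sample: two branches of different types.
sample = {
    "type_s": "doc",
    "_childDocuments_": [
        {"type_s": "doc.userData",
         "_childDocuments_": [{"type_s": "doc.userData.parts"}]},
        {"type_s": "doc.enriched",
         "_childDocuments_": [{"type_s": "doc.enriched.text"}]},
    ],
}

flat = flatten_by_path(sample)
for path, values in flat.items():
    print(path, values)
```

In this model, _childDocuments_._childDocuments_.type_s ends up holding both doc.userData.parts and doc.enriched.text, which is the same pair of values the first error message complained about. Making the field multiValued only hides that collision at one level; the values from different branches still land in the same field.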

-- Alisa 


>Friday, 25 March 2016, 18:19 -04:00 from Alisa Z. <proloxx@mail.ru>:
>
>Hi all, 
>It is partially a question, partially a discussion. 
>I am working with documents with deep levels of nesting. The documents are in a single JSON file (see a sample below).
>
>When I was on Solr 5.3.1, 
>solr-5.3.1$ bin/post -c my_collection ../data/data-solr.json
>caused no problems.
>
>Now, I am trying to run exactly the same command on Solr 5.5.0:
>
>solr-5.5.0$ bin/post -c my_collection ../data/data-solr.json
>java -classpath /Users/<omitted>/solr-5.5.0/dist/solr-core-5.5.0.jar -Dauto=yes -Dc=enron_path_w_ts -Ddata=files org.apache.solr.util.SimplePostTool ../data/data-solr.json
>SimplePostTool version 5.0.0
>Posting files to [base] url http://localhost:8983/solr/my_collection/update...
>Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
>POSTing file data-solr.json (application/json) to [base]/json/docs
>SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/my_collection/update/json/docs
>SimplePostTool: WARNING: Response: {"responseHeader":{"status":400,"QTime":5},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"ERROR: [doc=AVNzOoBsX6g-H6sC3dgo] multiple values encountered for non multiValued field _childDocuments_._childDocuments_.type_s: [doc.userData.parts, doc.enriched.text]","code":400}}
>SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/my_collection/json/docs
>1 files indexed.
>COMMITting Solr index changes to http://localhost:8983/solr/my_collection/update...
>Time spent: 0:00:05.078
>
>So obviously I don't get my collection uploaded and indexed properly anymore.   
>
>The question is: 
> - What to do?  
>
>The discussion is: 
>- Is this the intended behavior? It used to work smoothly on Solr 5.3.1: I did not need to know exactly how many levels of nesting I had, or to specify whether the _childDocuments_ were of the same type.
> 
>
>A partial sample follows: 
>
>[
>    {
>        "type_s": "doc",
>        "_childDocuments_": [
>            {
>                "type_s": "doc.userData",
>                "Mime-Version_t": "1.0",
>                "_childDocuments_": [
>                    {
>                        "type_s": "doc.userData.parts",
>                        "id": "AVNzOoBsX6g-H6sC3dgo-userData-23461",
>                        "content_t": "----- SOMETEXT",
>                        "id": "AVNzOoBsX6g-H6sC3dgo-parts-15557",
>                        "contentType_t": "text/plain"
>                    }
>                ],
>                "Content-Transfer-Encoding_t": "7bit"
>            },
>            {
>                "type_s": "doc.enriched",
>                "_childDocuments_": [
>                    {
>                        "type_s": "doc.enriched.text",
>                        "language_t": "english",
>                        "_childDocuments_": [
>                            {
>                                "type_s": "doc.enriched.text.docSentiment",
>                                "id": "AVNzOoBsX6g-H6sC3dgo-docSentiment-17692",
>                                "type_t": "positive"
>                            },
>                            {
>                                "type_s": "doc.enriched.text.taxonomy",
>                                "label_t": "/business",
>                                "id": "AVNzOoBsX6g-H6sC3dgo-taxonomy-12728"
>                            },
>                            {
>                                "type_s": "doc.enriched.text.concepts",
>                                "id": "AVNzOoBsX6g-H6sC3dgo-concepts-98530",
>                                "text_t": "Stephen",
>                                "_childDocuments_": [
>                                    {
>                                        "type_s": "doc.enriched.text.concepts.knowledgeGraph",
>                                        "id": "AVNzOoBsX6g-H6sC3dgo-knowledgeGraph-20811",
>                                        "typeHierarchy_t": "/people/children/stephen"
>                                    }
>                                ]
>                            },
>                            {
>                                "type_s": "doc.enriched.text.concepts",
>                                "id": "AVNzOoBsX6g-H6sC3dgo-concepts-12396",
>                                "text_t": "Thought",
>                                "_childDocuments_": [
>                                    {
>                                        "type_s": "doc.enriched.text.concepts.knowledgeGraph",
>                                        "id": "AVNzOoBsX6g-H6sC3dgo-knowledgeGraph-20316",
>                                        "typeHierarchy_t": "/people/ideas/thought"
>                                    }
>                                ]
>                            }, 
>                            ...
>                          }]
>     },
>{"type_s": "doc", ....
>},
>...
>]
>
>
>Thank you for your consideration,
>-- 
>Alisa Zhila