Reply-To: solr-user@lucene.apache.org
Subject: Re: Having trouble indexing nested docs using "split" feature.
From: David Lee <nightwriter64@comcast.net>
To: solr-user@lucene.apache.org
References: <ac686104-c7ff-f600-99be-2bcc9f5450e2@comcast.net>
Message-ID: <04edec9f-9c7d-4d6b-5952-4f9b4e6e4d2b@comcast.net>
Date: Sat, 2 Dec 2017 14:06:57 -0600
archived-at: Sat, 02 Dec 2017 20:07:13 -0000

Sorry about the formatting of the sample document in my first message. I hope this is clearer:

{
     "book_id": "1234",
     "book_title": "The Martian Chronicles",
     "author": "Ray Bradbury",
     "reviews": [
         {
             "reviewer": "John Smith",
             "reviewer_background": {
                 "highest_rank": "Excellent",
                 "latest_review": "10/15/2017 10:15:00.000 CST"
             }
         }, {
             "reviewer": "Adam Smith",
             "reviewer_background": {
                 "highest_rank": "Good",
                 "latest_review": "10/10/2017 16:18:00.000 CST"
             }
         }
     ],
     "checkouts": [
         {
             "member_id": "aaabbbccc",
             "member_name": "Sam Jackson"
         },{
             "member_id": "bbbcccddd",
             "member_name": "Buddy Jones"
         }
     ]
}
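
Regarding the empty result in the quoted message below: one possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the update was accepted but not yet committed, so it was not visible to the *:* query. Solr update requests accept a commit=true parameter. The sketch below rebuilds the same request URL with that parameter added. The build_update_url helper, the RUN_CURL switch, and the exams.json file name are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch only: rebuild the /update/json/docs URL from the quoted example,
# adding commit=true so indexed documents become searchable right away.
# build_update_url, RUN_CURL and exams.json are hypothetical names.
SOLR_URL="http://localhost:8983/solr/my_collection"

build_update_url() {
  # $1 = split path; remaining arguments = field mappings (name:/json/path)
  split="$1"; shift
  url="$SOLR_URL/update/json/docs?commit=true&split=$split"
  for mapping in "$@"; do
    url="$url&f=$mapping"
  done
  printf '%s\n' "$url"
}

UPDATE_URL=$(build_update_url /exams first:/first last:/last grade:/grade \
  subject:/exams/subject test:/exams/test marks:/exams/marks)
echo "$UPDATE_URL"

# Contact Solr only when explicitly requested, so a dry run is harmless.
if [ "${RUN_CURL:-0}" = "1" ]; then
  # exams.json is assumed to hold the JSON body from the quoted example.
  curl "$UPDATE_URL" -H 'Content-type:application/json' --data-binary @exams.json
  curl "$SOLR_URL/select?q=*:*"
fi
```

Running the script as-is only prints the URL. Setting RUN_CURL=1 sends the update and then repeats the *:* query. Whether a commit is really what was missing depends on the collection's autoCommit and autoSoftCommit settings, which are not shown in this thread.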


On 12/2/2017 1:55 PM, David Lee wrote:
> Hi all,
>
> I've been trying for some time now to find a suitable way to deal with 
> JSON documents that have nested data. By suitable, I mean being able 
> to index them and retrieve them in the same structure they had when 
> indexed.
>
> I'm using Solr 7.1 under Linux Mint 18.3 with Oracle Java 
> 1.8.0_151. After untarring the distribution, I ran through the 
> "getting started" tutorial from the reference manual, which had me 
> create the techproducts index. I then created another collection 
> called my_collection so I could run the examples more easily. It uses 
> the _default schema.
>
> Here is a sample:
>
> {
>     "book_id": "1234",
>     "book_title": "The Martian Chronicles",
>     "author": "Ray Bradbury",
>     "reviews": [
>         {
>             "reviewer": "John Smith",
>             "reviewer_background": {
>                 "highest_rank": "Excellent",
>                 "latest_review": "10/15/2017 10:15:00.000 CST"
>             }
>         }, {
>             "reviewer": "Adam Smith",
>             "reviewer_background": {
>                 "highest_rank": "Good",
>                 "latest_review": "10/10/2017 16:18:00.000 CST"
>             }
>         }
>     ],
>     "checkouts": [
>         {
>             "member_id": "aaabbbccc",
>             "member_name": "Sam Jackson"
>         },{
>             "member_id": "bbbcccddd",
>             "member_name": "Buddy Jones"
>         }
>     ]
> }
>
> Obviously, I'll need to search at both the parent level and the child 
> level. I started experimenting with one of the examples from 
> "Transforming and Indexing Custom JSON". However, when I tried the 
> first example as follows:
>
> curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
> '?split=/exams'\
> '&f=first:/first'\
> '&f=last:/last'\
> '&f=grade:/grade'\
> '&f=subject:/exams/subject'\
> '&f=test:/exams/test'\
> '&f=marks:/exams/marks'\
>   -H 'Content-type:application/json' -d '
> {
>    "first": "John",
>    "last": "Doe",
>    "grade": 8,
>    "exams": [
>      {
>        "subject": "Maths",
>        "test"   : "term1",
>        "marks"  : 90},
>      {
>        "subject": "Biology",
>        "test"   : "term1",
>        "marks"  : 86}
>    ]
> }'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":798}}
>
> Though the status indicates there was no error, when I query the data 
> using *:*, I get this:
>
> curl 'http://localhost:8983/solr/my_collection/select?q=*:*'
> {
>   "responseHeader":{
>     "zkConnected":true,
>     "status":0,
>     "QTime":6,
>     "params":{
>       "q":"*:*"}},
>   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>   }}
>
> So it looks like no documents were actually indexed from above. I'm 
> trying to determine if this is due to an error in the reference 
> manual, or if I haven't set up Solr correctly.
>
> I've tried other techniques (not using the split option), such as those 
> from Yonik's site, but those are slightly dated and I was hoping there 
> was a more practical approach with the release of Solr 7.
>
> Any assistance would be appreciated.
>
> Thank you.
>