From: "Jack Krupansky"
Subject: Re: embedded documents
Date: Mon, 25 Aug 2014 16:29:59 -0400

And a comparison to Elasticsearch would be helpful, since ES gets a lot of mileage from its super-easy JSON support. IOW, how much of the ES "advantage" is eliminated?

-- Jack Krupansky

-----Original Message-----
From: Noble Paul
Sent: Monday, August 25, 2014 1:59 PM
To: solr-user@lucene.apache.org
Subject: Re: embedded documents

The simplest use case is to dump the entire JSON using split=/&f=/**. I am planning to add an alias for the same (SOLR-6343).
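A minimal Python sketch of the "dump the entire JSON" case, for illustration only. It builds, but does not send, the HTTP request. The `split=/` and `f=/**` values are the ones quoted in the message above. The core URL is hypothetical, and because the planned SOLR-6343 alias did not exist yet when this was written, the `/update/json/docs` path (the handler found in later Solr releases) is an assumption to check against your Solr version.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical core URL: adjust host, port and core name to your setup.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def build_json_docs_request(doc, core_url=SOLR_CORE_URL):
    """Build, but do not send, a POST that indexes one nested JSON object.

    split=/ and f=/** are the values quoted in the thread for dumping the
    entire JSON object; the /update/json/docs path is an assumption.
    """
    # Keep "/" and "*" unescaped so the query string matches the mail: split=/&f=/**
    query = urlencode({"split": "/", "f": "/**"}, safe="/*")
    url = f"{core_url}/update/json/docs?{query}"
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


if __name__ == "__main__":
    doc = {"id": 1000003,
           "titles_json": {"FR": "This is the FR title", "EN": "This is the EN title"}}
    req = build_json_docs_request(doc)
    print(req.get_method(), req.full_url)
    # Sending it requires a running Solr: urllib.request.urlopen(req)
```

Passing the request to `urllib.request.urlopen` would actually post it; keeping construction separate makes the parameters easy to inspect before touching a live index.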
Nested-doc support is missing now and we will need to add it. A ticket needs to be opened.

On Mon, Aug 25, 2014 at 6:45 AM, Jack Krupansky wrote:

> Thanks, Erik, but... I've read that Jira several times over the past
> month, and it is far too cryptic for me to make any sense out of what it
> is really trying to do. A simpler approach is clearly needed.
>
> My perception of SOLR-6304 is not that it indexes a single JSON object as
> a single Solr document, but that it generates a collection of separate
> documents, somewhat analogous to Lucene block/child documents, but... not
> quite.
>
> I understood the request on this message thread to be the flattening of a
> single nested JSON object to a single Solr document.
>
> IMHO, we need to be trying to make Solr more automatic and more
> approachable, not an even more complicated "toolkit".
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Erik Hatcher
> Sent: Monday, August 25, 2014 9:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: embedded documents
>
> Jack et al - there’s now this, which is available in the any-minute
> release of Solr 4.10: https://issues.apache.org/jira/browse/SOLR-6304
>
> Erik
>
> On Aug 25, 2014, at 5:01 AM, Jack Krupansky wrote:
>
>> That's a completely different concept, I think - the ability to return a
>> single field value as a structured JSON object in the "writer", rather
>> than simply "loading" from a nested JSON object and distributing the key
>> values to normal Solr fields.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Bill Bell
>> Sent: Sunday, August 24, 2014 7:30 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: embedded documents
>>
>> See my Jira.
>> It supports it via json.fsuffix=_json&wt=json
>>
>> http://mail-archives.apache.org/mod_mbox/lucene-dev/201304.mbox/%3CJIRA.12641293.1365394604231.125944.1365397875874@arcas%3E
>>
>> Bill Bell
>> Sent from mobile
>>
>> On Aug 24, 2014, at 6:43 AM, "Jack Krupansky" wrote:
>>>
>>> Indexing and querying raw JSON would be a valuable addition to Solr, so
>>> maybe you could simply explain more precisely your data model and
>>> transformation rules. For example, when multi-level nesting occurs, what
>>> does your loader do?
>>>
>>> Maybe if the field names were derived by concatenating the full path of
>>> JSON key names, like titles_json.FR, nested field naming could be handled
>>> in a fully automated manner.
>>>
>>> I had been thinking of filing a Jira proposing exactly that, so that
>>> even the most deeply nested JSON maps could be supported, although
>>> combinations of arrays and maps would be problematic.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message-----
>>> From: Michael Pitsounis
>>> Sent: Wednesday, August 20, 2014 7:14 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: embedded documents
>>>
>>> Hello everybody,
>>>
>>> I had a requirement to store complicated JSON documents in Solr.
>>>
>>> I have modified the JsonLoader to accept complicated JSON documents with
>>> arrays/objects as values.
>>>
>>> It stores the object/array, then flattens it and indexes the fields.
>>>
>>> e.g. basic example document:
>>>
>>> {
>>>   "titles_json": {"FR": "This is the FR title", "EN": "This is the EN title"},
>>>   "id": 1000003,
>>>   "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
>>> }
>>>
>>> It will store titles_json:{"FR": "This is the FR title", "EN": "This is the EN title"}
>>> and then index the fields
>>>
>>> titles.FR:"This is the FR title"
>>> titles.EN:"This is the EN title"
>>>
>>> Do you see any problems with this approach?
>>>
>>> Regards,
>>> Michael Pitsounis

--
-----------------------------------------------------
Noble Paul
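A minimal Python sketch of the store-then-flatten idea discussed in this thread, for illustration only. It is not Michael's modified JsonLoader (Java, not shown) and not Bill's json.fsuffix patch. `to_solr_fields` keeps each `*_json` value as a raw string and indexes its leaves under full-path names (Jack's suggestion, written in Michael's `titles.FR` style), and `from_stored_fields` does the writer-side step of returning the stored string as structured JSON. The function names, the `_json` suffix rule, the dot separator, and mapping arrays to multivalued fields are all assumptions.

```python
import json

# Assumed convention (from Michael's example): a field whose name ends in
# "_json" holds a nested object; its flattened leaves drop the suffix.
JSON_SUFFIX = "_json"


def flatten(value, path, out):
    """Collect leaf values of `value` into `out`, keyed by dotted key path."""
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{path}.{key}", out)
    elif isinstance(value, list):
        # Arrays become multivalued fields: all elements share one path.
        for child in value:
            flatten(child, path, out)
    else:
        out.setdefault(path, []).append(value)


def to_solr_fields(doc):
    """Index side: keep each *_json value as a string and add its flattened leaves."""
    fields = {}
    for name, value in doc.items():
        if name.endswith(JSON_SUFFIX) and isinstance(value, (dict, list)):
            fields[name] = json.dumps(value)  # stored raw copy
            leaves = {}
            flatten(value, name[: -len(JSON_SUFFIX)], leaves)
            for path, values in leaves.items():
                # Single values stay scalar; repeated paths stay multivalued.
                fields[path] = values if len(values) > 1 else values[0]
        else:
            fields[name] = value
    return fields


def from_stored_fields(stored):
    """Response side: turn stored *_json strings back into nested objects."""
    return {
        name: json.loads(value)
        if name.endswith(JSON_SUFFIX) and isinstance(value, str)
        else value
        for name, value in stored.items()
    }


if __name__ == "__main__":
    doc = {
        "titles_json": {"FR": "This is the FR title", "EN": "This is the EN title"},
        "id": 1000003,
        "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6",
    }
    fields = to_solr_fields(doc)
    print(fields["titles.FR"])  # This is the FR title
    print(from_stored_fields({"titles_json": fields["titles_json"]}))
```

The sketch also shows one concrete form of the arrays-plus-maps problem Jack mentions: `{"authors_json": [{"first": "Ann", "last": "Lee"}, {"first": "Bo", "last": "Kim"}]}` flattens to `authors.first = ["Ann", "Bo"]` and `authors.last = ["Lee", "Kim"]`, so a query for first name Ann and last name Kim would match a document that has no such author. Flattening loses which values came from the same array element, which is where nested/child documents come into the discussion.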