lucene-solr-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: DIH: Create Child Documents in ScriptTransformer
Date Thu, 19 Sep 2019 06:09:34 GMT
I fully agree. However, I am just curious to see the limits.

> On 18.09.2019 at 23:33, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> When it starts getting complex, I usually move to SolrJ. You say
> you're loading documents, so I assume Tika is in the mix too.
> 
> Here's a blog on the topic so you can see how to get started...
> 
> https://lucidworks.com/post/indexing-with-solrj/
> 
> Best,
> Erick
> 
>> On Wed, Sep 18, 2019 at 2:56 PM Jörn Franke <jornfranke@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I load a set of documents. Based on these documents some logic needs to be
>> applied to split them into chapters (this is done). One whole document is
>> loaded as a parent. Chapters of the whole document + metadata should be
>> loaded as child documents of this parent.
>> I now want to collect information on how this can be done:
>> * Use a custom loader - this is possible and works
>> * Use DIH and extract the chapters in a ScriptTransformer and add them as
>> child documents there. However, the ScriptTransformer receives only a
>> HashMap as input, and while it works for transforming field values etc., it
>> does not seem possible to add child documents within the DIH
>> ScriptTransformer. I tried adding a JavaArray with SolrInputDocuments, but
>> this does not seem to work. I see in debug/verbose mode that the transformer
>> does add them to the HashMap correctly, but they don't end up in the
>> document. Maybe it could be possible somehow via nested entities?
>> * Use DIH + an UpdateProcessor (Script): there I get the SolrInputDocument
>> as a parameter and it seems feasible to extract chapters and add them as
>> child documents.
>> 
>> thank you.
>> 
>> best regards
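
For reference, the parent/child layout described above (one whole document as parent, its chapters as children) can be sketched for the custom-loader route. This is only a minimal sketch under stated assumptions: the field names (doc_type, chapter_number, chapter_title, text), the id scheme, and the helper build_nested_doc are hypothetical, and it relies on the "_childDocuments_" key that Solr's JSON update format uses for nested documents. It builds the payload only and does not talk to Solr.

```python
import json


def build_nested_doc(doc_id, title, chapters):
    """Build one parent document with its chapters nested as child documents.

    `chapters` is a list of (chapter_title, text) tuples that were already
    split out of the whole document. All field names are illustrative only.
    """
    children = [
        {
            "id": f"{doc_id}-ch{number}",  # hypothetical id scheme for children
            "doc_type": "chapter",
            "chapter_number": number,
            "chapter_title": chapter_title,
            "text": text,
        }
        for number, (chapter_title, text) in enumerate(chapters, start=1)
    ]
    return {
        "id": doc_id,
        "doc_type": "document",
        "title": title,
        # Key used by Solr's JSON update format for nested child documents.
        "_childDocuments_": children,
    }


if __name__ == "__main__":
    doc = build_nested_doc(
        "doc-1",
        "Example document",
        [("Introduction", "First chapter text"), ("Methods", "Second chapter text")],
    )
    # The resulting JSON array is what a loader would send to the update handler.
    print(json.dumps([doc], indent=2))
```

A loader would send the printed JSON array to the collection's update handler. Keeping a discriminator field such as doc_type on both levels is a common way to tell parents and children apart at query time, but the schema here is an assumption, not something taken from this thread.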
