lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marek Ščevlík <mscev...@codenameprojects.com>
Subject Re: Data Import Request Handler isolated into its own project - any suggestions?
Date Sat, 26 Nov 2016 19:03:51 GMT
Actually to be honest I realized that I only needed to trigger a data
import handler from a jar file. Previously this was done in earlier
versions via the SolrServer object. Now I am thinking if this is OK?:

String urlString1 = "http://localhost:8983/solr/";
SolrClient solr1 = new HttpSolrClient.Builder(urlString).build();
			
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("db","/dataimport");
params.set("command", "full-import");
System.out.println(params.toString());
QueryResponse qresponse1 = solr1.query(params);

System.out.println("response = " + qresponse1);

Output i get from this is: response =
{responseHeader={status=0,QTime=0,params={wt=javabin,version=2,db=/dataimport,command=full-import}},response={numFound=0,start=0,docs=[]}}

There is a core db which come with the examples in solr 6.3 package. It is
loaded. From web ui admin I can operate it a run the dih reindex process.

I wonder whether this could work ? What do you think? I am trying to call
DIH whilst solr is running. This code is in a separate jar file that is run
besides solr instance.

This so far is not working for me. And I wonder why? What do you think?
Should this work at all? OR perhaps someone else could help out.


Thanks anyone for any help.
========================

2016-11-25 19:50 GMT+01:00 Marek Ščevlík <mscevlik@codenameprojects.com>:

> I forgot to mention I am creating a jar file beside of a running solr 6.3
> instance to which I am hoping to attach with java via the
> SolrDispatchFilter to get at the cores and so then I could work with data
> in code.
>
>
> 2016-11-25 19:31 GMT+01:00 Marek Ščevlík <mscevlik@codenameprojects.com>:
>
>> Hi Daniel. Thanks for a reply. I wonder is it now still possibly with
>> release of Solr 6.3 to get hold of a running instance of the jetty server
>> that is part of the solution? I found some code for previous versions where
>> it was captured with this code and one could then obtain cores for a
>> running solr instance ...
>>
>> SolrDispatchFilter solrDispatchFilter = (SolrDispatchFilter) jetty
>>
>> .getDispatchFilter().getFilter();
>>
>>
>> I was trying to implement it this way but that is not working out very
>> well now. I cant seem to get the jetty server object for the running
>> instance. I tried several combinations but none seemed to work.
>>
>> Can you perhaps point me in the right direction?
>>
>> Perhaps you may know more than I do at the moment.
>>
>>
>> Any help would be great.
>>
>>
>> Thanks a lot
>> Regards Marek Scevlik
>>
>>
>>
>> 2016-11-18 15:53 GMT+01:00 Davis, Daniel (NIH/NLM) [C] <
>> daniel.davis@nih.gov>:
>>
>>> Marek,
>>>
>>> I've wanted to do something like this in the past as well.  However, a
>>> rewrite that supports the same XML syntax might be better.   There are
>>> several problems with the design of the Data Import Handler that make it
>>> not quite suitable:
>>>
>>> - Not designed for Multi-threading
>>> - Bad implementation of XPath
>>>
>>> Another issue is that one of the big advantages of Data Import Handler
>>> goes away at this point, which is that it is hosted within Solr, and has a
>>> UI for testing within the Solr admin.
>>>
>>> A better open-source Java solution might be to connect Solr with Apache
>>> Camel - http://camel.apache.org/solr.html.
>>>
>>> If you are not tied absolutely to pure open-source, and freemium
>>> products will do, then you might look at Pentaho Spoon and Kettle.
>>>  Although Talend is much more established in the market, I find Pentaho's
>>> XML-based ETL a bit easier to integrate as a developer, and unit test and
>>> such.   Talend does better when you have a full infrastructure set up, but
>>> then the attention required to unit tests and Git integration seems over
>>> the top.
>>>
>>> Another powerful way to get things done, depending on what you are
>>> indexing, is to use LogStash and couple that with Document processing
>>> chains.   Many of our projects benefit from having a single RDBMS view,
>>> perhaps a materialized view, that is used for the index.   LogStash does
>>> just fine here, pulling from the RDBMS and posting each row to Solr.  The
>>> hierarchical execution of Data Import Handler is very nice, but this can
>>> often be handled on the RDBMS side by creating a view, maybe using
>>> functions to provide some rows.   Many RDBMS systems also support
>>> federation and the import of XML from files, so that this brings XML
>>> processing into the picture.
>>>
>>> Hoping this helps,
>>>
>>> Dan Davis, Systems/Applications Architect (Contractor),
>>> Office of Computer and Communications Systems,
>>> National Library of Medicine, NIH
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Marek Ščevlík [mailto:mscevlik@codenameprojects.com]
>>> Sent: Friday, November 18, 2016 9:29 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Data Import Request Handler isolated into its own project - any
>>> suggestions?
>>>
>>> Hello. My name is Marek Scevlik.
>>>
>>>
>>>
>>> Currently I am working for a small company where we are interested in
>>> implementing your Sorl 6.3 search engine.
>>>
>>>
>>>
>>> We are hoping to take out from the original source package the Data
>>> Import Request Handler into its own project and create a usable .jar file
>>> out of it.
>>>
>>>
>>>
>>> It should then serve as tool that would allow to connect to a remote
>>> server and return data for us to our other application that would use the
>>> returned data.
>>>
>>>
>>>
>>> What do you think? Would anything like this possible? To isolate out the
>>> Data Import Request Handler into its own standalone project?
>>>
>>>
>>>
>>> If we could achieve this we won’t mind to share with the community this
>>> new feature.
>>>
>>>
>>>
>>> I realize this is a first email and may lead into several hundreds so
>>> for the start my request is very simple and not so high level detailed but
>>> I am sure you realize it may lead into being quite complex.
>>>
>>>
>>>
>>> So I wonder if anyone replies.
>>>
>>>
>>>
>>> Thanks a lot for any replies and further info or guidance.
>>>
>>>
>>>
>>>
>>>
>>> Thanks.
>>>
>>> Regards Marek Scevlik
>>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message