lucene-solr-user mailing list archives

From Andrea Gazzarini <a.gazzar...@sease.io>
Subject Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
Date Thu, 29 Aug 2019 21:17:56 GMT
As far as I remember, the ZK coordinates (hosts, ports, and root) are set as
system properties on Solr nodes (open the admin console to see their
names). So it would just be a matter of

System.getProperty(ZK ensemble coordinates|root)

Before going in that direction: I don't know/remember whether there's a
Solr-specific ZK class that can be asked for them. If such a class exists, it
would be a better way; otherwise, you can go with the system-property approach.

Andrea
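A minimal sketch of the system-property approach, assuming the node's connect string is exposed as the property `zkHost` (an assumption: confirm the exact name under Java Properties in the admin UI). It splits a string such as `zk1:2181,zk2:2181,zk3:2181/solr` into a host list and an optional chroot. `ZkConnectString` and its methods are hypothetical names, and the code uses only the JDK so it runs without Solr on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Splits a ZooKeeper connect string such as "zk1:2181,zk2:2181,zk3:2181/solr"
 * into its host:port list and optional chroot.
 */
final class ZkConnectString {
    final List<String> hosts;       // e.g. [zk1:2181, zk2:2181, zk3:2181]
    final Optional<String> chroot;  // e.g. Optional.of("/solr"), or empty

    private ZkConnectString(List<String> hosts, Optional<String> chroot) {
        this.hosts = hosts;
        this.chroot = chroot;
    }

    /** Parses a connect string; a chroot, if any, starts at the first '/'. */
    static ZkConnectString parse(String raw) {
        String s = raw == null ? "" : raw.trim();
        int slash = s.indexOf('/');
        String hostPart = slash >= 0 ? s.substring(0, slash) : s;
        String path = slash >= 0 ? s.substring(slash) : "";
        // A bare "/" is the ZooKeeper root, i.e. no chroot at all.
        Optional<String> chroot = path.length() > 1 ? Optional.of(path) : Optional.empty();

        List<String> hosts = new ArrayList<>();
        for (String h : hostPart.split(",")) {
            if (!h.trim().isEmpty()) {
                hosts.add(h.trim());
            }
        }
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no ZooKeeper hosts in: " + raw);
        }
        return new ZkConnectString(hosts, chroot);
    }

    /**
     * Reads a connect string from a JVM system property. The name "zkHost"
     * is an assumption: check the node's Java Properties in the admin UI.
     */
    static Optional<ZkConnectString> fromSystemProperty(String propertyName) {
        String raw = System.getProperty(propertyName);
        return raw == null ? Optional.empty() : Optional.of(parse(raw));
    }
}
```

With SolrJ 8.x on the classpath, the two parts fit the `CloudSolrClient.Builder(List<String> zkHosts, Optional<String> zkChroot)` constructor. Such a client is best created once (for example when the processor factory is set up) and reused, not built per document. Inside a plugin, the node's `CoreContainer` may also expose the same information through `getZkController()`, which is probably the kind of Solr-specific class mentioned above; check the 8.2 Javadoc before relying on it.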

On Thu, 29 Aug 2019, 21:32 Arnold Bronley, <arnoldbronley@gmail.com> wrote:

> @Andrea: I agree with you. Do you know if there is a way to initialize a
> CloudSolrClient directly from some information that I can get from the
> SolrQueryRequest or AddUpdateCommand object?
>
> @Erick: Thank you for the information about
> StatelessScriptUpdateProcessorFactory.
>
> "In your situation, add this _before_ the update is distributed and instead
> of coreB, ask for collectionB."
>
> Right, but how do I ask for collectionB?
>
> "Next, you want to get the value from “coreB”. Don’t do that, get it from
> _collection_ B."
>
> Right, but how do I get the value from _collection_ B?
>
>
>
> On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > Have you looked at using one of the update processors?
> >
> > Consider StatelessScriptUpdateProcessorFactory, for instance. You can do
> > anything you’d like to do in a script (Groovy, JavaScript, Python I think,
> > and others). See ./example/files/conf/update-script.js for one example.
> >
> > You put it in your solrconfig file in the update handler, then put the
> > script in your conf directory and push it to ZK, and the rest is
> > automagical.
> >
> > There are a bunch of other update processors you can use that are also
> > pretty much configuration-only, but the one I referenced is the most
> > general-purpose.
> >
> > In your situation, add this _before_ the update is distributed and instead
> > of coreB, ask for collectionB.
> >
> > Distributed updates go like this:
> > 1. the doc gets routed to a leader for a shard
> > 2. the doc gets forwarded to each replica.
> >
> > Now, depending on where you put the update processor (and you’ll have to
> > dig a bit: much of this distribution logic is implicit, but you can
> > explicitly define it in solrconfig.xml), this happens either _before_ the
> > docs are sent to the rest of the replicas or _after_ the docs arrive at
> > each replica. From what you’ve described, you want to do this before
> > distribution so all copies have the new field. You don’t care which
> > replica is the leader. You don’t care how many other replicas exist or
> > where they are. You don’t even care whether any replica of this
> > particular collection is hosted on the node that does this; it happens
> > before distribution.
> >
> > Next, you want to get the value from “coreB”. Don’t do that, get it from
> > _collection_ B. Since you have the doc ID (presumably the <uniqueKey>),
> > using get-by-id instead of a standard query will be very efficient. I can
> > imagine under very heavy load this might introduce too much overhead, but
> > it’s where I’d start.
> >
> > Best,
> > Erick
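A minimal sketch of the enrichment step described above, with Solr types replaced by plain Java stand-ins so it runs on its own: the document is a `Map` instead of a `SolrInputDocument`, and `ByIdLookup`, `PopularityEnricher`, the collection name `collectionB`, and the field names `id` and `popularity` are all hypothetical. The point it shows is the shape of the logic: look the document up by unique key in the *collection*, copy one field if found, and always pass the document on.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical stand-in for "fetch one document by unique key from a whole collection". */
interface ByIdLookup {
    Optional<Map<String, Object>> getById(String collection, String id);
}

/**
 * Runs once, before the update is distributed: copies one field from the
 * matching document in another collection, then hands the document on.
 */
final class PopularityEnricher {
    private final ByIdLookup lookup;
    private final String sourceCollection; // e.g. "collectionB" (assumed name)
    private final String uniqueKeyField;   // the <uniqueKey>, e.g. "id" (assumed)
    private final String copiedField;      // e.g. "popularity" (assumed)

    PopularityEnricher(ByIdLookup lookup, String sourceCollection,
                       String uniqueKeyField, String copiedField) {
        this.lookup = lookup;
        this.sourceCollection = sourceCollection;
        this.uniqueKeyField = uniqueKeyField;
        this.copiedField = copiedField;
    }

    void processAdd(Map<String, Object> doc, Consumer<Map<String, Object>> next) {
        Object id = doc.get(uniqueKeyField);
        if (id != null) {
            // Ask the collection, not a core: the lookup hides shards and replicas.
            lookup.getById(sourceCollection, id.toString())
                  .map(source -> source.get(copiedField)) // empty if the field is missing
                  .ifPresent(value -> doc.put(copiedField, value));
        }
        // Pass the document on whether or not it was enriched.
        next.accept(doc);
    }
}
```

In a real plugin, the lookup could be backed by SolrJ's `SolrClient.getById(String collection, String id)`, a real-time get that returns `null` when no document matches, and the processor would be placed before `solr.DistributedUpdateProcessorFactory` in the update chain so that it runs once, before distribution.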
> >
> > > On Aug 29, 2019, at 1:45 PM, Arnold Bronley <arnoldbronley@gmail.com>
> > > wrote:
> > >
> > > I can't use  CloudSolrClient  because I need to intercept the incoming
> > > indexing request and then add one more field to it. All this happens on
> > > Solr side and not client side.
> > >
> > > On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <a.gazzarini@sease.io>
> > > wrote:
> > >
> > >> Hi Arnold,
> > >> why don't you use SolrJ (in this case a CloudSolrClient) instead of
> > >> dealing with such low-level details? The actual location of the
> > >> document you are looking for would be completely abstracted.
> > >>
> > >> Best,
> > >> Andrea
> > >>
> > >> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, <arnoldbronley@gmail.com>
> > >> wrote:
> > >>
> > >>> So, here is the problem that I am trying to solve. I am moving from
> > >>> Solr master-slave architecture to SolrCloud architecture. I have one
> > >>> custom Solr plugin that does the following:
> > >>>
> > >>> 1. When a document (say, a document with unique id doc1) is getting
> > >>> indexed to a core, say core A, this plugin adds one more field to the
> > >>> indexing request. It fetches this new field from core B. Core B in our
> > >>> case maintains a popularity score field for each document, which gets
> > >>> calculated in a different project. The plugin fetches the popularity
> > >>> score from core B for doc1 and adds it to the indexing request.
> > >>> 2. In the following code, dataInfo.dataSource is the name of core B.
> > >>>
> > >>> I can use the name of core B, like collection_shard1_replica_n21, and
> > >>> it works. But it is not a good solution. What if I had multiple shards
> > >>> for core B? In that case, the doc1 that I am trying to find might not
> > >>> be present in collection_shard1_replica_n21.
> > >>>
> > >>> So, is there something like this?
> > >>>
> > >>> SolrCollection dataCollection = getCollection(dataInfo.dataSource);
> > >>>
> > >>> @Override
> > >>> public void processAdd(AddUpdateCommand cmd) throws IOException {
> > >>>   // req, dataInfo and LOG are fields of this processor (not shown).
> > >>>   SolrInputDocument doc = cmd.getSolrInputDocument();
> > >>>   String uniqueId = getUniqueId(doc);
> > >>>
> > >>>   // getCore() expects a *core* name (e.g. collection_shard1_replica_n21)
> > >>>   // and returns null if no such core is loaded on this node. A non-null
> > >>>   // core is reference-counted and must be closed: try-with-resources
> > >>>   // does that and simply skips close() when the resource is null.
> > >>>   try (SolrCore dataCore =
> > >>>            req.getCore().getCoreContainer().getCore(dataInfo.dataSource)) {
> > >>>     if (dataCore == null) {
> > >>>       LOG.error("Solr core '{}' to use as data source could not be found! "
> > >>>           + "Please check if it is loaded.", dataInfo.dataSource);
> > >>>     } else {
> > >>>       Document sourceDoc = getSourceDocument(dataCore, uniqueId);
> > >>>       if (sourceDoc != null) {
> > >>>         populateDocToBeAddedFromSourceDoc(doc, sourceDoc);
> > >>>       }
> > >>>     }
> > >>>   }
> > >>>
> > >>>   // pass it up the chain
> > >>>   super.processAdd(cmd);
> > >>> }
> > >>>
> > >>>
> > >>> On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <erickerickson@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> No, you cannot just use the collection name. Replicas are just cores.
> > >>>> You can host many replicas of a single collection on a single Solr node
> > >>>> in a single CoreContainer (there’s only one per Solr JVM). If you just
> > >>>> specified a collection name, how would the code have any clue which
> > >>>> of the possibilities to return?
> > >>>>
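A toy model of why `getCore(collectionName)` comes back null: the node's registry is keyed by core name only, so a collection name matches nothing, while filtering by collection can yield several local cores (or none). The parser assumes the default naming `<collection>_<shard>_replica_<type><n>` with type letters `n`/`t`/`p` for NRT/TLOG/PULL and split shards named like `shard1_0`; cores created with explicit names will not follow it. `CoreName` and `NodeCores` are hypothetical classes, not Solr's `CoreContainer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parts of a default-style SolrCloud core name (the format is an assumption). */
final class CoreName {
    // <collection>_<shard>_replica_<type><n>, e.g. collection_shard1_replica_n21.
    private static final Pattern FORMAT =
        Pattern.compile("^(.+)_(shard\\d+(?:_\\d+)*)_replica_([ntp])(\\d+)$");

    final String collection;
    final String shard;
    final char replicaType;   // assumed: n = NRT, t = TLOG, p = PULL
    final int replicaNumber;

    private CoreName(String collection, String shard, char replicaType, int replicaNumber) {
        this.collection = collection;
        this.shard = shard;
        this.replicaType = replicaType;
        this.replicaNumber = replicaNumber;
    }

    static Optional<CoreName> parse(String coreName) {
        Matcher m = FORMAT.matcher(coreName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new CoreName(m.group(1), m.group(2),
                m.group(3).charAt(0), Integer.parseInt(m.group(4))));
    }
}

/** Toy model of one node's core registry: cores are keyed by core name only. */
final class NodeCores {
    private final Map<String, Object> coresByName = new TreeMap<>();

    void register(String coreName) {
        coresByName.put(coreName, new Object()); // placeholder for a loaded core
    }

    /** Exact-name lookup, like getCore(name): a collection name matches nothing. */
    Object getCore(String name) {
        return coresByName.get(name);
    }

    /** Local cores belonging to a collection: may be several, or none at all. */
    List<String> coresOf(String collection) {
        List<String> result = new ArrayList<>();
        for (String name : coresByName.keySet()) {
            CoreName.parse(name)
                    .filter(c -> c.collection.equals(collection))
                    .ifPresent(c -> result.add(name));
        }
        return result;
    }
}
```

Even a correct core name only helps on nodes that happen to host that replica, which is why asking the collection (by id) is the more robust route.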
> > >>>> The name is in the form collection_shard1_replica_n21
> > >>>>
> > >>>> How do you know where the doc you’re working on lives? Put the ID
> > >>>> through the hashing mechanism.
> > >>>>
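A sketch of "put the ID through the hashing mechanism", under stated assumptions: with the default compositeId router and a plain id (no `!` route prefix), my understanding is that Solr hashes the id's UTF-8 bytes with MurmurHash3 (x86, 32-bit, seed 0) and picks the shard whose hash range contains the result. The code implements standard MurmurHash3_x86_32 and an even split of the signed 32-bit space. Real ranges live in the collection's cluster state and change after shard splits, so a real plugin should read them from there (Solr's own `DocRouter.getTargetSlice(...)` performs this lookup) instead of recomputing them. `IdRouting` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of compositeId-style routing for plain ids (no '!' prefix). */
final class IdRouting {
    private IdRouting() {}

    /** Standard MurmurHash3 x86 32-bit. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3; // largest multiple of 4 <= length
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k1 *= c1; k1 = Integer.rotateLeft(k1, 15); k1 *= c2;
            h1 ^= k1; h1 = Integer.rotateLeft(h1, 13); h1 = h1 * 5 + 0xe6546b64;
        }
        int k1 = 0;
        switch (data.length & 3) { // 1 to 3 trailing bytes, falling through
            case 3: k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2: k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1; k1 = Integer.rotateLeft(k1, 15); k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }
        h1 ^= data.length; // finalization mix
        h1 ^= h1 >>> 16; h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13; h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Hash of the id's UTF-8 bytes with seed 0 (assumed to match Solr for plain ids). */
    static int hashId(String id) {
        return murmur3_32(id.getBytes(StandardCharsets.UTF_8), 0);
    }

    /** Splits the signed 32-bit hash space into n contiguous, roughly equal ranges. */
    static int[][] evenRanges(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        long min = Integer.MIN_VALUE, max = Integer.MAX_VALUE;
        long step = (max - min + 1) / n;
        int[][] ranges = new int[n][];
        long start = min;
        for (int i = 0; i < n; i++) {
            long end = (i == n - 1) ? max : start + step - 1;
            ranges[i] = new int[] {(int) start, (int) end};
            start = end + 1;
        }
        return ranges;
    }

    /** Index of the range containing hash, or -1 if none does. */
    static int shardFor(int hash, int[][] ranges) {
        for (int i = 0; i < ranges.length; i++) {
            if (hash >= ranges[i][0] && hash <= ranges[i][1]) return i;
        }
        return -1;
    }
}
```

With two shards, the even split gives `[Integer.MIN_VALUE, -1]` and `[0, Integer.MAX_VALUE]`, i.e. `80000000-ffffffff` and `0-7fffffff` in the hex form the admin UI shows for shard1 and shard2. Note that a get-by-id through CloudSolrClient makes this computation unnecessary.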
> > >>>> It’s different if you’re running stand-alone: then there’s only
> > >>>> one name.
> > >>>>
> > >>>> But as I indicated above, your ask for just using the collection name
> > >>>> isn’t going to work by definition.
> > >>>>
> > >>>> So perhaps this is an XY problem. You’re asking about getCore, which is
> > >>>> a very specific, low-level concept. What are you trying to do at a
> > >>>> higher level? Why do you think you need to get a core? What do you want
> > >>>> to _do_ with the doc that requires the core it resides in?
> > >>>>
> > >>>> Best,
> > >>>> Erick
> > >>>>
> > >>>>> On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbronley@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>> Wait, would I need to use a core name like
> > >>>>> collection1_shard1_replica_n4, etc.? Can't I use the collection name?
> > >>>>> What if I have multiple shards? How would I know where the document
> > >>>>> I am working with currently lives?
> > >>>>> I would prefer to use the collection name and have the core
> > >>>>> information abstracted away that way.
> > >>>>>
> > >>>>> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <erickerickson@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hmmm, should work. What is your core_name? There are strings like
> > >>>>>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
> > >>>>>> using the right one?
> > >>>>>>
> > >>>>>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <arnoldbronley@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> In custom Solr plugin code,
> > >>>>>>> req.getCore().getCoreContainer().getCore(core_name) returns null even
> > >>>>>>> though a core named core_name is loaded and up in Solr. req is a
> > >>>>>>> SolrQueryRequest object. I am using Solr 8.2.0 in SolrCloud mode.
> > >>>>>>>
> > >>>>>>> Any ideas on why this might be the case?
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> >
>
