lucene-solr-user mailing list archives

From Rajesh Hazari <rajeshhaz...@gmail.com>
Subject solr suggester build issues
Date Mon, 29 Jun 2015 18:21:02 GMT
Solr: 4.9.x, with a simple SolrCloud on Jetty
JDK: 1.7
Number of replicas: 4, one replica for each shard
Number of shards: 1

Hi All,

I have been facing the issues below with the Solr suggester introduced in
4.7.x. Does anyone have a good working solution?

The documentation advises against using the buildOnCommit=true property on
an index with frequent soft commits:
               https://cwiki.apache.org/confluence/display/solr/Suggester
So we disabled it (buildOnCommit=false) and started using
buildOnOptimize=true. That did not give us suggestions for the latest
documents (with frequent soft commits), because there was hardly one
optimize per day (we use the default optimize setting in solrconfig).
So we disabled buildOnOptimize as well (buildOnOptimize=false).

As suggested in the documentation, we have for now set up cron jobs that
build the suggester every hour.
These jobs do their job, i.e., we have the latest suggestions available
every hour, but this implementation has the issues below.

*Issue#1*: The suggester build URL, i.e.
*http://$solrnode:8983/solr/collection1/suggest?suggest.build=true*, when
issued to one replica of the SolrCloud, does not build the suggesters on
all of the replicas.
        Resolution: We run a separate cron job on each Solr instance that
makes the build call. Below is a rough sketch of this implementation
(which is not the best implementation and has many flaws).


http://$solrnode:8983/solr/collection1/suggest?suggest.build=true
        |
        |-- suggestcron.job.sh (on solr1.aws.instance)

http://$solrnode:8983/solr/collection1/suggest?suggest.build=true
        |
        |-- suggestcron.job.sh (on solr2.aws.instance)

        .......... similar for the other Solr nodes

We will come up with a single script to do this for all collections later.
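
A single script along these lines could replace the per-node cron jobs. This is only a minimal sketch: the helper names `build_url` and `build_all` are hypothetical, the host list would have to be supplied by the caller, and it simply repeats the same suggest.build=true call from the diagram above against each node in turn.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(node, collection, port=8983):
    """Return the suggester build URL for one node, as used in the cron jobs."""
    query = urlencode({"suggest.build": "true"})
    return f"http://{node}:{port}/solr/{collection}/suggest?{query}"


def build_all(nodes, collection, opener=urlopen, timeout=600):
    """Issue the build call to every node; record the HTTP status or the error."""
    results = {}
    for node in nodes:
        url = build_url(node, collection)
        try:
            with opener(url, timeout=timeout) as response:
                results[node] = response.status
        except OSError as exc:  # URLError and socket errors subclass OSError
            results[node] = f"failed: {exc}"
    return results
```

Calling `build_all(["solr1.aws.instance", "solr2.aws.instance"], "collection1")` from one hourly cron entry would then cover both nodes shown above, and the returned dictionary shows which nodes failed to build.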

We were fairly happy to have an updated suggester on all of the
instances, *which turned out not to be the case!*

*Issue#2*: The suggesters built on the Solr nodes were not consistent,
because the Solr core in each replica differs in maxDoc and numDocs
(which is quite normal with frequent soft commits, when updates mostly
rewrite the same documents with different data, I guess; correct me
if I'm wrong).

When we query

curl -i "http://$solrnode:8983/solr/liveaodfuture/suggest?q=Nirvana&wt=json&indent=true"

one of the Solr nodes returns
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "suggest":{
    "AnalyzingSuggester":{
      "Nirvana":{
        "numFound":1,
        "suggestions":[{
            "term":"nirvana",
            "weight":6,
            "payload":""}]}},
    "DictionarySuggester":{
      "Nirvana":{
        "numFound":0,
        "suggestions":[]}}}}

Status from the /admin/luke/collection/ call on that node:

"index":{
    "numDocs":90564,
    "maxDoc":94583,
    "deletedDocs":4019,
.......}


while the other 3 Solr nodes return

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "suggest":{
    "AnalyzingSuggester":{
      "Nirvana":{
        "numFound":2,
        "suggestions":[{
            "term":"nirvana",
            "weight":163,
            "payload":""},
          {
            "term":"nirvana cover",
            "weight":11,
            "payload":""}]}},
    "DictionarySuggester":{
      "Nirvana":{
        "numFound":0,
        "suggestions":[]}}}}
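
To spot this kind of divergence without reading the JSON by eye, the two responses can be compared programmatically. The following is a rough sketch, assuming the response layout shown above; `suggestion_terms` and `diff_terms` are hypothetical helper names, not part of Solr.

```python
import json


def suggestion_terms(response_text, suggester, query):
    """Return {term: weight} for one suggester and query from a /suggest response."""
    body = json.loads(response_text)
    entry = body["suggest"][suggester][query]
    return {s["term"]: s["weight"] for s in entry["suggestions"]}


def diff_terms(a, b):
    """Return terms only in a, terms only in b, and shared terms whose weights differ."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(t for t in set(a) & set(b) if a[t] != b[t])
    return only_a, only_b, changed
```

Applied to the AnalyzingSuggester results for "Nirvana" above, the first node lacks "nirvana cover" and reports a different weight for "nirvana" (6 versus 163).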

Status from the /admin/luke/collection/ call on the other 3 Solr nodes,
which have a different maxDoc than the node above:

"index":{
    "numDocs":90564,
    "maxDoc":156760,
........}
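
For reference, the number of deleted documents is the gap between maxDoc and numDocs, which matches the deletedDocs value reported by the first node. A small sketch of that arithmetic (the function names are hypothetical):

```python
def deleted_docs(num_docs, max_doc):
    """Deleted-but-not-yet-merged documents: maxDoc minus numDocs."""
    return max_doc - num_docs


def deleted_ratio(num_docs, max_doc):
    """Fraction of the index that consists of deleted documents."""
    return deleted_docs(num_docs, max_doc) / max_doc if max_doc else 0.0


first_node = deleted_docs(90564, 94583)    # 4019, as reported by Luke
other_nodes = deleted_docs(90564, 156760)  # 66196
```

With the figures above, about 4% of the index on the first node is deleted documents, against about 42% on the other three, even though numDocs is identical on all of them.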

When I check the build time of the suggest directory for the collection,
it is the same on each Solr node:

ls -lah /mnt/solrdrive/solr/cores/*/data/suggest_analyzing/*
-rw-r--r-- 1 root root 3.0M May 20 16:00
/mnt/solrdrive/solr/cores/collection1_shard1_replica3/data/suggest_analyzing/wfsta.bin

Questions:
            Does the suggester build URL, i.e.
*http://$solrnode:8983/solr/collection1/suggest?suggest.build=true*,
also take maxDoc or deleted docs into account?

            Is a suggester built via
*solr/collection1/suggest?suggest.build=true*
different from one built with the buildOnCommit=true property?

            Does anyone have a better solution for keeping the suggester
current with the contents of the index under frequent soft commits?

            Does Solr have any scheduler component, like cron, to schedule
the suggester build and a daily optimize?


Thanks,
Rajesh.
