lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ravi Solr <ravis...@gmail.com>
Subject Re: bulk reindexing 5.3.0 issue
Date Sun, 27 Sep 2015 00:01:06 GMT
Erick & Shawn I incrporated your suggestions.


0. Shut off all other indexing processes.
1. As Shawn mentioned set batch size to 10000.
2. Loved Erick's suggestion about not using filter at all and sort by
uniqueId and put last known uinqueId as next queries start while still
using cursor marks as follows

SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" +
markerSysId + " TO
*]").setRows(10000).addSort("uniqueId",ORDER.asc).setFields(new
String[]{"uniqueId","uuid"});
q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);

3. As per Shawn's advise commented autocommit and soft commit in
solrconfig.xml and set openSearcher to false and issued MANUAL COMMIT for
every batch from code as follows

client.commit(true, true, true);

Here is what the log statement & results - log.info("Indexed " + count +
"/" + docList.getNumFound());


2015-09-26 17:29:57 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 90000/1344085
2015-09-26 17:30:30 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 100000/1334085
2015-09-26 17:33:26 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 110000/1324085
2015-09-26 17:36:09 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 120000/1314085
2015-09-26 17:39:42 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 130000/1304085
2015-09-26 17:43:05 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 140000/1294085
2015-09-26 17:46:14 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 150000/1284085
2015-09-26 17:48:22 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 160000/1274085
2015-09-26 17:48:25 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 160000/0
2015-09-26 17:48:25 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!

Ran manually a second time to see if first was fluke. Still same.

2015-09-26 17:55:26 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 10000/1264716
2015-09-26 17:58:07 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 20000/1254716
2015-09-26 18:03:09 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 30000/1244716
2015-09-26 18:06:32 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 40000/1234716
2015-09-26 18:10:35 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 50000/1224716
2015-09-26 18:15:23 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 60000/1214716
2015-09-26 18:15:24 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 60000/0
2015-09-26 18:15:26 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!

Now changed the autommit in solrconfig.xml as follows...Note the soft
commit has been shut off as per Shawn's advise

    <autoCommit>
       <!-- <maxDocs>100</maxDocs> -->
       <maxTime>300000</maxTime>
     <openSearcher>false</openSearcher>
    </autoCommit>

  <!--
    <autoSoftCommit>
        <maxTime>30000</maxTime>
    </autoSoftCommit>
  -->

2015-09-26 18:47:44 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 10000/1205451
2015-09-26 18:50:49 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 20000/1195451
2015-09-26 18:54:18 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 30000/1185451
2015-09-26 18:57:04 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 40000/1175451
2015-09-26 19:00:10 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 50000/1165451
2015-09-26 19:00:13 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 50000/0
2015-09-26 19:00:13 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
FINISHED !!!


The query still returned 0 results when they are over million docs
available which match uuid:sun.org.mozilla* ...Then why do I get 0 ???

Thanks

Ravi Kiran Bhaskar

On Sat, Sep 26, 2015 at 3:49 PM, Ravi Solr <ravisolr@gmail.com> wrote:

> Thank you Erick & Shawn for taking significant time off your weekends to
> debug and explain in great detail. I will try to address the main points
> from your emails to provide more situation context for better understanding
> of my situation
>
> 1. Erick, As part of our upgrade from 4.7.2 to 5.3.0 I re-indexed all docs
> from my old Master-Slave to My SolrCloud using DIH SolrEntityProcessor
> which used a Script Transformer. I unwittingly messed up the script and
> hence this 'uuid' (String Type field) got messed up. All records prior to
> Sep 20 2015 have this issue that I am currently try to rectify.
>
> 2. Regarding openSearcher=true/false, I had it as false all along in my
> 4.7.2 config. I read somewhere that SolrCloud or 5.x doesn't honor it or it
> should be left default (Don't exactly remember where I read it), hence, I
> removed it from my solrconfig.xml going against my intuition :-)
>
> 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially using
> 100 docs batch, which, I later increased to 500 docs per batch. Also it
> would not be a infinite loop if I commit for each batch, right !!??
>
> 4. Shawn, you are correct the uuid is of String Type and its not unique
> key for my schema. My uniqueKey is uniqueId and systemid is of no
> consequence here, it's another field for differentiating apps within my
> solr.
>
> Than you very much again guys. I will incorporate your suggestions and
> report back.
>
> Thanks
>
> Ravi Kiran Bhaskar
>
> On Sat, Sep 26, 2015 at 12:58 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> Oh, one more thing. _assuming_ you can't change the indexing process
>> that gets the docs from the system of record, why not just add an
>> update processor that does this at index time? See:
>> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
>> ,
>> in particular the StatelessScriptUpdateProcessorFactory might be a
>> good candidate. It just takes a bit of javascript (or other scripting
>> language) and changes the record before it gets indexed.
>>
>> FWIW,
>> Erick
>>
>> On Sat, Sep 26, 2015 at 9:52 AM, Shawn Heisey <apache@elyograg.org>
>> wrote:
>> > On 9/26/2015 10:41 AM, Shawn Heisey wrote:
>> >> <autoCommit> <maxTime>300000</maxTime> </autoCommit>
>> >
>> > This needs to include openSearcher=false, as Erick mentioned.  I'm sorry
>> > I screwed that up:
>> >
>> >   <autoCommit>
>> >     <maxTime>300000</maxTime>
>> >     <openSearcher>false</openSearcher>
>> >   </autoCommit>
>> >
>> > Thanks,
>> > Shawn
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message