lucene-solr-user mailing list archives

From Yandong Yao <yydz...@gmail.com>
Subject Index optimize takes more than 40 minutes for 18M documents
Date Thu, 21 Feb 2013 17:20:40 GMT
Hi Guys,

I am using Solr 4.1 and have indexed 18M documents using SolrJ's
ConcurrentUpdateSolrServer (each document contains 5 fields, and the average
length is less than 1 KB).

1) Indexing those documents takes 70 minutes, without optimize, on my Mac
(OS X 10.8). Is that slow, fast, or about typical?
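
For reference, the throughput implied by the figures in question 1 can be worked out with a quick calculation. This is only a rough sketch: the class and method names are made up for illustration, and it assumes a steady rate over the whole run, which real indexing will not have.

```java
public class IndexThroughput {
    // Average documents processed per second over a run of `minutes` minutes.
    public static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long docs = 18_000_000L;       // from the message: 18M documents
        double indexMinutes = 70.0;    // from the message: indexing time
        double optimizeMinutes = 40.0; // from the message: optimize time

        System.out.printf("indexing: ~%.0f docs/s%n",
                docsPerSecond(docs, indexMinutes));
        System.out.printf("optimize: ~%.0f docs/s%n",
                docsPerSecond(docs, optimizeMinutes));
    }
}
```

That works out to roughly 4,286 documents per second while indexing and 7,500 documents per second during the optimize.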

2) Optimizing the index takes about 40 minutes. The top output taken during
the optimize is below; it shows a lot of FAULTS. What does this mean?

Processes: 118 total, 2 running, 8 stuck, 108 sleeping, 719 threads   00:56:52
Load Avg: 1.48, 1.56, 1.73  CPU usage: 6.63% user, 6.40% sys, 86.95% idle
SharedLibs: 31M resident, 0B data, 6712K linkedit.
MemRegions: 34734 total, 5801M resident, 39M private, 638M shared.
PhysMem: 982M wired, 3600M active, 3567M inactive, 8150M used, 38M free.
VM: 254G vsize, 1285M framework vsize, 1469887(368) pageins, 1095550(0) pageouts.
Networks: packets: 14842595/9661M in, 14777685/9395M out.
Disks: 820048/43G read, 523814/53G written.

PID   COMMAND  %CPU  TIME      #TH  #WQ  #POR  #MRE  RPRVT   RSHRD  RSIZE   VPRVT  VSIZE  PGRP  PPID  STATE    UID  FAULTS    COW  MSGSENT   MSGRECV  SYSBSD     SYSMACH
4585  java     11.7  02:52:01  32   1    483   342   3866M+  6724K  3856M+  4246M  6908M  4580  4580  sleepin  501  1490340+  402  3000781+  231785+  15044055+  10033109+

3) If I don't run optimize, what is the impact: a larger index on disk, or
slower queries?

Here is my index config in solrconfig.xml:

<ramBufferSizeMB>100</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<autoCommit>
       <maxDocs>100000</maxDocs>    <!-- 100K docs -->
       <maxTime>300000</maxTime>    <!-- 5 minutes -->
       <openSearcher>false</openSearcher>
</autoCommit>
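
To make the autoCommit settings above concrete, here is a rough sketch of which threshold would trigger first at the average rate from the 70-minute run. The class and method names are hypothetical, and the steady-rate assumption is a simplification; actual commit timing depends on how documents arrive.

```java
public class AutoCommitCadence {
    // Seconds needed to accumulate maxDocs documents at a steady rate.
    public static double secondsToMaxDocs(long maxDocs, double docsPerSecond) {
        return maxDocs / docsPerSecond;
    }

    // True when the maxDocs threshold is reached before maxTime elapses.
    public static boolean maxDocsFiresFirst(long maxDocs, long maxTimeMs,
                                            double docsPerSecond) {
        return secondsToMaxDocs(maxDocs, docsPerSecond) * 1000.0 < maxTimeMs;
    }

    public static void main(String[] args) {
        // Assumed steady rate: 18M documents over 70 minutes.
        double rate = 18_000_000.0 / (70 * 60);
        long maxDocs = 100_000L;   // <maxDocs> from the config
        long maxTimeMs = 300_000L; // <maxTime> from the config, in milliseconds

        System.out.printf("maxDocs reached every ~%.1f s%n",
                secondsToMaxDocs(maxDocs, rate));
        System.out.println("maxDocs fires before maxTime: "
                + maxDocsFiresFirst(maxDocs, maxTimeMs, rate));
        System.out.println("commits over the run (by maxDocs): "
                + 18_000_000L / maxDocs);
    }
}
```

Under these assumptions the maxDocs limit is hit about every 23 seconds, well before the 5-minute maxTime, which gives about 180 hard commits over the whole run.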

Thanks very much in advance!

Regards,
Yandong
