lucene-dev mailing list archives

From "Mikhail Khludnev (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-3585) processing updates in multiple threads
Date Sun, 08 Jul 2012 19:19:35 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408872#comment-13408872
] 

Mikhail Khludnev edited comment on SOLR-3585 at 7/8/12 7:18 PM:
----------------------------------------------------------------

Dmitry,

I took a 3M-row TSV from http://www.freebase.com/view/book/book_edition

and slightly updated the Solr 4.0 example config to allow concurrency (see the patch in report.tar.gz).

In report.tar.gz you can also see the utilization rates in the iostat outputs.

Summary:
on a MacBook Pro (Core i5),
233/183/138 sec for 1/2/4 threads;
3M records, index size slightly less than 1 GB.
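
For reference, the speedup implied by these timings can be worked out from the millisecond values and the add count that appear in the logs of this message. The sketch below is illustrative arithmetic only; the class and method names are made up for this example and are not part of Solr or of the patch.

```java
import java.util.Locale;

// Illustrative arithmetic only; class and method names are hypothetical.
class IndexingSpeedup {

    // Ratio of the single-thread wall time to the multi-thread wall time.
    static double speedup(long baselineMillis, long millis) {
        return (double) baselineMillis / millis;
    }

    // Average indexing throughput in documents per second.
    static double docsPerSecond(long docs, long millis) {
        return docs * 1000.0 / millis;
    }

    public static void main(String[] args) {
        long docs = 3401073L;                         // total adds reported in the logs
        int[] threads = {1, 2, 4};
        long[] millis = {233756L, 183157L, 138413L};  // wall times reported in the logs

        for (int i = 0; i < threads.length; i++) {
            double s = speedup(millis[0], millis[i]);
            System.out.printf(Locale.ROOT,
                    "%d thread(s): %.0f docs/sec, speedup %.2fx, efficiency %.0f%%%n",
                    threads[i], docsPerSecond(docs, millis[i]), s, 100.0 * s / threads[i]);
        }
    }
}
```

On these numbers, two threads give roughly a 1.28x speedup and four threads roughly 1.69x, so the gain is real but well short of linear.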

single thread (Solr as-is):

          disk0           disk2       cpu     load average
    KB/t tps  MB/s     KB/t tps  MB/s  us sy id   1m   5m   15m
 1024.00   6  5.99     0.00   0  0.00  36  2 62  2.62 2.16 2.10

233756 millis

Jul 8, 2012 11:41:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={commit=true&Zupdate.chain=threads&Zbacking.chain=logrun&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv}
{add=[/m/08s9170, /m/08s7myj, /m/08s7nfb, /m/08s912p, /m/08s7nqy, /m/08s7rkg, /m/08s7vmn,
/m/08s7yzd, /m/08s7zlw, /m/08s7zw3, ... (3401073 adds)],commit=} 0 233756 

two threads:

          disk0           disk2       cpu     load average
    KB/t tps  MB/s     KB/t tps  MB/s  us sy id   1m   5m   15m
  104.09 128 13.01     0.00   0  0.00  46  6 48  4.53 2.94 2.30

183157 millis

Jul 8, 2012 11:25:58 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv&update.chain=threads}
{add=[/m/08s7myj, /m/08s7nfb, /m/08s912p, /m/08s7rkg, /m/08s7zlw, /m/08s8127, /m/08s8wx0,
/m/08s8_cg, /m/08s8cd2, /m/08s8wjv, ... (1658583 adds)]} 0 183157
Jul 8, 2012 11:25:58 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/08s9170, /m/08s7nqy, /m/08s7vmn, /m/08s7yzd, /m/08s7zw3, /m/08s82t3,
/m/08s8dcy, /m/08s8hnz, /m/08s8j3x, /m/08s8mfs, ... (1742490 adds)]} 0 183157

four threads

          disk0           disk2       cpu     load average
    KB/t tps  MB/s     KB/t tps  MB/s  us sy id   1m   5m   15m
   91.19 134 11.91     0.00   0  0.00  93  5  2  5.29 3.13 2.51

138413 millis

Jul 8, 2012 11:53:27 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv&update.chain=threads}
{add=[/m/08s912p, /m/08s7yzd, /m/08s82t3, /m/08s8wjv, /m/08s8nx4, /m/08s8txn, /m/08z05sg,
/m/08z05jm, /m/08yzqg0, /m/08yzkh2, ... (949997 adds)]} 0 138413
Jul 8, 2012 11:53:27 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/08s9170, /m/08s7nqy, /m/08s7vmn, /m/08s8127, /m/08s8wx0, /m/08s8dcy,
/m/08s8mfs, /m/08s8v_7, /m/08z06nt, /m/08z05c5, ... (848935 adds)]} 0 138413
Jul 8, 2012 11:53:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/08s7nfb, /m/08s7zw3, /m/08s8_cg, /m/08s8j3x, /m/08s8szc, /m/08z09lt,
/m/08yzf7_, /m/08yz2b1, /m/08yz24n, /m/08yyz1r, ... (777467 adds)]} 0 138413
Jul 8, 2012 11:53:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/08s7myj, /m/08s7rkg, /m/08s7zlw, /m/08s8cd2, /m/08s8hnz, /m/08s8mrk,
/m/08s8v92, /m/08s8tz0, /m/08z097y, /m/08z047g, ... (824674 adds)]} 0 138413
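
As a sanity check on the log excerpts above: if each document is handled by exactly one worker thread, the per-thread add counts should sum to the single-thread total. A small sketch of that check, using only the counts printed in the logs (the class name is hypothetical):

```java
import java.util.stream.LongStream;

// Hypothetical helper; it only adds up the "(N adds)" figures from the logs.
class PerThreadAdds {

    // Sum of the add counts reported by the individual worker threads.
    static long total(long... addsPerThread) {
        return LongStream.of(addsPerThread).sum();
    }

    public static void main(String[] args) {
        long single = 3401073L;                                 // single-thread run
        long two = total(1658583L, 1742490L);                   // two-thread run
        long four = total(949997L, 848935L, 777467L, 824674L);  // four-thread run

        System.out.println("two threads match:  " + (two == single));
        System.out.println("four threads match: " + (four == single));
    }
}
```

Both multi-threaded runs account for all 3401073 adds.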

url 

http://localhost:8983/solr/update?commit=true&separator=%09&escape=\&update.chain=threads&backing.chain=logrun&stream.file=/Users/mkhl/Downloads/book_edition.tsv&stream.contentType=text/csv;charset=utf-8
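
The URL above carries a raw backslash and other reserved characters, which some HTTP clients may refuse or mangle. A minimal sketch of building the same request with percent-encoded values follows. The parameter names and values are copied from the URL above; the class name is hypothetical, and `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or of the patch.
class UpdateUrlBuilder {

    // Joins the parameters into a query string, percent-encoding keys and values.
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();  // keeps insertion order
        params.put("commit", "true");
        params.put("separator", "\t");                       // sent as %09
        params.put("escape", "\\");                          // sent as %5C
        params.put("update.chain", "threads");
        params.put("backing.chain", "logrun");
        params.put("stream.file", "/Users/mkhl/Downloads/book_edition.tsv");
        params.put("stream.contentType", "text/csv;charset=utf-8");

        System.out.println(build("http://localhost:8983/solr/update", params));
    }
}
```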

FYI: 0.5 GB heap
$ java -Xmx512M -Xms512M -jar start.jar

                
      was (Author: mkhludnev):
    Dmitry,

It's a very good question. Unfortunately, we can choose only two of free, fast, and reliable.
Contributing real-life code written for a customer requires enormous legal effort. I wrote
this one from scratch.


                  
> processing updates in multiple threads
> --------------------------------------
>
>                 Key: SOLR-3585
>                 URL: https://issues.apache.org/jira/browse/SOLR-3585
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>         Attachments: SOLR-3585.patch, multithreadupd.patch, report.tar.gz
>
>
> Hello,
> I'd like to contribute an update processor that forks many threads, which concurrently
> process the stream of commands. It may be beneficial for users who stream many docs
> through a single request.
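
The general idea described in the issue (one request stream fanned out to several worker threads) can be sketched with plain java.util.concurrent. This is not the code from SOLR-3585.patch and uses no Solr API; the class name, the queue size, and the `backing` callback standing in for the backing chain are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch of fanning one command stream out to N workers.
class ConcurrentCommandFanOut {

    // End-of-stream marker; a fresh String so it can be compared by identity.
    private static final String POISON = new String("__END__");

    // Feeds commands to `threads` workers and returns how many each one handled.
    static int[] process(Iterable<String> commands, int threads, Consumer<String> backing) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024); // bounded: producer blocks when full
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    int handled = 0;
                    for (String cmd = queue.take(); cmd != POISON; cmd = queue.take()) {
                        backing.accept(cmd);  // stand-in for the backing update chain
                        handled++;
                    }
                    return handled;
                }));
            }
            for (String cmd : commands) {
                queue.put(cmd);               // the single request thread only dispatches
            }
            for (int i = 0; i < threads; i++) {
                queue.put(POISON);            // one end marker per worker
            }
            int[] counts = new int[threads];
            for (int i = 0; i < threads; i++) {
                counts[i] = results.get(i).get();
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while dispatching commands", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdownNow();               // workers are done, or are interrupted on failure
        }
    }

    public static void main(String[] args) {
        List<String> commands = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            commands.add("doc-" + i);
        }
        int[] counts = process(commands, 4, cmd -> { /* index the document here */ });
        System.out.println("adds per worker: " + Arrays.toString(counts));
    }
}
```

The per-worker counts play the same role as the per-thread "adds" figures in the log lines above: they differ from run to run, but always sum to the number of commands in the stream. Error handling is kept minimal here; a real processor would also need to decide what happens to the rest of the stream when one worker fails.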


