lucene-solr-user mailing list archives

From Roman Chyla <roman.ch...@gmail.com>
Subject Regexp and speed
Date Fri, 30 Nov 2012 17:13:05 GMT
Hi,

Some time ago we measured the performance of the regexp queries and
found that they are VERY FAST! We can't be grateful enough; it saves
many days/lives ;)

This was on an old Lenovo X61 laptop (Core 2 Duo, 1.7GHz), no special
memory allocation, SSD disk:


51459ms.  Buiding index of 100000 docs
181175ms.  Verifying data integrity with 100 docs
315ms.  Preparing 1000 random queries

61167ms.  Regex queries - Stopping execution, # queries finished: 150
2795ms.  Regexp queries (new style)
3936ms.  Wildcard queries
777ms.  Boolean queries
893ms.  Boolean queries (truncated)
3596ms.  Span queries
91751ms.  Span queries (truncated)Stopping execution, # queries finished: 100
3937ms.  Payload queries
93726ms.  Payload queries (truncated)Stopping execution, # queries finished: 100
Totals: [4865, 18284, 18286, 18284, 18405, 287934, 44375, 18284, 2489]
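
The raw timings are hard to compare because three categories were stopped
early. A rough per-query normalization makes the gap clearer. This is only
a sketch: it assumes that every category without a "Stopping execution"
note ran all 1000 prepared queries, which the output above suggests but
does not state.

```python
# Timings copied from the results above: (elapsed ms, queries finished).
# ASSUMPTION: categories without a "Stopping execution" note completed
# all 1000 prepared queries; the output does not say so explicitly.
PREPARED = 1000

results = {
    "Regex queries": (61167, 150),
    "Regexp queries (new style)": (2795, PREPARED),
    "Wildcard queries": (3936, PREPARED),
    "Boolean queries": (777, PREPARED),
    "Boolean queries (truncated)": (893, PREPARED),
    "Span queries": (3596, PREPARED),
    "Span queries (truncated)": (91751, 100),
    "Payload queries": (3937, PREPARED),
    "Payload queries (truncated)": (93726, 100),
}


def ms_per_query(elapsed_ms, finished):
    """Average wall-clock milliseconds spent per finished query."""
    return elapsed_ms / finished


per_query = {name: ms_per_query(*r) for name, r in results.items()}

# Print the categories from fastest to slowest.
for name, ms in sorted(per_query.items(), key=lambda kv: kv[1]):
    print(f"{ms:9.3f} ms/query  {name}")

speedup = per_query["Regex queries"] / per_query["Regexp queries (new style)"]
print(f"new-style regexp vs old-style regex: about {speedup:.0f}x per query")
```

Under that assumption the old-style regex queries cost about 408 ms each
against about 2.8 ms for the new-style regexp queries, and the truncated
span and payload queries come out slowest at over 900 ms each.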

Examples of queries:
--------------------
regex:bgiyodjrr, k\w* michael\w* jay\w* .*
regexp:/bgiyodjrr, k\w* michael\w* jay\w* .*/
wildcard:bgiyodjrr, k*1 michael*2 jay*3 *
+n0:bgiyodjrr +n1:k +n2:michael +n3:jay
+n0:bgiyodjrr +n1:k* +n2:m* +n3:j*
spanNear([vectrfield:bgiyodjrr, vectrfield:k, vectrfield:michael,
vectrfield:jay], 0, true)
spanNear([vectrfield:bgiyodjrr,
SpanMultiTermQueryWrapper(vectrfield:k*),
SpanMultiTermQueryWrapper(vectrfield:m*),
SpanMultiTermQueryWrapper(vectrfield:j*)], 0, true)
spanPayCheck(spanNear([vectrfield:bgiyodjrr, vectrfield:k,
vectrfield:michael, vectrfield:jay], 1, true), payloadRef:
b[0]=48;b[0]=49;b[0]=50;b[0]=51;)
spanPayCheck(spanNear([vectrfield:bgiyodjrr,
SpanMultiTermQueryWrapper(vectrfield:k*),
SpanMultiTermQueryWrapper(vectrfield:m*),
SpanMultiTermQueryWrapper(vectrfield:j*)], 1, true), payloadRef:
b[0]=48;b[0]=49;b[0]=50;b[0]=51;)
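
The query examples all look like variants of one author string, here
"bgiyodjrr, k michael jay". The sketch below shows how such variants
could be derived. The helper `build_variants` is hypothetical and is not
taken from BenchmarkAuthorSearch.java. Python's `re` module stands in only
to illustrate the intended whole-term matching; Lucene's regexp dialect is
not identical to Python's.

```python
import re


def build_variants(author):
    """Hypothetical helper: derive query strings from 'surname, given names'."""
    surname, _, given = author.partition(",")
    surname = surname.strip()
    names = given.split()

    # Regexp form: each given name may be continued, then anything may follow.
    pattern = surname + ", " + " ".join(n + r"\w*" for n in names) + " .*"

    # Boolean form: one positional field (n0, n1, ...) per name part.
    parts = [surname] + names
    boolean = " ".join(f"+n{i}:{p}" for i, p in enumerate(parts))

    # Truncated boolean form: given names reduced to initial plus wildcard.
    truncated = " ".join(
        [f"+n0:{surname}"]
        + [f"+n{i}:{n[0]}*" for i, n in enumerate(names, start=1)]
    )

    return {
        "pattern": pattern,
        "regexp": f"regexp:/{pattern}/",
        "boolean": boolean,
        "boolean_truncated": truncated,
    }


variants = build_variants("bgiyodjrr, k michael jay")
for key in ("regexp", "boolean", "boolean_truncated"):
    print(variants[key])

# Whole-string matching, as an analogy for matching a complete indexed term.
longer = "bgiyodjrr, karl michaelson jay smith"
exact = "bgiyodjrr, k michael jay"
print(bool(re.fullmatch(variants["pattern"], longer)))
print(bool(re.fullmatch(variants["pattern"], exact)))
```

One detail worth checking in a real test: because the pattern ends in
" .*", a term must contain a space after the last given name to match, so
in this Python illustration the exact string "bgiyodjrr, k michael jay"
does not match while the longer one does.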


The code here:
https://github.com/romanchyla/montysolr/blob/solr-trunk/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java

The benchmark should probably not be called a 'benchmark'. Do you think
it may be too simplistic? Can we expect some bad surprises somewhere?

Thanks,

  roman
