lucene-dev mailing list archives

From "Ishan Chattopadhyaya (JIRA)" <>
Subject [jira] [Commented] (SOLR-10317) Solr Nightly Benchmarks
Date Wed, 29 Mar 2017 16:42:41 GMT


Ishan Chattopadhyaya commented on SOLR-10317:

Here's a rough list off the top of my head. For the sake of completeness, it would be good
for a student to add whatever I've missed:
# Indexing benchmarks
## Standalone
## SolrCloud (various simple configurations (0))
## New replication mode (SOLR-9835)
# Various types of queries:
## Querying on numeric fields (exact queries, range queries)
## Querying on text fields
## Querying on string fields
## Sorting on numeric fields, string fields (with and without docValues)
## Extended Dismax queries
## Spatial search (using various strategies)
# Running all of the above queries on:
## Standalone Solr
## SolrCloud (on some simple configurations (0))
## The new replication mode (SOLR-9835), if possible
# Partial Updates benchmarks (atomic updates, in-place updates)
# Faceting (string fields, numeric fields, enum fields)
# Grouping (string fields, numeric fields, enum fields)
# Spell check
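
As a rough illustration of the query-side items above, the sketch below maps a few of the benchmark categories to Solr request parameters. The field names (views_i, category_s, title_t) and the benchmark_params helper are hypothetical and only there for illustration; the parameter names themselves (q, sort, defType, qf, facet, facet.field, group, group.field, spellcheck) are standard Solr request parameters. Spell checking also assumes the request handler has a spellcheck component configured.

```python
from urllib.parse import urlencode

# Hypothetical field names; a real benchmark would use the fields in its schema.
NUMERIC_FIELD = "views_i"
STRING_FIELD = "category_s"
TEXT_FIELD = "title_t"


def benchmark_params(kind, term="solr", low=0, high=100):
    """Return Solr request parameters for one benchmark category."""
    if kind == "numeric_exact":
        return {"q": f"{NUMERIC_FIELD}:{low}"}
    if kind == "numeric_range":
        return {"q": f"{NUMERIC_FIELD}:[{low} TO {high}]"}
    if kind == "text":
        return {"q": f"{TEXT_FIELD}:{term}"}
    if kind == "sort_numeric":
        return {"q": "*:*", "sort": f"{NUMERIC_FIELD} desc"}
    if kind == "edismax":
        return {"q": term, "defType": "edismax", "qf": TEXT_FIELD}
    if kind == "facet":
        return {"q": "*:*", "rows": 0, "facet": "true", "facet.field": STRING_FIELD}
    if kind == "group":
        return {"q": "*:*", "group": "true", "group.field": STRING_FIELD}
    if kind == "spellcheck":
        # Assumes a spellcheck component is configured on the handler.
        return {"q": term, "spellcheck": "true"}
    raise ValueError(f"unknown benchmark kind: {kind}")


def query_string(kind, **kwargs):
    """URL-encode the parameters, ready to append to a /select request."""
    return urlencode(benchmark_params(kind, **kwargs))
```

A benchmark runner could loop over the category names, send each query string to a standalone or SolrCloud endpoint, and record the elapsed time per request.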

A Wikipedia-based dataset is usually available on all the Jenkins instances and could be
used for this purpose. [~steve_rowe], [~thetaphi], could you please point to the download
link for the enwiki.random.lines.txt file? (I have it, but I've forgotten where I got it from.)

If I've missed anything, please feel free to comment.

(0) - Some simple SolrCloud configurations could be:
# 1 shard, 2-3 replicas
# 2 shards, 1 replica each
# 2 shards, 2 replicas each
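
To make these layouts concrete, here is a minimal sketch that expresses them as data and derives Collections API CREATE parameters from each. The CloudLayout class and the collection naming scheme are assumptions for illustration only; numShards and replicationFactor are the standard Collections API parameters. The "2-3 replicas" case is expanded into two separate layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudLayout:
    """One SolrCloud layout: shard count and replicas per shard."""
    shards: int
    replicas_per_shard: int

    @property
    def total_cores(self):
        # Every shard is hosted replicas_per_shard times.
        return self.shards * self.replicas_per_shard

    def create_params(self, name):
        """Parameters for a Collections API CREATE request."""
        return {
            "action": "CREATE",
            "name": name,
            "numShards": self.shards,
            "replicationFactor": self.replicas_per_shard,
        }


LAYOUTS = [
    CloudLayout(1, 2),  # 1 shard, 2 replicas
    CloudLayout(1, 3),  # 1 shard, 3 replicas
    CloudLayout(2, 1),  # 2 shards, 1 replica each
    CloudLayout(2, 2),  # 2 shards, 2 replicas each
]

for layout in LAYOUTS:
    # Hypothetical naming scheme: bench_<shards>x<replicas>
    name = f"bench_{layout.shards}x{layout.replicas_per_shard}"
    print(name, layout.total_cores, layout.create_params(name))
```

Keeping the layouts in one list lets the same indexing and query benchmarks be repeated unchanged against each configuration.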

> Solr Nightly Benchmarks
> -----------------------
>                 Key: SOLR-10317
>                 URL:
>             Project: Solr
>          Issue Type: Task
>            Reporter: Ishan Chattopadhyaya
>              Labels: gsoc2017, mentor
> Solr needs nightly benchmarks reporting. Similar Lucene benchmarks can be found here,
> Preferably, we need:
> # A suite of benchmarks that build Solr from a commit point, start Solr nodes, both in SolrCloud and standalone mode, and record timing information of various operations like indexing, querying, faceting, grouping, replication etc.
> # It should be possible to run them either as an independent suite or as a Jenkins job, and we should be able to report timings as graphs (Jenkins has some charting plugins).
> # The code should eventually be integrated in the Solr codebase, so that it never goes out of date.
> There is some prior work / discussion:
> # (Shalin)
> # (Ishan/Vivek)
> # SOLR-2646 & SOLR-9863 (Mark Miller)
> # (Mike McCandless)
> # (Tim Potter)
> Some of the frameworks above support building, starting, indexing/querying and stopping Solr. However, the benchmarks they run are very limited. Any of these can be a starting point, or a new framework can be used instead. The motivation is to be able to cover every functionality of Solr with a corresponding benchmark that is run every night.
> Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure [~shalinmangar] and [] would help here.

This message was sent by Atlassian JIRA
