From: mwalch@apache.org
To: commits@accumulo.apache.org
Message-Id: <32886420e9db497598244f0fc562d8b8@git.apache.org>
Subject: accumulo-wikisearch git commit: Updated wikisearch documentation
Date: Tue, 13 Dec 2016 14:17:42 +0000 (UTC)

Repository: accumulo-wikisearch
Updated Branches:
  refs/heads/master 9c30660f6 -> 7fdf1bebb


Updated wikisearch documentation

* Made documentation use markdown
* Combined regular and parallel install instructions
* Moved install instructions to INSTALL.md
* Pulled in design/performance documentation from website


Project: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/commit/7fdf1beb
Tree: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/tree/7fdf1beb
Diff: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/diff/7fdf1beb

Branch: refs/heads/master
Commit: 7fdf1bebb2e2b4ca31d58d2d7fc8de8f157a63f3
Parents: 9c30660
Author: Mike Walch
Authored: Mon Dec 12 15:26:41 2016 -0500
Committer: Mike Walch
Committed: Mon Dec 12 15:51:15 2016 -0500

----------------------------------------------------------------------
 INSTALL.md      | 104 ++++++++++++++++++++++++
 README          |  66 ---------------
 README.md       | 221 +++++++++++++++++++++++++++++++++++++++++++++++++++
 README.parallel |  65 ---------------
 4 files changed, 325 insertions(+), 131 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/INSTALL.md
----------------------------------------------------------------------
diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000..fff2bc0
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,104 @@
+
+# Wikisearch Installation
+
+Instructions for installing and running the Accumulo Wikisearch example.
+
+## Ingest
+
+### Prerequisites
+
+1. Accumulo, Hadoop, and ZooKeeper must be installed and running
+1. Download one or more [wikipedia dump files][dump-files] and put them in an HDFS directory.
+   You will want to grab the files with the link name of pages-articles.xml.bz2. Though not strictly
+   required, the ingest will go more quickly if the files are decompressed:
+
+       $ bunzip2 < enwiki-*-pages-articles.xml.bz2 | hadoop fs -put - /wikipedia/enwiki-pages-articles.xml
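As a rough illustration (not part of the commit diff above): if several language dumps are staged at once, the same decompress-and-put pipeline can be wrapped in a loop. This sketch assumes the `.bz2` dumps sit in the current local directory and that `/wikipedia` already exists in HDFS; adjust the paths to your layout.

    # decompress each dump on the fly and stream it into HDFS under /wikipedia
    for f in *-pages-articles.xml.bz2; do
        bunzip2 < "$f" | hadoop fs -put - "/wikipedia/${f%.bz2}"
    done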
+
+### Instructions
+
+1. Create a `wikipedia.xml` file (or `wikipedia_parallel.xml` if running the parallel version) from
+   [wikipedia.xml.example] or [wikipedia_parallel.xml.example] and modify it for your Accumulo
+   installation.
+
+       $ cd ingest/conf
+       $ cp wikipedia.xml.example wikipedia.xml
+       $ vim wikipedia.xml
+
+1. Copy `ingest/lib/wikisearch-*.jar` and `ingest/lib/protobuf*.jar` to `$ACCUMULO_HOME/lib/ext`.
+1. Run `ingest/bin/ingest.sh` (or `ingest_parallel.sh` if running the parallel version) with one
+   argument (the name of the directory in HDFS where the wikipedia XML files reside); this will
+   kick off a MapReduce job to ingest the data into Accumulo.
+
+## Query
+
+### Prerequisites
+
+1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
+   - NOTE: A [bug] was encountered that did not allow an EJB 3.1 WAR file. The workaround is to separate the RESTEasy servlet
+     from the EJBs by creating an EJB jar and a WAR file.
+
+### Instructions
+
+1. Create an `ejb-jar.xml` from [ejb-jar.xml.example] and modify it to contain the same information
+   that you put into `wikipedia.xml` in the ingest steps above:
+
+       $ cd query/src/main/resources/META-INF/
+       $ cp ejb-jar.xml.example ejb-jar.xml
+       $ vim ejb-jar.xml
+
+1. Re-build the query distribution by running `mvn package assembly:single` in the query module's directory.
+1. Untar the resulting file in the `$JBOSS_HOME/server/default` directory:
+
+       $ cd $JBOSS_HOME/server/default
+       $ tar -xzf /some/path/to/wikisearch/query/target/wikisearch-query*.tar.gz
+
+   This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
+1. Next, copy the `wikisearch*.war` file in the `query-war/target` directory to `$JBOSS_HOME/server/default/deploy`.
+1. Start JBoss (`$JBOSS_HOME/bin/run.sh`).
+1. Use the Accumulo shell and give the user permissions for the wikis that you loaded:
+
+       > setauths -u <username> -s all,enwiki,eswiki,frwiki,fawiki
+
+1. Copy the following jars from the `$JBOSS_HOME/server/default/lib` directory to the `$ACCUMULO_HOME/lib/ext` directory:
+
+       kryo*.jar
+       minlog*.jar
+       commons-jexl*.jar
+
+1. Copy `$JBOSS_HOME/server/default/deploy/wikisearch-query*.jar` to `$ACCUMULO_HOME/lib/ext`.
+
+1. At this point you should be able to open a browser and view the page:
+
+       http://localhost:8080/accumulo-wikisearch/ui/ui.jsp
+
+   You can issue queries using this user interface or via the following REST urls:
+
+       /accumulo-wikisearch/rest/Query/xml
+       /accumulo-wikisearch/rest/Query/html
+       /accumulo-wikisearch/rest/Query/yaml
+       /accumulo-wikisearch/rest/Query/json
+
+   There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type
+   into the search box at ui.jsp, and the auths parameter is a comma-separated list of the wikis that you want to search
+   (e.g. enwiki,frwiki,dewiki), or you can use all.
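A quick way to exercise the REST service from the command line is sketched below (this is illustrative only and not from the commit above: the query and auths parameter names come from the text, but the HTTP method and the exact encoding expected by the deployed service are assumptions):

    $ curl -G "http://localhost:8080/accumulo-wikisearch/rest/Query/json" \
        --data-urlencode 'query="old" and "man" and "sea"' \
        --data-urlencode 'auths=enwiki,eswiki'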
+
+[ejb-jar.xml.example]: query/src/main/resources/META-INF/ejb-jar.xml.example
+[dump-files]: http://dumps.wikimedia.org/backup-index.html
+[wikipedia.xml.example]: ingest/conf/wikipedia.xml.example
+[wikipedia_parallel.xml.example]: ingest/conf/wikipedia_parallel.xml.example
+[bug]: https://issues.jboss.org/browse/RESTEASY-531


http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README
----------------------------------------------------------------------
diff --git a/README b/README
deleted file mode 100644
index ad28cdc..0000000
--- a/README
+++ /dev/null
@@ -1,66 +0,0 @@
-       Apache Accumulo Wikipedia Search Example
-
-   This project contains a sample application for ingesting and querying wikipedia data.
-
-
-   Ingest
-   ------
-
-   Prerequisites
-   -------------
-   1. Accumulo, Hadoop, and ZooKeeper must be installed and running
-   2. One or more wikipedia dump files (http://dumps.wikimedia.org/backup-index.html) placed in an HDFS directory.
-      You will want to grab the files with the link name of pages-articles.xml.bz2
-   3. Though not strictly required, the ingest will go more quickly if the files are decompressed:
-
-      $ bunzip2 < enwiki-*-pages-articles.xml.bz2 | hadoop fs -put - /wikipedia/enwiki-pages-articles.xml
-
-
-   INSTRUCTIONS
-   ------------
-   1. Copy the ingest/conf/wikipedia.xml.example to ingest/conf/wikipedia.xml and change it to specify Accumulo information.
-   2. Copy the ingest/lib/wikisearch-*.jar and ingest/lib/protobuf*.jar to $ACCUMULO_HOME/lib/ext
-   3. Then run ingest/bin/ingest.sh with one argument (the name of the directory in HDFS where the wikipedia XML
-      files reside) and this will kick off a MapReduce job to ingest the data into Accumulo.
-
-   Query
-   -----
-
-   Prerequisites
-   -------------
-   1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
-      - NOTE: Ran into a bug (https://issues.jboss.org/browse/RESTEASY-531) that did not allow an EJB3.1 war file. The
-        workaround is to separate the RESTEasy servlet from the EJBs by creating an EJB jar and a WAR file.
-
-   INSTRUCTIONS
-   -------------
-   1. Copy the query/src/main/resources/META-INF/ejb-jar.xml.example file to
-      query/src/main/resources/META-INF/ejb-jar.xml. Modify to the file to contain the same
-      information that you put into the wikipedia.xml file from the Ingest step above.
-   2. Re-build the query distribution by running 'mvn package assembly:single' in the query module's directory.
-   3. Untar the resulting file in the $JBOSS_HOME/server/default directory.
-
-      $ cd $JBOSS_HOME/server/default
-      $ tar -xzf /some/path/to/wikisearch/query/target/wikisearch-query*.tar.gz
-
-      This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
-   4. Next, copy the wikisearch*.war file in the query-war/target directory to $JBOSS_HOME/server/default/deploy.
-   5. Start JBoss ($JBOSS_HOME/bin/run.sh)
-   6. Use the Accumulo shell and give the user permissions for the wikis that you loaded, for example:
-        setauths -u -s all,enwiki,eswiki,frwiki,fawiki
-   7. Copy the following jars to the $ACCUMULO_HOME/lib/ext directory from the $JBOSS_HOME/server/default/lib directory:
-
-        kryo*.jar
-        minlog*.jar
-        commons-jexl*.jar
-
-   8. Copy the $JBOSS_HOME/server/default/deploy/wikisearch-query*.jar to $ACCUMULO_HOME/lib/ext.
-
-
-   9. At this point you should be able to open a browser and view the page: http://localhost:8080/accumulo-wikisearch/ui/ui.jsp.
-      You can issue the queries using this user interface or via the following REST urls: /accumulo-wikisearch/rest/Query/xml,
-      /accumulo-wikisearch/rest/Query/html, /accumulo-wikisearch/rest/Query/yaml, or /accumulo-wikisearch/rest/Query/json.
-      There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type
-      into the search box at ui.jsp, and the auths parameter is a comma-separated list of wikis that you want to search (i.e.
-      enwiki,frwiki,dewiki, etc. Or you can use all)


http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..42289fe
--- /dev/null
+++ b/README.md
@@ -0,0 +1,221 @@
+
+# Apache Accumulo Wikisearch
+
+Wikisearch is an example Accumulo application that provides a flexible, scalable
+search over Wikipedia articles.
+
+## Installation
+
+Follow the [install instructions][install] to run the example.
+
+## Design
+
+The example uses an indexing technique helpful for doing multiple logical tests
+against content. In this case, we can perform a word search on Wikipedia
+articles. The sample application takes advantage of 3 unique capabilities of
+Accumulo:
+
+1. Extensible iterators that operate within the distributed tablet servers of
+   the key-value store
+1. Custom aggregators which can efficiently condense information during the
+   various life-cycles of the log-structured merge tree
+1. Custom load balancing, which ensures that a table is evenly distributed on
+   all tablet servers
+
+In the example, Accumulo tracks the cardinality of all terms as elements are
+ingested. If the cardinality is small enough, it will track the set of
+documents by term directly. For example:
+
+| Row (word) | Value (count) | Value (document list)       |
+|------------|--------------:|:----------------------------|
+| Octopus    | 2             | [Document 57, Document 220] |
+| Other      | 172,849       | []                          |
+| Ostrich    | 1             | [Document 901]              |
+
+Searches can be optimized to focus on low-cardinality terms. To create these
+counts, the example installs "aggregators" which are used to combine inserted
+values. The ingester just writes simple "(Octopus, 1, Document 57)" tuples.
+The tablet servers then use the installed aggregators to merge the cells as
+the data is re-written or queried. This reduces the in-memory locking
+required to update high-cardinality terms, and defers aggregation to a later
+time, where it can be done more efficiently.
+
+The example also creates a reverse word index to map each word to the document
+in which it appears, but it does this by choosing an arbitrary partition for
+the document. The article and the word index for the article are grouped
+together into the same partition. For example:
+
+| Row (partition) | Column Family | Column Qualifier | Value           |
+|-----------------|---------------|------------------|-----------------|
+| 1               | D             | Document 57      | "smart Octopus" |
+| 1               | Word, Octopus | Document 57      |                 |
+| 1               | Word, smart   | Document 57      |                 |
+| ...             |               |                  |                 |
+| 2               | D             | Document 220     | "big Octopus"   |
+| 2               | Word, big     | Document 220     |                 |
+| 2               | Word, Octopus | Document 220     |                 |
+
+Of course, there would be large numbers of documents in each partition, and the
+elements of those documents would be interlaced according to their sort order.
+
+By dividing the index space into partitions, multi-word searches can be
+performed in parallel across all the nodes. Also, by grouping a document
+together with its index, the document can be retrieved without a second request
+from the client. The query "octopus" and "big" is sent to all the servers, but
+only the partitions in which the aggregated reverse index information shows the
+low-cardinality term "octopus" are actually searched. The query for a
+document is performed by extensions provided in the example. These extensions
+become part of the tablet server's iterator stack. By cloning the underlying
+iterators, the query extensions can seek to specific words within the index,
+and when they find a matching document, they can then seek to the document
+location and retrieve the contents.
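To make the partition layout above concrete: after an ingest, a single partition can be inspected from the Accumulo shell, and both the document entries and their word-index entries appear under the same row. This is only a sketch and is not part of the commit; the table name wiki matches the du output later in this README, while the D/Word families shown above are conceptual names, so the ingester's actual column families may differ.

    $ accumulo shell -u <username>
    > table wiki
    > scan -b 1 -e 1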
+
+## Performance
+
+The Wikisearch example was run on a cluster of 10 servers, each with 12 cores, 32G of
+RAM, and 6 500G drives. Accumulo tablet servers were allowed a maximum of 3G of
+working memory, of which 2G was dedicated to caching file data.
+
+Following the instructions in the example, the Wikipedia XML data for articles
+was loaded for the English, Spanish, and German languages into 10 partitions. The
+data is not partitioned by language: multiple languages were used to get a
+larger set of test data. The data load took around 8 hours, and has not been
+optimized for scale. Once the data was loaded, the content was compacted, which
+took about 35 minutes.
+
+The example uses the language-specific Wikipedia tokenizers available from the
+Apache Lucene project.
+
+Original files:
+
+| Articles | Compressed size | Filename                               |
+|----------|-----------------|----------------------------------------|
+| 1.3M     | 2.5G            | dewiki-20111120-pages-articles.xml.bz2 |
+| 3.8M     | 7.9G            | enwiki-20111115-pages-articles.xml.bz2 |
+| 0.8M     | 1.4G            | eswiki-20111112-pages-articles.xml.bz2 |
+
+The resulting tables:
+
+    > du -p wiki.*
+          47,325,680,634 [wiki]
+           5,125,169,305 [wikiIndex]
+                     413 [wikiMetadata]
+           5,521,690,682 [wikiReverseIndex]
+
+Roughly a 6:1 increase in size.
+
+We performed the following queries, and repeated the set 5 times. The query
+language is much more expressive than what is shown below. The actual query
+specified that these words were to be found in the body of the article. Regular
+expressions, searches within titles, negative tests, etc. are available.
+
+| Query                                    | Sample 1 (seconds) | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) | Matches | Result Size |
+|------------------------------------------|------|------|------|------|------|--------|-----------|
+| "old" and "man" and "sea"                | 4.07 | 3.79 | 3.65 | 3.85 | 3.67 | 22,956 | 3,830,102 |
+| "paris" and "in" and "the" and "spring"  | 3.06 | 3.06 | 2.78 | 3.02 | 2.92 | 10,755 | 1,757,293 |
+| "rubber" and "ducky" and "ernie"         | 0.08 | 0.08 | 0.1  | 0.11 | 0.1  | 6      | 808       |
+| "fast" and ( "furious" or "furriest")    | 1.34 | 1.33 | 1.3  | 1.31 | 1.31 | 2,973  | 493,800   |
+| "slashdot" and "grok"                    | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 14     | 2,371     |
+| "three" and "little" and "pigs"          | 0.92 | 0.91 | 0.9  | 1.08 | 0.88 | 2,742  | 481,531   |
+
+Because the terms are tested together within the tablet server, even fairly
+high-cardinality terms such as "old," "man," and "sea" can be tested
+efficiently, without needing to return to the client or make distributed calls
+between servers to perform the intersection between terms.
+
+For reference, here are the cardinalities for all the terms in the query
+(remember, this is across all languages loaded):
+
+| Term     | Cardinality |
+|----------|-------------|
+| ducky    | 795         |
+| ernie    | 13,433      |
+| fast     | 166,813     |
+| furious  | 10,535      |
+| furriest | 45          |
+| grok     | 1,168       |
+| in       | 1,884,638   |
+| little   | 320,748     |
+| man      | 548,238     |
+| old      | 720,795     |
+| paris    | 232,464     |
+| pigs     | 8,356       |
+| rubber   | 17,235      |
+| sea      | 247,231     |
+| slashdot | 2,343       |
+| spring   | 125,605     |
+| the      | 3,509,498   |
+| three    | 718,810     |
+
+Accumulo supports caching index information, which is turned on by default, and
+caching of the non-index (data) blocks of a file, which is not.
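As a sketch of how the data block cache can be enabled for a table (not part of the commit; the property shown is the standard Accumulo per-table setting, though the exact configuration used for the numbers below is not recorded in this document):

    > config -t wiki -s table.cache.block.enable=true

The corresponding index cache property, table.cache.index.enable, is already on by default, as noted above.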
+After turning on data block caching for the wiki table:
+
+| Query                                    | Sample 1 (seconds) | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) |
+|------------------------------------------|------|------|------|------|------|
+| "old" and "man" and "sea"                | 2.47 | 2.48 | 2.51 | 2.48 | 2.49 |
+| "paris" and "in" and "the" and "spring"  | 1.33 | 1.42 | 1.6  | 1.61 | 1.47 |
+| "rubber" and "ducky" and "ernie"         | 0.07 | 0.08 | 0.07 | 0.07 | 0.07 |
+| "fast" and ( "furious" or "furriest")    | 1.28 | 0.78 | 0.77 | 0.79 | 0.78 |
+| "slashdot" and "grok"                    | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 |
+| "three" and "little" and "pigs"          | 0.55 | 0.32 | 0.32 | 0.31 | 0.27 |
+
+For comparison, these are the cold start lookup times (restart Accumulo and
+drop the operating system disk cache):
+
+| Query                                    | Sample |
+|------------------------------------------|--------|
+| "old" and "man" and "sea"                | 13.92  |
+| "paris" and "in" and "the" and "spring"  | 8.46   |
+| "rubber" and "ducky" and "ernie"         | 2.96   |
+| "fast" and ( "furious" or "furriest")    | 6.77   |
+| "slashdot" and "grok"                    | 4.06   |
+| "three" and "little" and "pigs"          | 8.13   |
+
+### Random Query Load
+
+Random queries were generated using common English words. A uniform random
+sample of 3 to 5 words, taken from the 10,000 most common words in Project
+Gutenberg's online text collection, was joined with "and". Words containing
+anything other than letters (such as contractions) were not used. A client was
+started simultaneously on each of the 10 servers and each ran 100 random
+queries (1000 queries total).
+
+| Time (seconds) | Count   |
+|----------------|---------|
+| 41.97          | 440,743 |
+| 41.61          | 320,522 |
+| 42.11          | 347,969 |
+| 38.32          | 275,655 |
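A rough sketch of how such random conjunctive queries could be generated is shown below (not part of the commit; it assumes a plain word list, one word per line, in a hypothetical words.txt, since the actual word list and test harness are not included here):

    # keep only purely alphabetic words, then pick 3-5 at random and join them with " and "
    n=$((RANDOM % 3 + 3))
    grep -E '^[A-Za-z]+$' words.txt | shuf -n "$n" | paste -sd' ' - | sed 's/ / and /g'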
+
+### Query Load During Ingest
+
+The English wikipedia data was re-ingested on top of the existing, compacted
+data. The following query samples were taken in 5 minute intervals while
+ingesting 132 articles/second:
+
+| Query                                    | Sample 1 (seconds) | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) |
+|------------------------------------------|------|------|-------|------|-------|
+| "old" and "man" and "sea"                | 4.91 | 3.92 | 11.58 | 9.86 | 10.21 |
+| "paris" and "in" and "the" and "spring"  | 5.03 | 3.37 | 12.22 | 3.29 | 9.46  |
+| "rubber" and "ducky" and "ernie"         | 4.21 | 2.04 | 8.57  | 1.54 | 1.68  |
+| "fast" and ( "furious" or "furriest")    | 5.84 | 2.83 | 2.56  | 3.12 | 3.09  |
+| "slashdot" and "grok"                    | 5.68 | 2.62 | 2.2   | 2.78 | 2.8   |
+| "three" and "little" and "pigs"          | 7.82 | 3.42 | 2.79  | 3.29 | 3.3   |
+
+[install]: INSTALL.md


http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README.parallel
----------------------------------------------------------------------
diff --git a/README.parallel b/README.parallel
deleted file mode 100644
index 399f0f3..0000000
--- a/README.parallel
+++ /dev/null
@@ -1,65 +0,0 @@
-       Apache Accumulo Wikipedia Search Example (parallel version)
-
-   This project contains a sample application for ingesting and querying wikipedia data.
-
-
-   Ingest
-   ------
-
-   Prerequisites
-   -------------
-   1. Accumulo, Hadoop, and ZooKeeper must be installed and running
-   2. One or more wikipedia dump files (http://dumps.wikimedia.org/backup-index.html) placed in an HDFS directory.
-      You will want to grab the files with the link name of pages-articles.xml.bz2
-
-
-   INSTRUCTIONS
-   ------------
-   1. Copy the ingest/conf/wikipedia_parallel.xml.example to ingest/conf/wikipedia.xml and change it to specify Accumulo information.
-   2. Copy the ingest/lib/wikisearch-*.jar and ingest/lib/protobuf*.jar to $ACCUMULO_HOME/lib/ext
-   3. Then run ingest/bin/ingest_parallel.sh with one argument (the name of the directory in HDFS where the wikipedia XML
-      files reside) and this will kick off a MapReduce job to ingest the data into Accumulo.
-
-   Query
-   -----
-
-   Prerequisites
-   -------------
-   1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
-      - NOTE: Ran into a bug (https://issues.jboss.org/browse/RESTEASY-531) that did not allow an EJB3.1 war file. The
-        workaround is to separate the RESTEasy servlet from the EJBs by creating an EJB jar and a WAR file.
-
-   INSTRUCTIONS
-   -------------
-   1. Copy the query/src/main/resources/META-INF/ejb-jar.xml.example file to
-      query/src/main/resources/META-INF/ejb-jar.xml. Modify to the file to contain the same
-      information that you put into the wikipedia.xml file from the Ingest step above.
-   2. Re-build the query distribution by running 'mvn package assembly:single' in the top-level directory.
-   3. Untar the resulting file in the $JBOSS_HOME/server/default directory.
-
-      $ cd $JBOSS_HOME/server/default
-      $ tar -xzf $ACCUMULO_HOME/src/examples/wikisearch/query/target/wikisearch-query*.tar.gz
-
-      This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
-   4. Next, copy the wikisearch*.war file in the query-war/target directory to $JBOSS_HOME/server/default/deploy.
-   5. Start JBoss ($JBOSS_HOME/bin/run.sh)
-   6. Use the Accumulo shell and give the user permissions for the wikis that you loaded, for example:
-        setauths -u -s all,enwiki,eswiki,frwiki,fawiki
-   7. Copy the following jars to the $ACCUMULO_HOME/lib/ext directory from the $JBOSS_HOME/server/default/lib directory:
-
-        commons-lang*.jar
-        kryo*.jar
-        minlog*.jar
-        commons-jexl*.jar
-        guava*.jar
-
-   8. Copy the $JBOSS_HOME/server/default/deploy/wikisearch-query*.jar to $ACCUMULO_HOME/lib/ext.
-
-
-   9. At this point you should be able to open a browser and view the page: http://localhost:8080/accumulo-wikisearch/ui/ui.jsp.
-      You can issue the queries using this user interface or via the following REST urls: /accumulo-wikisearch/rest/Query/xml,
-      /accumulo-wikisearch/rest/Query/html, /accumulo-wikisearch/rest/Query/yaml, or /accumulo-wikisearch/rest/Query/json.
-      There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type
-      into the search box at ui.jsp, and the auths parameter is a comma-separated list of wikis that you want to search (i.e.
-      enwiki,frwiki,dewiki, etc. Or you can use all)