lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "FAQ" by Gabriele
Date Sun, 17 Jul 2011 14:06:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "FAQ" page has been changed by Gabriele:
http://wiki.apache.org/solr/FAQ?action=diff&rev1=77&rev2=78

  <<TableOfContents>>
  
  = General =
- 
  == What is Solr? ==
- 
  Solr is a standalone enterprise search server which applications communicate with using
XML and HTTP to index documents, or execute searches.  Solr supports a rich schema specification
that allows for a wide range of flexibility in dealing with different document fields, and
has an extensive search plugin API for developing custom search behavior.
  
  For more information please read this [[http://lucene.apache.org/solr/features.html|overview
of Solr features]].
  
  == Are there Mailing lists for Solr? ==
- 
- Yes there are several
- [[http://lucene.apache.org/solr/mailing_lists.html|Solr email lists]].
+ Yes, there are several [[http://lucene.apache.org/solr/mailing_lists.html|Solr email lists]].
  
+ Here are some guidelines for effectively using the email lists [[UsingMailingLists|Getting
the most out of the email lists]].
- Here are some guidelines for effectively using the email lists
- [[UsingMailingLists|Getting the most out of the email lists]].
  
  == How do you pronounce Solr? ==
- 
  It's pronounced the same as you would pronounce "Solar".
  
  == What does Solr stand for? ==
- 
  Solr is not an acronym.
  
  == Where did Solr come from? ==
- 
- "Solar" (with an A) was initially developed by [[http://cnetnetworks.com|CNET Networks]]
as an in-house search platform beginning in late fall 2004.  By summer 2005, CNET's product
catalog was powered by Solar, and several other CNET applications soon followed.  In January
2006 CNET [[http://issues.apache.org/jira/browse/SOLR-1|Granted the existing code base to
the ASF]] to become the "Solr" project.  On January 17, 2007 Solr [[http://mail-archives.apache.org/mod_mbox/lucene-general/200701.mbox/%3Cc68e39170701170707q3945a14aj5923acb0d3e1f963@mail.gmail.com%3E|graduated
from the Apache Incubator]] to become a Lucene subproject.
+ "Solar" (with an A) was initially developed by [[http://cnetnetworks.com|CNET Networks]]
as an in-house search platform beginning in late fall 2004.  By summer 2005, CNET's product
catalog was powered by Solar, and several other CNET applications soon followed.  In January
2006 CNET [[http://issues.apache.org/jira/browse/SOLR-1|granted the existing code base to
the ASF]] to become the "Solr" project.  On January 17, 2007 Solr [[http://mail-archives.apache.org/mod_mbox/lucene-general/200701.mbox/<c68e39170701170707q3945a14aj5923acb0d3e1f963@mail.gmail.com>|graduated
from the Apache Incubator]] to become a Lucene subproject. In March 2010, the Solr and Lucene-java
subprojects merged into a single project.
- In March 2010, The Solr and Lucene-java subprojects merged into a single project.
  
  == Is Solr Stable? Is it "Production Quality?" ==
- 
  Solr is currently being used to power search applications on several [[PublicServers|high
traffic publicly accessible websites]].
  
  == Is Solr Schema-less? ==
- 
- Yes, in the ways that count.  Solr does have a schema to define types, but it's a "free"
schema in that
- you don't have to define all of your fields ahead of time. Using {{{<dynamicField />}}}
declarations, you can configure field types based on field naming convention, and each document
you index can have a different set of fields.
+ Yes, in the ways that count.  Solr does have a schema to define types, but it's a "free"
schema in that you don't have to define all of your fields ahead of time. Using {{{<dynamicField
/>}}} declarations, you can configure field types based on field naming convention, and
each document you index can have a different set of fields.
  
  = Using =
- 
  == Do my applications have to be written in Java to use Solr? ==
- 
  No.
  
- Solr itself is a Java Application, but all interaction with Solr is done by POSTing messages
over HTTP (in JSON, XML, CSV, or binary formats) to index documents and GETing search results
back as JSON, XML, or a variety of other formats (Python, Ruby, PHP, CSV, binary, etc...)

+ Solr itself is a Java application, but all interaction with Solr is done by POSTing messages
over HTTP (in JSON, XML, CSV, or binary formats) to index documents and GETing search results
back as JSON, XML, or a variety of other formats (Python, Ruby, PHP, CSV, binary, etc...)
  
  == What are the Requirements for running a Solr server? ==
- 
  Solr requires Java 1.5 and an application server (such as Tomcat) that supports the Servlet
2.4 standard.
  
  == How can I get started playing with Solr? ==
- 
  There is an [[http://lucene.apache.org/solr/tutorial.html|online tutorial]] as well as a
[[http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/|demonstration configuration
in SVN]].
  
  == Solr Comes with Jetty, is Jetty the recommended Servlet Container to use when running
Solr? ==
+ The Solr example app has Jetty in it just because at the time we set it up, Jetty was the
simplest/smallest servlet container we found that could be run easily in a cross-platform
way (i.e. "java -jar start.jar").  That does not imply that Solr runs better under Jetty, or
that Jetty is only good enough for demos -- it's just that Jetty made our demo setup easier.
- 
- The Solr example app has Jetty in it just because at the time we set it up, Jetty
- was the simplest/smallest servlet container we found that could be run
- easily in a cross platform way (ie: "java -jar start.jar").  That does not imply
- that Solr runs better under Jetty, or that Jetty is only good enough for demos --
- it's just that Jetty made our demo setup easier.
  
  Users should decide for themselves which Servlet Container they consider the easiest/best
for their use cases based on their needs/experience. For high traffic scenarios, investing
time for tuning the servlet container can often make a big difference.
  
  == How do I change the logging levels/files/format ? ==
- 
  See SolrLogging
  
  == I POSTed some documents, why don't they show up when I search? ==
- 
  Documents that have been added to the index don't show up in search results until a commit
is done (one way is to POST a <commit/> message to the XML update handler). This allows
you to POST many documents in succession and know that none of them will be visible to search
clients until you have finished.
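  
  For example (the document fields here are just illustrative), two updates followed by the commit that makes them searchable:
  
  {{{
  <add><doc><field name="id">doc1</field></doc></add>
  <add><doc><field name="id">doc2</field></doc></add>
  <commit/>
  }}}
  Until the final <commit/> is processed, neither doc1 nor doc2 appears in search results.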
  
  == How can I delete all documents from my index? ==
- 
  Use the "match all docs" query in a delete by query command: {{{<delete><query>*:*</query></delete>}}}
  
  This has been optimized to be more efficient than deleting by some arbitrary query which
matches all docs because of the nature of the data.
  
  == How can I rebuild my index from scratch if I change my schema? ==
- 
   1. Use the "match all docs" query in a delete by query command before shutting down Solr:
{{{<delete><query>*:*</query></delete>}}}
   1. Stop your server
   1. Change your schema.xml
@@ -93, +68 @@

  One can also delete all documents, change the schema.xml file, and then [[CoreAdmin|reload
the core]] w/o shutting down Solr.
  
  == How can I update a specific field of an existing document? ==
- 
  I want to update a specific field in a document, is that possible? I only need to index one
field for a specific document. Do I have to re-index the whole document for this?
  
  No, just the one document. Let's say you have a CMS and you edit one document. You will
need to re-index this document only by using the add Solr statement for the whole document
(not one field only).
  
- In Lucene to update a document the operation is really a delete followed by an add.  You
will need to add the complete document as there is no such "update only a field" semantics
in Lucene. 
+ In Lucene, updating a document is really a delete followed by an add.  You will need to
add the complete document, as there is no "update only a field" semantics in Lucene.
  
  == How do I use copyField with wildcards? ==
- 
  The `<copyField>` directive allows wildcards in the source, so that several fields
can be copied into one destination field without having to specify them all individually.
 The dest field may be a full field name, or a wildcard expression. A common use case is something
like:
  
  {{{
     <copyField source="*_t"  dest="text" />
  }}}
- 
  This tells Solr to copy the contents of any field that ends in "_t" to the "text" field.
 This is particularly useful when you have a large, and possibly changing, set of fields you
want to index into a single field.  With the example above, you could start indexing fields
like "description_t", "editorial_review_t", and so on, and all their content would be indexed
in the "text" field.  It's important in this example that the "text" field be defined in schema.xml
as multiValued since you intend to copy multiple sources into the single destination.
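  
  For instance, a destination field declared multiValued might look like (attribute values other than multiValued are illustrative):
  
  {{{
     <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  }}}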
  
  Note that you can use the wildcard copyField syntax with or without similar dynamicField
declarations.  Thus you could choose to index the "description_t", "editorial_review_t" fields
individually with a dynamicField like
@@ -115, +87 @@

  {{{
     <dynamicField name="*_t" type="text" indexed="true" stored="false" />
  }}}
- 
  but you don't have to if you don't want to.  You could even mix and match across different
dynamic fields by doing something like
  
  {{{
     <dynamicField name="*_i_t" type="text" indexed="true" stored="false" />
     <copyField source="*_t"  dest="text" />
  }}}
- 
  Now, as you add fields, you can give them names ending in "_i_t" if you want them indexed
separately, and stored in the main "text" field, and "_t" without the "_i" if you just want
them indexed in "text" but not individually.
  
- 
  == Why does the request time out sometimes when doing commits? ==
- 
  Internally, Solr does nothing to time out any requests -- it lets both updates and queries
take however long they need to take to be processed fully.  However, the servlet container
being used to run Solr may impose arbitrary timeout limits on all requests.  Please consult
the documentation for your Servlet container if you find that this value is too low.
  
  (In Jetty, the relevant setting is "maxIdleTime" which is in milliseconds)
  
  == Why don't International Characters Work? ==
- 
  Solr can index any characters expressed in the UTF-8 charset (see [[http://issues.apache.org/jira/browse/SOLR-96|SOLR-96]]).
There are no known bugs with Solr's character handling, but there have been some reported
issues with the way different application servers (and different versions of the same application
server) treat incoming and outgoing multibyte characters.  In particular, people have reported
better success with Tomcat than with Jetty...
  
   * "[[http://www.nabble.com/International-Charsets-in-embedded-XML-tf1780147.html#a4897795|International
Charsets in embedded XML]]" (Jetty 5.1)
@@ -142, +109 @@

  If you notice a problem with multibyte characters, the first step to ensure that it is not
a true Solr bug would be to write a unit test that bypasses the application server directly
using the [[http://lucene.apache.org/solr/api/org/apache/solr/util/AbstractSolrTestCase.html|AbstractSolrTestCase]].
  
  The most important points are:
+ 
   * The document has to be indexed as UTF-8 on the Solr server. For example, if you send
an ISO-encoded document, the special ISO characters get a byte added, corrupting the final
encoding; only reindexing with UTF-8 can fix this.
-  * The client needs UTF-8 URL encoding when forwarding the search request to the solr server.

+  * The client needs UTF-8 URL encoding when forwarding the search request to the solr server.
   * The server needs to support UTF-8 query strings. See e.g. [[http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config|Solr
with Apache Tomcat]].
  
  If you just forward doing:
- {{{
+ 
- #!java
+ {{{#!java
  String value = request.getParameter("q");
+ }}}
- }}} to get the query string, it can be that q got encoded in ISO and then solr will not
return a search result.
+ to get the query string, q may have been decoded as ISO and then Solr will not return a
search result.
  
  One possible solution is:
- {{{
+ 
- #!java
+ {{{#!java
  String encoding = request.getCharacterEncoding();
  if (null == encoding) {
-   // Set your default encoding here 
+   // Set your default encoding here
    request.setCharacterEncoding("UTF-8");
  } else {
    request.setCharacterEncoding(encoding);
@@ -165, +134 @@

  ...
  String value = request.getParameter("q");
  }}}
- 
 Another possibility is to use java.net.URLDecoder/URLEncoder to transform all parameter
values to UTF-8.
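  As a sketch of that approach (the class and method names here are ours, not Solr API; it assumes the container decoded the raw UTF-8 bytes as ISO-8859-1, and uses the Java 7+ StandardCharsets constants for brevity):

```java
// Illustrative helper (not part of Solr's API): recover a UTF-8 parameter
// value that the servlet container decoded as ISO-8859-1, by taking back
// the original bytes and reinterpreting them as UTF-8.
import java.nio.charset.StandardCharsets;

public class Utf8ParamFix {
    public static String fixEncoding(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1); // bytes the client actually sent
        return new String(raw, StandardCharsets.UTF_8);                // reinterpret them as UTF-8
    }

    public static void main(String[] args) {
        // "über" sent URL-encoded as UTF-8 (%C3%BCber) but decoded as
        // ISO-8859-1 arrives as "Ã¼ber"; fixEncoding recovers the original.
        System.out.println(Utf8ParamFix.fixEncoding("\u00C3\u00BCber"));
    }
}
```

  On older JVMs the same round trip works with the charset-name overloads ("ISO-8859-1", "UTF-8"), which throw a checked UnsupportedEncodingException.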
  
  == Solr started, and I can POST documents to it, but the admin screen doesn't work ==
- 
  The admin screens are implemented using JSPs which require a JDK (instead of just a JRE)
to be compiled on the fly.  If you encounter errors trying to load the admin pages, and the
stack traces of these errors seem to relate to compilation of JSPs, make sure you have a JDK
installed, and make sure it is the instance of java being used.
  
  NOTE: Some Servlet Containers (like Tomcat5.5 and Jetty6) don't require a JDK for JSPs.
@@ -179, +146 @@

  
  Restarting Solr after creating a $(jetty.home)/work directory for Jetty's work files should
solve the problem.
  
- This might also be caused by starting two Solr instances on the same port and killing one,
see [[http://issues.apache.org/jira/browse/SOLR-118#action_12507990|Hoss's comment]] in SOLR-118.

+ This might also be caused by starting two Solr instances on the same port and killing one,
see [[http://issues.apache.org/jira/browse/SOLR-118#action_12507990|Hoss's comment]] in SOLR-118.
  
  == What does "CorruptIndexException: Unknown format version" mean ? ==
- 
- This happens when the Lucene code in Solr used to read the index files from disk encounters
index files in a format it doesn't recognize.  
+ This happens when the Lucene code in Solr used to read the index files from disk encounters
index files in a format it doesn't recognize.
  
  The most common cause is from using a version of Solr+Lucene that is older than the version
used to create that index.
  
  == What does "exceeded limit of maxWarmingSearchers=X" mean? ==
- 
  Whenever a commit happens in Solr, a new "searcher" (with new caches) is opened, "warmed"
up according to your SolrConfigXml settings, and then put in place.  The previous searcher
is not closed until the "warming" search is ready.  If multiple commits happen in rapid succession
-- before the warming searcher from first commit has enough time to warm up, then there can
be multiple searchers all competing for resources at the same time, even though one of them
will be thrown away as soon as the next one is ready.
  
  maxWarmingSearchers is a setting in SolrConfigXml that helps you put a safety valve on the
number of overlapping warming searchers that can exist at one time.  If you see this error
it means Solr prevented a commit from resulting in a new searcher being opened because there
were already X warming searchers open.
@@ -198, +163 @@

  If you only encounter this error infrequently because of fluke situations, you'll probably
be ok just ignoring it.
  
  = Searching =
- 
  == How to make the search use AND semantics by default rather than OR? ==
- 
  In `schema.xml`:
+ 
  {{{
  <solrQueryParser defaultOperator="AND"/>
  }}}
- 
  == How do I add full-text summaries to my search results? ==
- 
  Basic highlighting/summarization can be added adding `hl=true` to the query parameters.
 More advanced highlighting is described in HighlightingParameters.
  
  == I have set `hl=true` but no summaries are being output ==
- 
  For a field to be summarizable it must be both stored and indexed.  Note that this can significantly
increase the index size for large fields (e.g. the main content field of a document).  Consider
storing the field using compression (`compressed=true` in the `schema.xml` `fieldType` definition).
 Additionally, such a field needs to be tokenized.
  
  == I want to add basic category counts to my search results ==
- 
  Solr provides support for "facets" out-of-the-box.  See SimpleFacetParameters.
  
  == How can I figure out why my documents are being ranked the way they are? ==
- 
  Solr uses [[http://lucene.apache.org/|Lucene]] for ranking.  A detailed summary of the
ranking calculation can be obtained by adding [[CommonQueryParameters#debugQuery|`debugQuery=true`]]
to the query parameter list.  The output takes some getting used to if you are not familiar
with Lucene's ranking model.
  
  The [[SolrRelevancyFAQ]] has more information on understanding why documents rank the way
they do.
  
  == Why Isn't Sorting Working on my Text Fields? ==
- 
  Lucene Sorting requires that the field you want to sort on be indexed, but it cannot contain
more than one "token" per document.  Most Analyzers used on Text fields result in more than
one token, so the simplest thing to do is to use copyField to index a second version of your
field using the !StrField class.
  
  If you need to do some processing on the field value using !TokenFilters, you can also use
the !KeywordTokenizer, see the Solr example schema for more information.
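  
  A minimal sketch of the copyField approach (field names are illustrative, and this assumes your schema defines a "string" type backed by solr.StrField, as the example schema does):
  
  {{{
     <field name="title" type="text" indexed="true" stored="true"/>
     <field name="title_sort" type="string" indexed="true" stored="false"/>
     <copyField source="title" dest="title_sort"/>
  }}}
  Queries can then use `sort=title_sort asc` while still searching against the analyzed "title" field.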
@@ -238, +196 @@

  See also the Solr tutorial and the xml.com article about Solr, listed in the SolrResources.
  
  == How can I get ALL the matching documents back? ... How can I return an unlimited number
of rows? ==
- 
  This is impractical in most cases.  People typically only want to do this when they know
they are dealing with an index whose size guarantees the result sets will always be small
enough that they can feasibly be transmitted in a manageable amount -- but if that's the case
just specify what you consider a "manageable amount" as your `rows` param and get the best
of both worlds (all the results when your assumption is right, and a sanity cap on the result
size if it turns out your assumptions are wrong).
  
  == Can I use Lucene to access the index generated by SOLR? ==
- 
  Yes, although this is not recommended. Writing to the index is particularly tricky. However,
if you do go down this route, there are a couple of things to keep in mind. Be careful that
the analysis chain you use in Lucene matches the one used to index the data or you'll get
surprising results. Also, be aware that if you open a searcher, you won't see changes that
Solr makes to the index unless you reopen the underlying readers.
  
+ == Is there a limit on the number of keywords for a Solr query? ==
+ No. If you make a GET query, through [[http://localhost:8080/solr/admin/form.jsp|Solr Web
interface]] for example, you are limited to the maximum URL length of the browser.
+ 
+ 
+ 
  = Performance =
- 
  == How fast is indexing? ==
- 
  Indexing performance varies considerably depending on the size of the documents, the analysis
requirements, and CPU and I/O performance of the machine.  Rates between `10` and `150` docs/s
have been reported.
  
  == How can indexing be accelerated? ==
- 
  A few ideas:
+ 
   * Include multiple documents in a single `<add>` operation.  Note: there is no advantage
in trying to post a huge number of docs in a single go.  I'd suggest going no further than
`10` (full-size docs) to `100` (tiny docs).
   * Ensure you are not performing `<commit/>` until you need to see the updated index.
   * If you are reindexing every document in your index, completely removing the index first
can substantially speed up the required time and disk space.
   * Solr can do some, but not all, parts of indexing in parallel.  Indexing on multiple threads
can be a boon, particularly if you have multiple cpus and your analysis requirements are considerable.
-  * Experiment with different `mergeFactor` and `maxBufferedDocs` settings (see [[http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html]]).
+  * Experiment with different `mergeFactor` and `maxBufferedDocs` settings (see http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html).
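  
  For example, the first suggestion above, batching several documents into one `<add>` (field names and contents are illustrative):
  
  {{{
  <add>
    <doc><field name="id">1</field><field name="name_t">first document</field></doc>
    <doc><field name="id">2</field><field name="name_t">second document</field></doc>
  </add>
  }}}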
  
  == How can I speed up facet counts? ==
- 
  Performance problems can arise when faceting on fields/queries with many unique values.
 If you are faceting on a tokenized field, consider making it untokenized (field class `solr.StrField`,
or using `solr.KeywordTokenizerFactory`).
  
  Also, keep in mind that Solr must construct a filter for every unique value on which you
request faceting.  This only has to be done once, and the results are stored in the `filterCache`.
 If you are experiencing slow faceting, check the cache statistics for the `filterCache` in
the Solr admin.  If there is a large number of cache misses and evictions, try increasing
the capacity of the `filterCache`.
  
  == What does "PERFORMANCE WARNING: Overlapping onDeckSearchers=X" mean in my logs? ==
- 
  This warning means that at least one searcher hadn't yet finished warming in the background,
when a commit was issued and another searcher started warming.  This can not only eat up a
lot of RAM (as multiple on-deck searchers warm caches simultaneously) but it can create
a feedback cycle, since the more searchers warming in parallel means each searcher might take
longer to warm.
  
  Typically the way to avoid this error is to either reduce the frequency of commits, or reduce
the amount of warming a searcher does while it's on deck (by reducing the work in newSearcher
listeners, and/or reducing the autowarmCount on your caches).
  
  See also the `<maxWarmingSearchers/>` option in SolrConfigXml.
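  
  For example, in SolrConfigXml (the value 2 here is just illustrative):
  
  {{{
  <maxWarmingSearchers>2</maxWarmingSearchers>
  }}}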
  
- 
  = Developing =
- 
  == Where can I find the latest and Greatest Code? ==
- 
  In the [[http://lucene.apache.org/solr/version_control.html|Solr Version Control Repository]].
  
  == Where can I get the javadocs for the classes? ==
- 
  There are currently [[http://lucene.apache.org/solr/docs/api/|nightly Solr javadocs]].
  
  == How can I help? ==
- 
  Joining and participating in discussion on the [[http://lucene.apache.org/solr/mailing_lists.html|developers
email list]] is the best way to get your feet wet with Solr development.
  
- There is also a TaskList containing all of the ideas people have had about ways to improve
Solr.  Feel free to add your own ideas to this page, or investigate possible implementations
of existing ideas.  When you are ready, [[HowToContribute| submit a patch]] with your changes.
+ There is also a TaskList containing all of the ideas people have had about ways to improve
Solr.  Feel free to add your own ideas to this page, or investigate possible implementations
of existing ideas.  When you are ready, [[HowToContribute|submit a patch]] with your changes.
  
  == How can I submit bug reports, bug fixes or new features? ==
- 
- Bug reports, and [[HowToContribute| patch submissions]] should be entered in [[http://lucene.apache.org/solr/issue_tracking.html|Solr's
Bug Tracking Queue]].
+ Bug reports, and [[HowToContribute|patch submissions]] should be entered in [[http://lucene.apache.org/solr/issue_tracking.html|Solr's
Bug Tracking Queue]].
  
  == How do I apply patches from JIRA issues? ==
- 
- Information about testing patches can be found on the [[HowToContribute#TestingPatches|
How To Contribute]] wiki page
+ Information about testing patches can be found on the [[HowToContribute#TestingPatches|How
To Contribute]] wiki page
  
  == I can't compile Solr, ant says "JUnit not found" or "Could not create task or type of
type: junit" ==
- 
  As of September 21, 2007, JUnit's JAR is now included in Solr's source repository, so there
is no need to install it separately to run Solr's unit tests.  If ant generates a warning
that it doesn't understand the junit task, check that you have an "ant-junit.jar" in your
ANT_LIB directory (it should be included when you install apache-ant).
  
  If you are attempting to compile the Solr source tree from prior to September 21, 2007 (including
[[Solr1.2]]) you will need to include the junit.jar in your ant classpath.  Please see the
[[http://ant.apache.org/manual/OptionalTasks/junit.html|Ant documentation of JUnit]] for notes
about where Ant expects to find the JUnit JAR and Ant task JARs.
  
  == How can I start the example application in Debug mode? ==
- 
- You can start the example application in debug mode to debug your java class with your favorite
IDE (like eclipse). 
+ You can start the example application in debug mode to debug your Java classes with your
favorite IDE (like Eclipse).
+ 
  {{{
  java -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n -jar start.jar
  }}}
  Then connect to port 8000 and debug.
  
  == Tagging using SOLR ==
- There is a wiki page on some brainstorming on how to implement  
+ There is a wiki page on some brainstorming on how to implement tagging within Solr: [UserTagDesign].
- tagging within Solr [UserTagDesign].
  
