<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>solr-user@lucene.apache.org Archives</title>
<link rel="self" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/?format=atom"/>
<link href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/"/>
<id>http://mail-archives.apache.org/mod_mbox/lucene-solr-user/</id>
<updated>2009-12-06T02:28:10Z</updated>
<entry>
<title>Re: WELCOME to solr-user@lucene.apache.org</title>
<author><name>khalid y &lt;kernity@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c301a0bc90912051444o280f5d68h561701aa80b01d4@mail.gmail.com%3e"/>
<id>urn:uuid:%3c301a0bc90912051444o280f5d68h561701aa80b01d4@mail-gmail-com%3e</id>
<updated>2009-12-05T22:44:16Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks a lot for your response!

For the first solution:

I need to index all the content of my websites, and I just want Tika to ignore
&lt;meta name="id"&gt; because I already have an id.
I'll try on Monday and tell you if it works.

The second solution:
Are you sure Tika uses the HTML tokenizer? I'll check.

2009/12/5 Raghuveer Kancherla &lt;raghuveer.kancherla@aplopio.com&gt;

&gt; 2 ways I can think of ...
&gt;
&gt;   - ExtractingRequestHandler (this is what I am guessing you are using now)
&gt;
&gt; Set extractOnly=true while making a request to the ExtractingRequestHandler
&gt; and get the parsed content back. Then make a POST request to the update request
&gt; handler with whatever fields and field values you want.
&gt;



&gt;
&gt;   - Use HTMLStripWhitespaceTokenizerFactory. This article may help
&gt;   explain what I mean.
&gt;
&gt; http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripWhitespaceTokenizerFactory
&gt;
&gt;
&gt;
&gt; - Raghu
&gt;
&gt;
&gt;
&gt; On Sat, Dec 5, 2009 at 3:44 AM, khalid y &lt;kernity@gmail.com&gt; wrote:
&gt;
&gt; &gt; Hi,
&gt; &gt;
&gt; &gt; I have a problem with Solr. I'm indexing some HTML content and Solr crashes
&gt; &gt; because my id field is multivalued.
&gt; &gt; I found that Tika reads the HTML and extracts metadata like &lt;meta name="id"
&gt; &gt; content="12"&gt; from my HTML files, but my documents already have an id set by
&gt; &gt; literal.id=10.
&gt; &gt;
&gt; &gt; I tried to map the id from Tika with fmap.id=ignored_ but it also ignores my
&gt; &gt; literal.id
&gt; &gt;
&gt; &gt; I'm using Solr 1.4 and Tika 0.5
&gt; &gt;
&gt; &gt; Can someone explain to me how I can ignore the Tika id metadata?
&gt; &gt;
&gt; &gt; Thanks
&gt; &gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Document Decay</title>
<author><name>Grant Ingersoll &lt;gsingers@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c2D9F1116-3A15-4F98-9D24-86A05CA1B0FE@apache.org%3e"/>
<id>urn:uuid:%3c2D9F1116-3A15-4F98-9D24-86A05CA1B0FE@apache-org%3e</id>
<updated>2009-12-05T22:33:09Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

On Dec 4, 2009, at 1:56 AM, brad anderson wrote:

&gt; Hi,
&gt; 
&gt; I'm looking for a way to have the score of documents decay over time. I want
&gt; older documents to have a lower score than newer documents.
&gt; 
&gt; I noted the ReciprocalFloatFunction class. In an example it seemed to be
&gt; doing just this when you set the function to be:
&gt; 
&gt;     recip(ms(NOW,mydatefield),3.16e-11,1,1)
&gt; 
&gt; This is supposed to degrade the score to half its value if the mydatefield
&gt; is 1 year older  than the current date. My question with this is, is it
&gt; making the document score go down to 0.5 or is it making the document score
&gt; 1/2 of its original value.
&gt; 
&gt; i.e.
&gt; 
&gt; The document has score 0.8
&gt; 
&gt; Will the score be 0.4 or 0.5 after using this function?


Actually, the value of the function gets added (it can be multiplied, too, with other params)
to the score for the document.  You can see this by adding a &amp;debugQuery=true parameter to
your request, which lets you examine the score explanations.
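
A minimal sketch of the arithmetic, in Python and evaluated outside Solr. It
assumes the usual definition recip(x,m,a,b) = a / (m*x + b), a 365-day year,
and, purely for illustration, that the function value is simply added to the
original score (real scores also involve normalization and any boosts, so
debugQuery remains the way to see actual numbers). The names decay_value and
additive_score are made up for this example.

```python
# Sketch of recip(x, m, a, b) = a / (m*x + b), evaluated outside Solr.
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # 3.1536e10 ms, ignoring leap years


def recip(x, m, a, b):
    """Value of the reciprocal function for input x."""
    return a / (m * x + b)


def decay_value(age_ms):
    """recip(ms(NOW,mydatefield),3.16e-11,1,1) for a document age in ms."""
    return recip(age_ms, 3.16e-11, 1.0, 1.0)


def additive_score(base_score, age_ms):
    """Illustrative assumption: the function value is added to the score."""
    return base_score + decay_value(age_ms)


if __name__ == "__main__":
    print(decay_value(0))                    # brand-new document
    print(decay_value(MS_PER_YEAR))          # one-year-old document
    print(additive_score(0.8, MS_PER_YEAR))  # the 0.8 example from the question
```

Under these assumptions the function itself is 1.0 for a new document and
about 0.5 for a one-year-old one, so the 0.8 document would end up near 1.3
rather than at 0.4 or 0.5.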

&gt; 
&gt; Also, are there better alternatives to deal with document decay?

Some people like a different decay that does something like:  today is better than yesterday,
yesterday is better than last week and last week is better than last month, etc. (in a non-linear
way).  To do this, you would need to implement your own function, I think.
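
As a rough illustration of that kind of stepped decay, here is a plain Python
model of the shape only. It is not a Solr function plugin, and the age limits,
the boost values and the name stepped_decay are all hypothetical.

```python
import bisect

# Hypothetical age boundaries (in days) and boosts; tune for your own data.
AGE_LIMITS_DAYS = [1, 7, 30]       # today, this week, this month
BOOSTS = [1.0, 0.7, 0.4, 0.1]      # one more entry than AGE_LIMITS_DAYS


def stepped_decay(age_days):
    """Return the boost of the age bucket that age_days falls into."""
    # bisect_right counts how many limits are at or below age_days.
    bucket = bisect.bisect_right(AGE_LIMITS_DAYS, age_days)
    return BOOSTS[bucket]


if __name__ == "__main__":
    for age in (0.5, 3, 10, 400):
        print(age, stepped_decay(age))
```

Each bucket gets one flat boost, so the drop from "today" to "this week" can
be made much steeper than the drop from "this month" to "older".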

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: WELCOME to solr-user@lucene.apache.org</title>
<author><name>Raghuveer Kancherla &lt;raghuveer.kancherla@aplopio.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c6451b6e90912050921h4b91ec9erb2cbdb2c2a9c7214@mail.gmail.com%3e"/>
<id>urn:uuid:%3c6451b6e90912050921h4b91ec9erb2cbdb2c2a9c7214@mail-gmail-com%3e</id>
<updated>2009-12-05T17:21:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
2 ways I can think of ...

   - ExtractingRequestHandler (this is what I am guessing you are using now)

Set extractOnly=true while making a request to the ExtractingRequestHandler
and get the parsed content back. Then make a POST request to the update request
handler with whatever fields and field values you want.


   - Use HTMLStripWhitespaceTokenizerFactory. This article may help
   explain what I mean.
   http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripWhitespaceTokenizerFactory
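
A minimal sketch of the two requests in the first option, in Python. It only
builds the URL and the request body and sends nothing. The host, port and the
/update/extract path are assumptions (use whatever path your
ExtractingRequestHandler is registered under), and the "text" field name is
hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host, port and path


def extract_only_url():
    """URL for step 1: ask the extracting handler to parse without indexing."""
    return SOLR_BASE + "/update/extract?" + urlencode({"extractOnly": "true"})


def add_document_xml(fields):
    """Body for step 2: a Solr add message holding only the chosen fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(extract_only_url())
    # Only the id you choose is sent; the id found by Tika is left out.
    print(add_document_xml({"id": "10", "text": "parsed content goes here"}))
```

The second body would be POSTed to the regular update handler, so the
document carries exactly one id value.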



- Raghu



On Sat, Dec 5, 2009 at 3:44 AM, khalid y &lt;kernity@gmail.com&gt; wrote:

&gt; Hi,
&gt;
&gt; I have a problem with Solr. I'm indexing some HTML content and Solr crashes
&gt; because my id field is multivalued.
&gt; I found that Tika reads the HTML and extracts metadata like &lt;meta name="id"
&gt; content="12"&gt; from my HTML files, but my documents already have an id set by
&gt; literal.id=10.
&gt;
&gt; I tried to map the id from Tika with fmap.id=ignored_ but it also ignores my
&gt; literal.id
&gt;
&gt; I'm using Solr 1.4 and Tika 0.5
&gt;
&gt; Can someone explain to me how I can ignore the Tika id metadata?
&gt;
&gt; Thanks
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Retrieving large num of docs</title>
<author><name>Raghuveer Kancherla &lt;raghuveer.kancherla@aplopio.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c6451b6e90912050905w706bca9ci1ad5b1987334d289@mail.gmail.com%3e"/>
<id>urn:uuid:%3c6451b6e90912050905w706bca9ci1ad5b1987334d289@mail-gmail-com%3e</id>
<updated>2009-12-05T17:05:49Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Otis,
I think my experiments are not conclusive about the reduction in search time. I
was playing around with various configurations to reduce the time to
retrieve documents from Solr. I am sure that after changing the two multivalued
text fields from stored to unstored, retrieval (query time + time to
load the stored fields) became very fast. I was expecting the
enableLazyFieldLoading setting in solrconfig to take care of this, but apparently
it is not working as expected.

Out of curiosity, I removed these 2 fields from the index (this time I am
not even indexing them) and my search time got better (10 times better).
However, I am still trying to isolate the reason for the search time
reduction. It may be because there are 2 fewer fields to search in, because
of the reduction in index size, or maybe something else. I am not
sure whether lazy field loading has any part in explaining this.

- Raghu



On Fri, Dec 4, 2009 at 3:07 AM, Otis Gospodnetic &lt;otis_gospodnetic@yahoo.com&gt; wrote:

&gt; Hm, hm, interesting.  I was looking into something like this the other day
&gt; (BIG indexed+stored text fields).  After seeing enableLazyFieldLoading=true
&gt; in solrconfig and after seeing "fl" didn't include those big fields, I
&gt; thought "hm, so Lucene/Solr will not be pulling those large fields from disk,
&gt; OK".
&gt;
&gt; You are saying that this may not be true based on your experiment?
&gt; And what I'm calling your "experiment" means that you reindexed the same
&gt; data, but without the 2 multi-valued text fields... and that was the only
&gt; change you made and got roughly a 10x search performance improvement?
&gt;
&gt; Sorry for repeating your words, just trying to confirm and understand.
&gt;
&gt; Thanks,
&gt; Otis
&gt; --
&gt; Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
&gt;
&gt;
&gt;
&gt; ----- Original Message ----
&gt; &gt; From: Raghuveer Kancherla &lt;raghuveer.kancherla@aplopio.com&gt;
&gt; &gt; To: solr-user@lucene.apache.org
&gt; &gt; Sent: Thu, December 3, 2009 8:43:16 AM
&gt; &gt; Subject: Re: Retrieving large num of docs
&gt; &gt;
&gt; &gt; Hi Hoss,
&gt; &gt;
&gt; &gt; I was experimenting with various queries to solve this problem and in one
&gt; &gt; such test I remember that requesting only the ID did not change the
&gt; &gt; retrieval time. To be sure, I tested it again using the curl command today
&gt; &gt; and it confirms my previous observation.
&gt; &gt;
&gt; &gt; Also, enableLazyFieldLoading setting is set to true in my solrconfig.
&gt; &gt;
&gt; &gt; Another general observation (off topic) is that having a moderately large
&gt; &gt; multi valued text field (~200 entries) in the index seems to slow down the
&gt; &gt; search significantly. I removed the 2 multi valued text fields from my index
&gt; &gt; and my search got ~10 times faster. :)
&gt; &gt;
&gt; &gt; - Raghu
&gt; &gt;
&gt; &gt;
&gt; &gt; On Thu, Dec 3, 2009 at 2:14 AM, Chris Hostetter wrote:
&gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt; : I think I solved the problem of retrieving 300 docs per request for now.
&gt; &gt; &gt; : The problem was that I was storing 2 moderately large multivalued text fields
&gt; &gt; &gt; : though I was not retrieving them during search time.  I reindexed all my
&gt; &gt; &gt; : data without storing these fields. Now the response time (time for Solr to
&gt; &gt; &gt; : return the http response) is very close to the QTime Solr is showing in the
&gt; &gt; &gt;
&gt; &gt; &gt; Hmmm....
&gt; &gt; &gt;
&gt; &gt; &gt; two comments:
&gt; &gt; &gt;
&gt; &gt; &gt; 1) the example URL from your previous mail...
&gt; &gt; &gt;
&gt; &gt; &gt; : &gt; http://localhost:1212/solr/select/?rows=300&amp;q=%28ResumeAllText%3A%28%28%28%22java+j2ee%22+%28java+j2ee%29%29%29%5E4%29%5E1.0%29&amp;start=0&amp;wt=python
&gt; &gt; &gt;
&gt; &gt; &gt; ...doesn't match your earlier statement that you are only returning the id
&gt; &gt; &gt; field (there is no "fl" param in that URL) ... are you certain you weren't
&gt; &gt; &gt; returning those large stored fields in the response?
&gt; &gt; &gt;
&gt; &gt; &gt; 2) assuming you were actually using an fl param to limit the fields, make
&gt; &gt; &gt; sure you have this setting in your solrconfig.xml...
&gt; &gt; &gt;
&gt; &gt; &gt;    &lt;enableLazyFieldLoading&gt;true&lt;/enableLazyFieldLoading&gt;
&gt; &gt; &gt;
&gt; &gt; &gt; ..that should make it pretty fast to return only a few fields of each
&gt; &gt; &gt; document, even if you do have some jumbo stored fields that aren't being
&gt; &gt; &gt; returned.
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt; -Hoss
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: HTML Stripping slower in Solr 1.4?</title>
<author><name>Koji Sekiguchi &lt;koji@r.email.ne.jp&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c4B1A8463.7030302@r.email.ne.jp%3e"/>
<id>urn:uuid:%3c4B1A8463-7030302@r-email-ne-jp%3e</id>
<updated>2009-12-05T16:03:47Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Yonik Seeley wrote:
&gt; Is BaseCharFilter required for the html strip filter?
&gt;
&gt; -Yonik
&gt; http://www.lucidimagination.com
&gt;
&gt;   
It could be if HTMLStripCharFilter is reverted to the first version.
For example, given "&lt;p&gt;aaa", the first version of HTMLStripCharFilter
produced "   aaa" (3 space chars prior to aaa). After SOLR-1394 was
committed, it produces " aaa" (1 space) and now uses the correct()
method of BaseCharFilter to correct offsets.

Koji

-- 
http://www.rondhuit.com/en/



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: solr 1.4: multi-select for statscomponent</title>
<author><name>gunjan_versata &lt;gunjangarg1@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c26656565.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c26656565-post@talk-nabble-com%3e</id>
<updated>2009-12-05T15:28:24Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Is there any update on this requirement??


Britske wrote:
&gt; 
&gt; Is there a way to exclude filters from a stats field, like it is possible to
&gt; exclude filters from a facet.field? It didn't work for me.
&gt; 
&gt; i.e: I have a field price, and although I filter on price, I would like to
&gt; be able to get the entire range (min,max) of prices as if I didn't specify
&gt; the filter. Obviously without excluding the filter the min,max range is
&gt; constrained by [50,100]
&gt; 
&gt; Part of query: 
&gt; stats=true&amp;stats.field={!ex=p1}price&amp;fq={!tag=p1}price:[50 TO 100]
&gt; 
&gt; USE-CASE:
&gt; I show a double-slider using javascript to display possible prices. (2
&gt; handles, one allowing to set min-price and the other to set max-price) 
&gt; The slider has a range of [0,maxprice without price filter set]. maxprice
&gt; is inserted by getting info from 'stats.price&amp;stats=true'
&gt; 
&gt; When the user sets the slider, a filter (fq) is set, constraining the
&gt; result set to the selected min and max prices.
&gt; After the page updates, I still want to show the price-slider, with the
&gt; min and max handles set to the prices as selected by the user, so the user
&gt; can alter his filter quickly.
&gt; 
&gt; However (and here it comes) I would also like to be able to get the 'maxprice
&gt; without price filter set' because I need this to set the max-range of the
&gt; slider.
&gt; 
&gt; Is there any (undocumented) feature that makes this possible? If not,
&gt; would it be easy to add?
&gt; 
&gt; Thanks, 
&gt; Britske
&gt; 
&gt; 

-- 
View this message in context: http://old.nabble.com/solr-1.4%3A-multi-select-for-statscomponent-tp22202971p26656565.html
Sent from the Solr - User mailing list archive at Nabble.com.



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with 	HTMLStripCharFilterFactory</title>
<author><name>Koji Sekiguchi &lt;koji@r.email.ne.jp&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c4B1A79F4.8080805@r.email.ne.jp%3e"/>
<id>urn:uuid:%3c4B1A79F4-8080805@r-email-ne-jp%3e</id>
<updated>2009-12-05T15:19:16Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Robin Wojciki wrote:
&gt; I am running a search in Solr 1.4 and I am getting the
&gt; StringIndexOutOfBoundsException pasted below. The spell check field
&gt; uses HTMLStripCharFilterFactory. However, the search works fine if I
&gt; do not use the HTMLStripCharFilterFactory.
&gt;
&gt; If I set a breakpoint at SpellCheckComponent.java: 248, the value of
&gt; the variable "best" is as shown in the screenshot:
&gt; http://yfrog.com/j5solrdebuginspectp
&gt;
&gt; At the end of first iteration, offset = 5 - (24 - 0) = -19
&gt; This causes the index out of bounds exception.
&gt;
&gt; The spell check field is defined as:
&gt;
&gt;         &lt;fieldType name="text_spell" class="solr.TextField"
&gt; positionIncrementGap="100" &gt;
&gt;             &lt;analyzer&gt;
&gt;                 &lt;charFilter class="solr.HTMLStripCharFilterFactory"/&gt;
&gt;                 &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
&gt;                 &lt;filter class="solr.StandardFilterFactory"/&gt;
&gt;                 &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
&gt;                 &lt;filter class="solr.StopFilterFactory"
&gt; ignoreCase="true" words="stopwords.txt"
&gt; enablePositionIncrements="true"/&gt;
&gt;                 &lt;filter class="solr.SynonymFilterFactory"
&gt; synonyms="synonyms.txt" ignoreCase="true" expand="true"/&gt;
&gt;                 &lt;filter class="solr.RemoveDuplicatesTokenFilterFactory"/&gt;
&gt;             &lt;/analyzer&gt;
&gt;         &lt;/fieldType&gt;
&gt;
&gt;
&gt;
&gt; Stack Trace:
&gt; =========
&gt; String index out of range: -19
&gt;
&gt; java.lang.StringIndexOutOfBoundsException: String index out of range: -19
&gt; 	at java.lang.AbstractStringBuilder.replace(Unknown Source)
&gt; 	at java.lang.StringBuilder.replace(Unknown Source)
&gt; 	at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
&gt; 	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
&gt; 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
&gt; 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
&gt; 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
&gt; 	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
&gt; 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
&gt; 	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
&gt; 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
&gt; 	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
&gt; 	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
&gt; 	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
&gt; 	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
&gt; 	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
&gt; 	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
&gt; 	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
&gt; 	at org.mortbay.jetty.Server.handle(Server.java:285)
&gt; 	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
&gt; 	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
&gt; 	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
&gt; 	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
&gt; 	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
&gt; 	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
&gt; 	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
&gt;
&gt;   
I couldn't reproduce it with simple test data.
Can you open a JIRA issue and attach a test case that reproduces
the problem, along with the spellchecker definition in solrconfig.xml?

Koji

-- 
http://www.rondhuit.com/en/



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Query time boosting with dismax</title>
<author><name>Uri Boness &lt;uboness@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c4B1A75E0.6030309@gmail.com%3e"/>
<id>urn:uuid:%3c4B1A75E0-6030309@gmail-com%3e</id>
<updated>2009-12-05T15:01:52Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Checking it further by looking at the code, it seems that in most cases 
it indeed adds the boost queries as SHOULD. But if you define *one* bq 
parameter which contains a boolean query, then each clause in this 
boolean query will be added to the query as is. Therefore:

This setup will filter the results:
&lt;str name="bq"&gt;
        +category:Audio +name:black
&lt;/str&gt;

This setup will *not* filter the results:
&lt;str name="bq"&gt;
        +category:Audio
&lt;/str&gt;
&lt;str name="bq"&gt;
        +name:black
&lt;/str&gt;

So, in the first setup, the default operator as defined in the schema 
plays a role.
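
To make the difference concrete, here is a toy Python model of the behaviour
described above. It is not Solr's code: parse_clauses and add_boost_queries
are hypothetical names, and the parser only understands whitespace-separated
clauses with an optional leading "+".

```python
MUST, SHOULD = "MUST", "SHOULD"


def parse_clauses(bq, default_operator="OR"):
    """Toy parser: '+' marks a clause as MUST; bare clauses follow the default."""
    bare = MUST if default_operator == "AND" else SHOULD
    clauses = []
    for token in bq.split():
        if token.startswith("+"):
            clauses.append((MUST, token[1:]))
        else:
            clauses.append((bare, token))
    return clauses


def add_boost_queries(bq_params, default_operator="OR"):
    """Return the clauses this model appends to the main query."""
    if len(bq_params) == 1:
        parsed = parse_clauses(bq_params[0], default_operator)
        if len(parsed) != 1:
            # A single bq holding a boolean query: clauses are added as is.
            return parsed
    # Otherwise every bq value is added as one optional clause.
    return [(SHOULD, bq.strip()) for bq in bq_params]


if __name__ == "__main__":
    print(add_boost_queries(["+category:Audio +name:black"]))
    print(add_boost_queries(["+category:Audio", "+name:black"]))
    print(add_boost_queries(["category:Audio name:black"], "AND"))
```

In this model the first and third calls yield MUST clauses (so they filter),
while the second yields two SHOULD clauses that only influence the score.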

Cheers,
Uri

Erik Hatcher wrote:
&gt; Are you sure about the default operator and bq?  I assume we're 
&gt; talking about the setting in schema.xml.
&gt;
&gt; I think boosting queries are OR'd in automatically to the main query:
&gt;
&gt; From DismaxQParser#addBoostQuery()
&gt;   ... query.add(f, BooleanClause.Occur.SHOULD);...
&gt;
&gt; There is one case where query.add((BooleanClause) c); is used though.
&gt;
&gt;     Erik
&gt;
&gt;
&gt; On Dec 5, 2009, at 6:54 AM, Uri Boness wrote:
&gt;
&gt;&gt; You can actually define boost queries to do that (bq parameter). 
&gt;&gt; Boost queries accept the standard Lucene query syntax and are eventually 
&gt;&gt; appended to the user query. Just make sure that the default operator 
&gt;&gt; is set to OR; otherwise these boost queries will not only influence 
&gt;&gt; the boosts but also filter out some of the results.
&gt;&gt;
&gt;&gt; Otis Gospodnetic wrote:
&gt;&gt;&gt; Terms no, but fields (with terms) and phrases, yes.
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; Otis
&gt;&gt;&gt; -- 
&gt;&gt;&gt; Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; ----- Original Message ----
&gt;&gt;&gt;
&gt;&gt;&gt;&gt; From: Girish Redekar &lt;girish.redekar@aplopio.com&gt;
&gt;&gt;&gt;&gt; To: solr-user@lucene.apache.org
&gt;&gt;&gt;&gt; Sent: Fri, December 4, 2009 11:42:16 PM
&gt;&gt;&gt;&gt; Subject: Query time boosting with dismax
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Hi,
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Is it possible to weigh specific query terms with a Dismax query 
&gt;&gt;&gt;&gt; parser? Is
&gt;&gt;&gt;&gt; it possible to write queries of the sort ...
&gt;&gt;&gt;&gt; field1:(term1)^2.0 + (term2^3.0)
&gt;&gt;&gt;&gt; with dismax?
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Thanks,
&gt;&gt;&gt;&gt; Girish Redekar
&gt;&gt;&gt;&gt; http://girishredekar.net
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Query time boosting with dismax</title>
<author><name>Uri Boness &lt;uboness@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c4B1A7006.7020400@gmail.com%3e"/>
<id>urn:uuid:%3c4B1A7006-7020400@gmail-com%3e</id>
<updated>2009-12-05T14:36:54Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Well.. this is mainly based on some experiments I did (not based on the 
code base). It appeared as if the boost queries were appended to the 
generated dismax query, and if the default operator (in the schema) was 
set to AND, they actually filtered the results. For example, here's a 
dismax config:

&lt;requestHandler name="dismax" class="solr.SearchHandler" default="true"&gt;
    &lt;lst name="defaults"&gt;
     &lt;str name="defType"&gt;dismax&lt;/str&gt;
     &lt;str name="qf"&gt;
        text^0.5 name^1.0 category^1.2
     &lt;/str&gt;
     &lt;str name="bq"&gt;
        *category:Audio name:black*
     &lt;/str&gt;
     &lt;str name="fl"&gt;
        *,score
     &lt;/str&gt;
     ...
  &lt;/requestHandler&gt;

When searching with a default OR operator, you receive more results than 
with an AND operator. Checking the generated query using 
debugQuery=true reveals the following:

Generated query with default OR operator:
+DisjunctionMaxQuery((category:black^1.2 | text:black^0.5 | 
name:black)~0.01) DisjunctionMaxQuery((category:black^1.5 | 
text:black^0.5 | name:black^1.2)~0.01) *category:Audio name:black* 
FunctionQuery((product(sint(rating),const(-1.0)))^0.5)

Generated query with default AND operator:
+DisjunctionMaxQuery((category:black^1.2 | text:black^0.5 | 
name:black)~0.01) DisjunctionMaxQuery((category:black^1.5 | 
text:black^0.5 | name:black^1.2)~0.01) *+category:Audio +name:black* 
FunctionQuery((product(sint(rating),const(-1.0)))^0.5)

So when it's an AND, both clauses are marked as MUST in the overall 
query, which in turn filters the results. I would instead expect it to add 
the boost query as a single SHOULD clause, so the generated query would look like:
+DisjunctionMaxQuery((category:black^1.2 | text:black^0.5 | 
name:black)~0.01) DisjunctionMaxQuery((category:black^1.5 | 
text:black^0.5 | name:black^1.2)~0.01) (*+category:Audio +name:black*) 
FunctionQuery((product(sint(rating),const(-1.0)))^0.5)

Cheers,
Uri

Erik Hatcher wrote:
&gt; Are you sure about the default operator and bq?  I assume we're 
&gt; talking about the setting in schema.xml.
&gt;
&gt; I think boosting queries are OR'd in automatically to the main query:
&gt;
&gt; From DismaxQParser#addBoostQuery()
&gt;   ... query.add(f, BooleanClause.Occur.SHOULD);...
&gt;
&gt; There is one case where query.add((BooleanClause) c); is used though.
&gt;
&gt;     Erik
&gt;
&gt;
&gt; On Dec 5, 2009, at 6:54 AM, Uri Boness wrote:
&gt;
&gt;&gt; You can actually define boost queries to do that (bq parameter). 
&gt;&gt; Boost queries accept the standard Lucene query syntax and are eventually 
&gt;&gt; appended to the user query. Just make sure that the default operator 
&gt;&gt; is set to OR; otherwise these boost queries will not only influence 
&gt;&gt; the boosts but also filter out some of the results.
&gt;&gt;
&gt;&gt; Otis Gospodnetic wrote:
&gt;&gt;&gt; Terms no, but fields (with terms) and phrases, yes.
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; Otis
&gt;&gt;&gt; -- 
&gt;&gt;&gt; Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; ----- Original Message ----
&gt;&gt;&gt;
&gt;&gt;&gt;&gt; From: Girish Redekar &lt;girish.redekar@aplopio.com&gt;
&gt;&gt;&gt;&gt; To: solr-user@lucene.apache.org
&gt;&gt;&gt;&gt; Sent: Fri, December 4, 2009 11:42:16 PM
&gt;&gt;&gt;&gt; Subject: Query time boosting with dismax
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Hi,
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Is it possible to weigh specific query terms with a Dismax query 
&gt;&gt;&gt;&gt; parser? Is
&gt;&gt;&gt;&gt; it possible to write queries of the sort ...
&gt;&gt;&gt;&gt; field1:(term1)^2.0 + (term2^3.0)
&gt;&gt;&gt;&gt; with dismax?
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Thanks,
&gt;&gt;&gt;&gt; Girish Redekar
&gt;&gt;&gt;&gt; http://girishredekar.net
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Query time boosting with dismax</title>
<author><name>Erik Hatcher &lt;erik.hatcher@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cD5181FCA-C68A-433B-B4D6-B27A2BE684D6@gmail.com%3e"/>
<id>urn:uuid:%3cD5181FCA-C68A-433B-B4D6-B27A2BE684D6@gmail-com%3e</id>
<updated>2009-12-05T14:12:10Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Are you sure about the default operator and bq?  I assume we're  
talking about the setting in schema.xml.

I think boosting queries are OR'd in automatically to the main query:

 From DismaxQParser#addBoostQuery()
   ... query.add(f, BooleanClause.Occur.SHOULD);...

There is one case where query.add((BooleanClause) c); is used though.

	Erik


On Dec 5, 2009, at 6:54 AM, Uri Boness wrote:

&gt; You can actually define boost queries to do that (bq parameter).  
&gt; Boost queries accept the standard Lucene query syntax and are eventually  
&gt; appended to the user query. Just make sure that the default operator  
&gt; is set to OR; otherwise these boost queries will not only influence  
&gt; the boosts but also filter out some of the results.
&gt;
&gt; Otis Gospodnetic wrote:
&gt;&gt; Terms no, but fields (with terms) and phrases, yes.
&gt;&gt;
&gt;&gt;
&gt;&gt; Otis
&gt;&gt; --
&gt;&gt; Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;&gt; ----- Original Message ----
&gt;&gt;
&gt;&gt;&gt; From: Girish Redekar &lt;girish.redekar@aplopio.com&gt;
&gt;&gt;&gt; To: solr-user@lucene.apache.org
&gt;&gt;&gt; Sent: Fri, December 4, 2009 11:42:16 PM
&gt;&gt;&gt; Subject: Query time boosting with dismax
&gt;&gt;&gt;
&gt;&gt;&gt; Hi,
&gt;&gt;&gt;
&gt;&gt;&gt; Is it possible to weigh specific query terms with a Dismax query  
&gt;&gt;&gt; parser? Is
&gt;&gt;&gt; it possible to write queries of the sort ...
&gt;&gt;&gt; field1:(term1)^2.0 + (term2^3.0)
&gt;&gt;&gt; with dismax?
&gt;&gt;&gt;
&gt;&gt;&gt; Thanks,
&gt;&gt;&gt; Girish Redekar
&gt;&gt;&gt; http://girishredekar.net
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;&gt;



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Sanity check on numeric types and which of them to use</title>
<author><name>Yonik Seeley &lt;yonik@lucidimagination.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cc68e39170912050545t426a43fbu36848ddde78ee473@mail.gmail.com%3e"/>
<id>urn:uuid:%3cc68e39170912050545t426a43fbu36848ddde78ee473@mail-gmail-com%3e</id>
<updated>2009-12-05T13:45:06Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Sat, Dec 5, 2009 at 7:02 AM, Marc Sturlese &lt;marc.sturlese@gmail.com&gt; wrote:
&gt;
&gt; And what about:
&gt; &lt;fieldtype name="sint" class="solr.SortableIntField"
&gt; sortMissingLast="true"/&gt;
&gt; vs.
&gt; &lt;fieldtype name="bcdint" class="solr.BCDIntField" sortMissingLast="true"/&gt;
&gt;
&gt; What is the difference between the two? Is bcdint always better?
&gt; Thanks in advance

BCDInt was a very early attempt at a sortable int type that didn't go
through binary - it went directly from base 10 (the actual string
representation) to a sortable base 10000 (10K fits in a single char
and saves memory in the fieldCache), and it also had no size limit.
It's no longer referenced in any example schemas, and it doesn't have
support for function queries.
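
To illustrate the general idea only (this is not the actual BCDIntField
encoding), here is a small Python sketch that goes straight from a decimal
string to base-10000 "digits", one character each, with a length prefix so
that plain string comparison matches numeric order. It handles non-negative
integers only, and encode_sortable is a made-up name.

```python
def encode_sortable(decimal_string):
    """Encode a non-negative decimal string as sortable base-10000 chars."""
    digits = decimal_string.lstrip("0") or "0"
    width = -(-len(digits) // 4) * 4          # round length up to a multiple of 4
    padded = digits.zfill(width)
    chunks = [padded[i:i + 4] for i in range(0, width, 4)]
    # The leading length character makes longer numbers sort after shorter
    # ones; each following character holds one base-10000 digit (0..9999).
    return chr(len(chunks)) + "".join(chr(int(chunk)) for chunk in chunks)


if __name__ == "__main__":
    values = ["9", "10000", "123", "99999999", "0", "12345678901234567890"]
    print(sorted(values, key=encode_sortable))
```

Because the conversion never goes through a binary int, the input can be
arbitrarily long, which matches the "no size limit" point above.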

-Yonik
http://www.lucidimagination.com


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Sanity check on numeric types and which of them to use</title>
<author><name>Marc Sturlese &lt;marc.sturlese@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c26655009.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c26655009-post@talk-nabble-com%3e</id>
<updated>2009-12-05T12:02:58Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

And what about:
&lt;fieldtype name="sint" class="solr.SortableIntField"
sortMissingLast="true"/&gt;
vs.
&lt;fieldtype name="bcdint" class="solr.BCDIntField" sortMissingLast="true"/&gt;

What is the difference between the two? Is bcdint always better?
Thanks in advance


Yonik Seeley-2 wrote:
&gt; 
&gt; On Fri, Dec 4, 2009 at 7:38 PM, Jay Hill &lt;jayallenhill@gmail.com&gt; wrote:
&gt;&gt; 1) Is there any benefit to using the "int" type as a TrieIntField w/
&gt;&gt; precisionStep=0 over the "pint" type for simple ints that won't be sorted
&gt;&gt; or
&gt;&gt; range queried?
&gt; 
&gt; No.  But given that people could throw in a random range query and
&gt; have it work correctly with a trie based int (vs a plain int), seems
&gt; reason enough to prefer it.
&gt; 
&gt;&gt; 2) In 1.4, what type is now most efficient for sorting?
&gt; 
&gt; trie and plain should be pretty equivalent (trie might be slightly
&gt; faster to uninvert the first time).  Both take up less memory in the
&gt; field cache than sint.
&gt; 
&gt;&gt; 3) The only reason to use a "sint" field is for backward compatibility
&gt;&gt; and/or to use sortMissingFirst/SortMissingLast, correct?
&gt; 
&gt; I believe so.
&gt; 
&gt; -Yonik
&gt; http://www.lucidimagination.com
&gt; 
&gt; 

-- 
View this message in context: http://old.nabble.com/Sanity-check-on-numeric-types-and-which-of-them-to-use-tp26651725p26655009.html
Sent from the Solr - User mailing list archive at Nabble.com.



</pre>
</div>
</content>
</entry>
<entry>
<title>Embedded for write, HTTP for read - cache aging</title>
<author><name>Peter 4U &lt;peter4u@hotmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cCOL119-W52EAD3AEC18278BA80A6AEF8920@phx.gbl%3e"/>
<id>urn:uuid:%3cCOL119-W52EAD3AEC18278BA80A6AEF8920@phx-gbl%3e</id>
<updated>2009-12-05T11:56:46Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Hello,

Does anyone know of a way to tell an HTTP SolrServer to reload its back-end index (mark the
cache as dirty) periodically?

I have a scenario where an EmbeddedSolrServer is used for writing (for fast indexing), and
a CommonsHttpSolrServer for reading (for remote access).

If the HTTP server is used for writing, reading clients pick up any updates, as the /update
has gone 'through' the HTTP server.

For very high indexing rates, I'd rather not have to build an HTTP request for every write
(or group of writes), since the writer is always on the same machine as the index.

Any help on this is much appreciated.

Thanks,

Peter

 
 		 	   		  

</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Query time boosting with dismax</title>
<author><name>Uri Boness &lt;uboness@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c4B1A49D9.7020508@gmail.com%3e"/>
<id>urn:uuid:%3c4B1A49D9-7020508@gmail-com%3e</id>
<updated>2009-12-05T11:54:01Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You can actually define boost queries to do that (bq parameter). Boost 
queries accept the standard Lucene query syntax and are eventually appended 
to the user query. Just make sure that the default operator is set to OR; 
otherwise these boost queries will not only influence the boosts but 
also filter out some of the results.

Otis Gospodnetic wrote:
&gt; Terms no, but fields (with terms) and phrases, yes.
&gt;
&gt;
&gt; Otis
&gt; --
&gt; Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
&gt;
&gt;
&gt;
&gt; ----- Original Message ----
&gt;   
&gt;&gt; From: Girish Redekar &lt;girish.redekar@aplopio.com&gt;
&gt;&gt; To: solr-user@lucene.apache.org
&gt;&gt; Sent: Fri, December 4, 2009 11:42:16 PM
&gt;&gt; Subject: Query time boosting with dismax
&gt;&gt;
&gt;&gt; Hi,
&gt;&gt;
&gt;&gt; Is it possible to weigh specific query terms with a Dismax query parser? Is
&gt;&gt; it possible to write queries of the sort ...
&gt;&gt; field1:(term1)^2.0 + (term2^3.0)
&gt;&gt; with dismax?
&gt;&gt;
&gt;&gt; Thanks,
&gt;&gt; Girish Redekar
&gt;&gt; http://girishredekar.net
&gt;&gt;     
&gt;
&gt;
&gt;   


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Query time boosting with dismax</title>
<author><name>Otis Gospodnetic &lt;otis_gospodnetic@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c965119.46070.qm@web50303.mail.re2.yahoo.com%3e"/>
<id>urn:uuid:%3c965119-46070-qm@web50303-mail-re2-yahoo-com%3e</id>
<updated>2009-12-05T07:37:01Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Terms no, but fields (with terms) and phrases, yes.


Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
&gt; From: Girish Redekar &lt;girish.redekar@aplopio.com&gt;
&gt; To: solr-user@lucene.apache.org
&gt; Sent: Fri, December 4, 2009 11:42:16 PM
&gt; Subject: Query time boosting with dismax
&gt; 
&gt; Hi,
&gt; 
&gt; Is it possible to weigh specific query terms with a Dismax query parser? Is
&gt; it possible to write queries of the sort ...
&gt; field1:(term1)^2.0 + (term2^3.0)
&gt; with dismax?
&gt; 
&gt; Thanks,
&gt; Girish Redekar
&gt; http://girishredekar.net



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4</title>
<author><name>rajan chandi &lt;chandi.rajan@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3ca70bf0e20912042233r139e47dfkd50c980b67fd6487@mail.gmail.com%3e"/>
<id>urn:uuid:%3ca70bf0e20912042233r139e47dfkd50c980b67fd6487@mail-gmail-com%3e</id>
<updated>2009-12-05T06:33:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Local Solr doesn't look like 64 bit.

rajan@rajan-desktop:~$ uname -a
Linux rajan-desktop 2.6.28-16-server #55-Ubuntu SMP Tue Oct 20 20:50:00 UTC
2009 i686 GNU/Linux


But the Xen Solr server does

rajan@rajan-desktop:~$ uname -a
Linux rajan-desktop 2.6.28-16-server #55-Ubuntu SMP Tue Oct 20 20:50:00 UTC
2009 i686 GNU/Linux


Maybe that is the reason why the server is taking more RAM.

Thanks all for your responses.

Regards
Rajan

On Sat, Dec 5, 2009 at 11:06 AM, rajan chandi &lt;chandi.rajan@gmail.com&gt; wrote:

&gt; My local ubuntu 9.04 64 bit taking 1.5 GB is not a VM and Debian Lenny 64
&gt; bit taking 2 GB is a Xen Instance.
&gt;
&gt; - Rajan
&gt;
&gt;
&gt; On Sat, Dec 5, 2009 at 10:51 AM, rajan chandi &lt;chandi.rajan@gmail.com&gt;wrote:
&gt;
&gt;&gt; We are using 64 bit VM with 64 bit JDK on it.
&gt;&gt; It is 2.00 GB RAM Zen instance.
&gt;&gt;
&gt;&gt; We're setting up max JVM heap size of 1800 MB max.
&gt;&gt;
&gt;&gt; - Rajan
&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;&gt; On Fri, Dec 4, 2009 at 8:19 PM, Yonik Seeley &lt;yonik@lucidimagination.com&gt;wrote:
&gt;&gt;
&gt;&gt;&gt; Are you explicitly setting the heap sizes?  If not, the JVM is
&gt;&gt;&gt; deciding for itself based on what the box looks like (ram, cpus, OS,
&gt;&gt;&gt; etc).  Are they both the same architecture (32 bit or 64 bit?)
&gt;&gt;&gt;
&gt;&gt;&gt; -Yonik
&gt;&gt;&gt; http://www.lucidimagination.com
&gt;&gt;&gt;
&gt;&gt;&gt; p.s. in general cross-posting to both solr-user and solr-dev is
&gt;&gt;&gt; discouraged.
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; On Fri, Dec 4, 2009 at 5:27 AM, rajan chandi &lt;chandi.rajan@gmail.com&gt;
&gt;&gt;&gt; wrote:
&gt;&gt;&gt; &gt; Hi All,
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; We've deployed 4 instances of Solr on a debian server.
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; It is taking only 1.5 GB of RAM on local ubuntu machine but it is
&gt;&gt;&gt; taking 2.0
&gt;&gt;&gt; &gt; GB plus on Debian Lenny server.
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; Any ideas/pointers will help.
&gt;&gt;&gt; &gt;
&gt;&gt;&gt; &gt; Regards
&gt;&gt;&gt; &gt; Rajan
&gt;&gt;&gt; &gt;
&gt;&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4</title>
<author><name>rajan chandi &lt;chandi.rajan@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3ca70bf0e20912042136h47c291a3kd32954c243b7999f@mail.gmail.com%3e"/>
<id>urn:uuid:%3ca70bf0e20912042136h47c291a3kd32954c243b7999f@mail-gmail-com%3e</id>
<updated>2009-12-05T05:36:46Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
My local Ubuntu 9.04 64-bit machine, which takes 1.5 GB, is not a VM; the
Debian Lenny 64-bit machine, which takes 2 GB, is a Xen instance.

- Rajan

On Sat, Dec 5, 2009 at 10:51 AM, rajan chandi &lt;chandi.rajan@gmail.com&gt; wrote:

&gt; We are using 64 bit VM with 64 bit JDK on it.
&gt; It is 2.00 GB RAM Zen instance.
&gt;
&gt; We're setting up max JVM heap size of 1800 MB max.
&gt;
&gt; - Rajan
&gt;
&gt;
&gt;
&gt; On Fri, Dec 4, 2009 at 8:19 PM, Yonik Seeley &lt;yonik@lucidimagination.com&gt;wrote:
&gt;
&gt;&gt; Are you explicitly setting the heap sizes?  If not, the JVM is
&gt;&gt; deciding for itself based on what the box looks like (ram, cpus, OS,
&gt;&gt; etc).  Are they both the same architecture (32 bit or 64 bit?)
&gt;&gt;
&gt;&gt; -Yonik
&gt;&gt; http://www.lucidimagination.com
&gt;&gt;
&gt;&gt; p.s. in general cross-posting to both solr-user and solr-dev is
&gt;&gt; discouraged.
&gt;&gt;
&gt;&gt;
&gt;&gt; On Fri, Dec 4, 2009 at 5:27 AM, rajan chandi &lt;chandi.rajan@gmail.com&gt;
&gt;&gt; wrote:
&gt;&gt; &gt; Hi All,
&gt;&gt; &gt;
&gt;&gt; &gt; We've deployed 4 instances of Solr on a debian server.
&gt;&gt; &gt;
&gt;&gt; &gt; It is taking only 1.5 GB of RAM on local ubuntu machine but it is taking
&gt;&gt; 2.0
&gt;&gt; &gt; GB plus on Debian Lenny server.
&gt;&gt; &gt;
&gt;&gt; &gt; Any ideas/pointers will help.
&gt;&gt; &gt;
&gt;&gt; &gt; Regards
&gt;&gt; &gt; Rajan
&gt;&gt; &gt;
&gt;&gt;
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4</title>
<author><name>rajan chandi &lt;chandi.rajan@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3ca70bf0e20912042121x60541da0vba92817aac28f55b@mail.gmail.com%3e"/>
<id>urn:uuid:%3ca70bf0e20912042121x60541da0vba92817aac28f55b@mail-gmail-com%3e</id>
<updated>2009-12-05T05:21:11Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
We are using a 64-bit VM with a 64-bit JDK on it.
It is a Xen instance with 2.00 GB of RAM.

We're setting a max JVM heap size of 1800 MB.
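
To double-check what the JVM itself reports for its heap limit and
architecture, a small class like the one below can be run on both machines.
This is a minimal sketch: the class and method names are made up for
illustration, and it is only meaningful when started with the same JVM and
the same options as Tomcat.

```java
class JvmInfoSketch {
    // Max heap the JVM will try to use, in whole megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Architecture as the JVM reports it, for example "amd64" or "i386".
    static String arch() {
        return System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println("max heap MB: " + maxHeapMb());
        System.out.println("os.arch: " + arch());
    }
}
```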

- Rajan


On Fri, Dec 4, 2009 at 8:19 PM, Yonik Seeley &lt;yonik@lucidimagination.com&gt; wrote:

&gt; Are you explicitly setting the heap sizes?  If not, the JVM is
&gt; deciding for itself based on what the box looks like (ram, cpus, OS,
&gt; etc).  Are they both the same architecture (32 bit or 64 bit?)
&gt;
&gt; -Yonik
&gt; http://www.lucidimagination.com
&gt;
&gt; p.s. in general cross-posting to both solr-user and solr-dev is
&gt; discouraged.
&gt;
&gt;
&gt; On Fri, Dec 4, 2009 at 5:27 AM, rajan chandi &lt;chandi.rajan@gmail.com&gt;
&gt; wrote:
&gt; &gt; Hi All,
&gt; &gt;
&gt; &gt; We've deployed 4 instances of Solr on a debian server.
&gt; &gt;
&gt; &gt; It is taking only 1.5 GB of RAM on local ubuntu machine but it is taking
&gt; 2.0
&gt; &gt; GB plus on Debian Lenny server.
&gt; &gt;
&gt; &gt; Any ideas/pointers will help.
&gt; &gt;
&gt; &gt; Regards
&gt; &gt; Rajan
&gt; &gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Query time boosting with dismax</title>
<author><name>Girish Redekar &lt;girish.redekar@aplopio.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c73ce86df0912042042p7a84379btf3064e38b4651ef8@mail.gmail.com%3e"/>
<id>urn:uuid:%3c73ce86df0912042042p7a84379btf3064e38b4651ef8@mail-gmail-com%3e</id>
<updated>2009-12-05T04:42:16Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,

Is it possible to weigh specific query terms with a Dismax query parser? Is
it possible to write queries of the sort ...
field1:(term1)^2.0 + (term2^3.0)
with dismax?

Thanks,
Girish Redekar
http://girishredekar.net


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Sanity check on numeric types and which of them to use</title>
<author><name>Yonik Seeley &lt;yonik@lucidimagination.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cc68e39170912041730w66666f6btecf233f1d0cb3f50@mail.gmail.com%3e"/>
<id>urn:uuid:%3cc68e39170912041730w66666f6btecf233f1d0cb3f50@mail-gmail-com%3e</id>
<updated>2009-12-05T01:30:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Fri, Dec 4, 2009 at 7:38 PM, Jay Hill &lt;jayallenhill@gmail.com&gt; wrote:
&gt; 1) Is there any benefit to using the "int" type as a TrieIntField w/
&gt; precisionStep=0 over the "pint" type for simple ints that won't be sorted or
&gt; range queried?

No.  But given that people could throw in a random range query and
have it work correctly with a trie based int (vs a plain int), seems
reason enough to prefer it.
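
As a rough illustration of what precisionStep changes, the sketch below
counts how many terms would be indexed per 32-bit value. It assumes one term
per shift of precisionStep bits, and that a precisionStep of 0 means a single
full-precision term; both are assumptions about the trie encoding rather than
something stated in this thread, and the class and method names are
hypothetical.

```java
class TrieTermsSketch {
    // Number of terms indexed per 32-bit value for a given precisionStep.
    // Assumption: one term per shift of precisionStep bits until all 32 bits
    // are covered, and a step of 0 means one full-precision term only.
    static int termsPerInt(int precisionStep) {
        if (precisionStep == 0) {
            return 1;
        }
        return (int) Math.ceil(32.0 / precisionStep);
    }

    public static void main(String[] args) {
        // "int" in the example schema uses precisionStep 0, "tint" uses 8.
        System.out.println("precisionStep=0: " + termsPerInt(0));
        System.out.println("precisionStep=8: " + termsPerInt(8));
    }
}
```

Under these assumptions a smaller step indexes more terms per value, which
costs index space but lets a range query match fewer, coarser terms.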

&gt; 2) In 1.4, what type is now most efficient for sorting?

trie and plain should be pretty equivalent (trie might be slightly
faster to uninvert the first time).  Both take up less memory in the
field cache than sint.

&gt; 3) The only reason to use a "sint" field is for backward compatibility
&gt; and/or to use sortMissingFirst/SortMissingLast, correct?

I believe so.

-Yonik
http://www.lucidimagination.com


</pre>
</div>
</content>
</entry>
<entry>
<title>Sanity check on numeric types and which of them to use</title>
<author><name>Jay Hill &lt;jayallenhill@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cd0e6561a0912041638j1ce04246pe14ea2ac8878d391@mail.gmail.com%3e"/>
<id>urn:uuid:%3cd0e6561a0912041638j1ce04246pe14ea2ac8878d391@mail-gmail-com%3e</id>
<updated>2009-12-05T00:38:53Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Looking at the example version of schema.xml there seems to be some
confusion on which numeric field types are best used in different
situations. What confused me was that the type of "int" is now set to a
TrieIntField, but with a precisionStep of 0:
    &lt;fieldType name="int" class="solr.TrieIntField" precisionStep="0"
omitNorms="true" positionIncrementGap="0"/&gt;
the "tint" type is set up as a TrieIntField with a precisionStep of 8:
    &lt;fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
omitNorms="true" positionIncrementGap="0"/&gt;
the "sint" type is unchanged:
    &lt;fieldType name="sint" class="solr.SortableIntField"
sortMissingLast="true" omitNorms="true"/&gt;
and the old IntField is now of type "pint":
    &lt;fieldType name="pint" class="solr.IntField" omitNorms="true"/&gt;

It's obvious that the "tint" type would be preferred for range queries. But
these questions come to mind:
1) Is there any benefit to using the "int" type as a TrieIntField w/
precisionStep=0 over the "pint" type for simple ints that won't be sorted or
range queried?
2) In 1.4, what type is now most efficient for sorting?
3) The only reason to use a "sint" field is for backward compatibility
and/or to use sortMissingFirst/SortMissingLast, correct?

-Jay


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: how to set multiple fq while building a query in solrj</title>
<author><name>Erik Hatcher &lt;erik.hatcher@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c09619F06-CD4E-4924-B520-EB88BD735669@gmail.com%3e"/>
<id>urn:uuid:%3c09619F06-CD4E-4924-B520-EB88BD735669@gmail-com%3e</id>
<updated>2009-12-04T23:10:39Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

On Dec 4, 2009, at 4:21 PM, javaxmlsoapdev wrote:

&gt;
&gt; how do I create a query string with multiple fq params using the solrj
&gt; SolrQuery
&gt; API?
&gt;
&gt; e.g. I want to build a query as follow
&gt;
&gt; http://servername:port/solr/issues/select/?q=testing&amp;fq=statusName: 
&gt; (Female
&gt; OR Male)&amp;fq=name="Joe"
&gt;
&gt; I am using solrj client APIs to build query and using SolrQuery as  
&gt; follow
&gt;
&gt; solrQuery.setParam("fq", statusString);
&gt; solrQuery.setParam("fq", nameString);
&gt;
&gt; It only sets the last "fq" (fq=nameString) in the string. If I switch the
&gt; above setParam order it sets fq=statusString. How do I set multiple fq
&gt; params in the SolrQuery object?

Use SolrQuery#add() instead.  Or SolrQuery#addFilterQuery()
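
For reference, here is a minimal stdlib-only sketch of the kind of query
string to expect once both filters are added, with one fq parameter per
filter. It does not use SolrJ; the class name, method name and SEP constant
are hypothetical and exist only for this illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class MultiFqSketch {
    // ASCII 0x26 is the ampersand that separates URL parameters.
    private static final char SEP = 0x26;

    // Builds a query string with q first and then one fq parameter per filter.
    // Hypothetical helper for illustration; it is not part of SolrJ.
    static String build(String q, String... filters) {
        StringBuilder sb = new StringBuilder("q=");
        sb.append(URLEncoder.encode(q, StandardCharsets.UTF_8));
        for (String fq : filters) {
            sb.append(SEP).append("fq=");
            sb.append(URLEncoder.encode(fq, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Filter strings taken from the question quoted above.
        System.out.println(build("testing",
                "statusName:(Female OR Male)", "name=\"Joe\""));
    }
}
```

The point is only that fq appears once per filter; a call that replaces the
parameter instead of appending to it leaves a single fq in the request.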

	Erik



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: WELCOME to solr-user@lucene.apache.org</title>
<author><name>khalid y &lt;kernity@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c301a0bc90912041414g44e7c99av85eeeefa111dede4@mail.gmail.com%3e"/>
<id>urn:uuid:%3c301a0bc90912041414g44e7c99av85eeeefa111dede4@mail-gmail-com%3e</id>
<updated>2009-12-04T22:14:05Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,

I have a problem with Solr. I'm indexing some HTML content and Solr crashes
because my id field ends up multivalued.
I found that Tika reads the HTML and extracts metadata like &lt;meta name="id"
content="12"&gt; from my HTML files, but my documents already have an id set by
literal.id=10.

I tried to map the id from Tika with fmap.id=ignored_ but that also ignores my
literal.id

I'm using Solr 1.4 and Tika 0.5

Can someone explain how I can ignore the Tika id metadata?

Thanks


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Dumping solr requests for indexing</title>
<author><name>Mark Miller &lt;markrmiller@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c4B198235.9020000@gmail.com%3e"/>
<id>urn:uuid:%3c4B198235-9020000@gmail-com%3e</id>
<updated>2009-12-04T21:42:13Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Teruhiko Kurosaka wrote:
&gt;&gt; Aha!
&gt;&gt; Sounds like a job for a simple, custom 
&gt;&gt; UpdateRequestProcessor.  Actually, I think URP doesn't get 
&gt;&gt; access to the actual XML, but what it has access may be 
&gt;&gt; enough for you: http://wiki.apache.org/solr/UpdateRequestProcessor
&gt;&gt;     
&gt;
&gt; I added this to solrconfig.xml but I don't see any extra output 
&gt; in the log file.
&gt;
&gt;   &lt;updateRequestProcessorChain&gt;
&gt;     &lt;processor class="solr.LogUpdateProcessorFactory" /&gt;
&gt;     &lt;processor class="solr.RunUpdateProcessorFactory" /&gt;
&gt;   &lt;/updateRequestProcessorChain&gt;
&gt;
&gt; Do I need to do something else to make this effective?
&gt;
&gt; The commented-out example in solrconfig.xml has 'name="dedupe"'
&gt; attribute. Do I have to specify a name? If so, how do I use 
&gt; that name in the request?
&gt;
&gt; Kuro
&gt;   
Look at the comment above the dedupe declaration:

       You have to link the chain to an update handler above to use it, i.e.:
         &lt;requestHandler name="/update" class="solr.XmlUpdateRequestHandler"&gt;
           &lt;lst name="defaults"&gt;
             &lt;str name="update.processor"&gt;dedupe&lt;/str&gt;
           &lt;/lst&gt;
         &lt;/requestHandler&gt; 

-- 
- Mark

http://www.lucidimagination.com





</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Dumping solr requests for indexing</title>
<author><name>Teruhiko Kurosaka &lt;Kuro@basistech.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cCAE8639A6FAB3D4AA7973D5897B325185F2ACF19F4@MSG-BOX.basistech.net%3e"/>
<id>urn:uuid:%3cCAE8639A6FAB3D4AA7973D5897B325185F2ACF19F4@MSG-BOX-basistech-net%3e</id>
<updated>2009-12-04T21:36:48Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

&gt; Aha!
&gt; Sounds like a job for a simple, custom 
&gt; UpdateRequestProcessor.  Actually, I think URP doesn't get 
&gt; access to the actual XML, but what it has access may be 
&gt; enough for you: http://wiki.apache.org/solr/UpdateRequestProcessor

I added this to solrconfig.xml but I don't see any extra output 
in the log file.

  &lt;updateRequestProcessorChain&gt;
    &lt;processor class="solr.LogUpdateProcessorFactory" /&gt;
    &lt;processor class="solr.RunUpdateProcessorFactory" /&gt;
  &lt;/updateRequestProcessorChain&gt;

Do I need to do something else to make this effective?

The commented-out example in solrconfig.xml has 'name="dedupe"'
attribute. Do I have to specify a name? If so, how do I use 
that name in the request?

Kuro


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4</title>
<author><name>Kay Kay &lt;kaykay.unique@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c4B197FBD.5070007@gmail.com%3e"/>
<id>urn:uuid:%3c4B197FBD-5070007@gmail-com%3e</id>
<updated>2009-12-04T21:31:41Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
What is the nature of the machines / VMs this runs on? 32-bit or 64-bit?

rajan chandi wrote:
&gt; Hi All,
&gt;
&gt; We've deployed 4 instances of Solr on a debian server.
&gt;
&gt; It is taking only 1.5 GB of RAM on local ubuntu machine but it is taking 2.0
&gt; GB plus on Debian Lenny server.
&gt;
&gt; Any ideas/pointers will help.
&gt;
&gt; Regards
&gt; Rajan
&gt;
&gt;   



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: HTML Stripping slower in Solr 1.4?</title>
<author><name>Yonik Seeley &lt;yonik@lucidimagination.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cc68e39170912041324t667710dbub6f092aa18a872d5@mail.gmail.com%3e"/>
<id>urn:uuid:%3cc68e39170912041324t667710dbub6f092aa18a872d5@mail-gmail-com%3e</id>
<updated>2009-12-04T21:24:58Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Is BaseCharFilter required for the html strip filter?

-Yonik
http://www.lucidimagination.com


On Tue, Dec 1, 2009 at 1:17 AM, Koji Sekiguchi &lt;koji@r.email.ne.jp&gt; wrote:
&gt; Robin,
&gt;
&gt; Thank you for reporting this. Performance degradation of HTML Stripper
&gt; could be in 1.4. I opened a ticket in Lucene:
&gt;
&gt; https://issues.apache.org/jira/browse/LUCENE-2098
&gt;
&gt; Koji
&gt;
&gt; --
&gt; http://www.rondhuit.com/en/
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: how is score computed with hsin functionquery?</title>
<author><name>gdeconto &lt;gerald.deconto@topproducer.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c26638720.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c26638720-post@talk-nabble-com%3e</id>
<updated>2009-12-04T21:24:30Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Thanks Lance, I appreciate your response.  

I know what a DIH is and have already written custom transformers.  I just
misunderstood your response to my message (I wasn't aware that we could use
JS to create transformers).

Anyhow, my intent is to change the tool (create a variation of hsin to
support degrees) rather than change the data (which introduces other issues,
such as having to support most systems in degrees and this one system in
radians)

Any ideas/advice in that regard?
-- 
View this message in context: http://old.nabble.com/how-is-score-computed-with-hsin-functionquery--tp26504265p26638720.html



</pre>
</div>
</content>
</entry>
<entry>
<title>how to set multiple fq while building a query in solrj</title>
<author><name>javaxmlsoapdev &lt;vikasdp@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c26638650.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c26638650-post@talk-nabble-com%3e</id>
<updated>2009-12-04T21:21:52Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

How do I create a query string with multiple fq params using the solrj SolrQuery
API?

e.g. I want to build a query as follows

http://servername:port/solr/issues/select/?q=testing&amp;fq=statusName:(Female
OR Male)&amp;fq=name="Joe"

I am using the solrj client APIs to build the query, using SolrQuery as follows

solrQuery.setParam("fq", statusString);
solrQuery.setParam("fq", nameString);

It only sets the last "fq" (fq=nameString) in the string. If I switch the above
setParam order it sets fq=statusString. How do I set multiple fq params in the
SolrQuery object?

Thanks,
-- 
View this message in context: http://old.nabble.com/how-to-set-multiple-fq-while-building-a-query-in-solrj-tp26638650p26638650.html



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Dumping solr requests for indexing</title>
<author><name>Otis Gospodnetic &lt;otis_gospodnetic@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c183202.23514.qm@web50307.mail.re2.yahoo.com%3e"/>
<id>urn:uuid:%3c183202-23514-qm@web50307-mail-re2-yahoo-com%3e</id>
<updated>2009-12-04T20:16:51Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Aha!
Sounds like a job for a simple, custom UpdateRequestProcessor.  Actually, I think URP doesn't
get access to the actual XML, but what it has access to may be enough for you: http://wiki.apache.org/solr/UpdateRequestProcessor

Alternatively, unpack the war, add a custom logging servlet filter, chain it in web.xml and
that might do the trick.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
&gt; From: Teruhiko Kurosaka &lt;Kuro@basistech.com&gt;
&gt; To: "solr-user@lucene.apache.org" &lt;solr-user@lucene.apache.org&gt;
&gt; Sent: Fri, December 4, 2009 3:05:57 PM
&gt; Subject: RE: Dumping solr requests for indexing
&gt; 
&gt; Log only tells high-level descriptions of what were done.
&gt; I'd like to capture the exact XML requests with data, so that
&gt; I could re-feed it to Solr to reproduce the issue my
&gt; customer is encountering.
&gt; 
&gt; -kuro  
&gt; 
&gt; &gt; -----Original Message-----
&gt; &gt; From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
&gt; &gt; Sent: Friday, December 04, 2009 11:41 AM
&gt; &gt; To: solr-user@lucene.apache.org
&gt; &gt; Subject: Re: Dumping solr requests for indexing
&gt; &gt; 
&gt; &gt; The solr log, as well as the servlet container log should 
&gt; &gt; have them all.
&gt; &gt; 
&gt; &gt; Otis
&gt; &gt; --
&gt; &gt; Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
&gt; &gt; 
&gt; &gt; 
&gt; &gt; 
&gt; &gt; ----- Original Message ----
&gt; &gt; &gt; From: Teruhiko Kurosaka 
&gt; &gt; &gt; To: "solr-user@lucene.apache.org" 
&gt; &gt; &gt; Sent: Fri, December 4, 2009 2:23:17 PM
&gt; &gt; &gt; Subject: Dumping solr requests for indexing
&gt; &gt; &gt; 
&gt; &gt; &gt; Is there any way to dump all incoming requests to Solr into a file?
&gt; &gt; &gt; 
&gt; &gt; &gt; My customer is seeing a strange problem of disappearing docs from 
&gt; &gt; &gt; index and I'd like to ask them to capture all incoming requests.
&gt; &gt; &gt; 
&gt; &gt; &gt; Thanks.
&gt; &gt; &gt; 
&gt; &gt; &gt; -kuro
&gt; &gt; 
&gt; &gt; 



</pre>
</div>
</content>
</entry>
<entry>
<title>Answer: RE: Question: Write to Solr but not via http, and still store date_format</title>
<author><name>Peter 4U &lt;peter4u@hotmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cCOL119-W1936BE044B7B89A08792D7F8930@phx.gbl%3e"/>
<id>urn:uuid:%3cCOL119-W1936BE044B7B89A08792D7F8930@phx-gbl%3e</id>
<updated>2009-12-04T20:14:06Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Oops, of course the answer was staring me in the face!

   --&gt; Use the EmbeddedSolrServer, rather than the CommonsHttpSolrServer.

Live and learn. Live. and learn.

Thanks,

Peter

 


 
&gt; From: peter4u@hotmail.com
&gt; To: solr-user@lucene.apache.org
&gt; Subject: Question: Write to Solr but not via http, and still store date_format
&gt; Date: Fri, 4 Dec 2009 20:09:19 +0000
&gt; 
&gt; 
&gt; Hi Solr team,
&gt; 
&gt; 
&gt; 
&gt; Has anyone been able to write to Solr, keeping things like 'date_format', but indexing
&gt; directly, rather than via http?
&gt; 
&gt; 
&gt; 
&gt; I've been indexing using Lucene Java, and this works well and is very fast, except that
&gt; any data indexed this way doesn't store date_format et al information (date.format results
&gt; always return 0).
&gt; 
&gt; I like indexing directly into Lucene, rather than via http requests, as it is much faster,
&gt; particularly at very high input rates.
&gt; 
&gt; 
&gt; 
&gt; Anyone encountered this and managed to solve it?
&gt; 
&gt; 
&gt; 
&gt; Many thanks,
&gt; 
&gt; peter
&gt; 
&gt; 
&gt; 

</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Question: Write to Solr but not via http, and still store date_format</title>
<author><name>Otis Gospodnetic &lt;otis_gospodnetic@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c686698.29254.qm@web50304.mail.re2.yahoo.com%3e"/>
<id>urn:uuid:%3c686698-29254-qm@web50304-mail-re2-yahoo-com%3e</id>
<updated>2009-12-04T20:13:55Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Are you looking for http://wiki.apache.org/solr/EmbeddedSolr ?

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
&gt; From: Peter 4U &lt;peter4u@hotmail.com&gt;
&gt; To: Solr &lt;solr-user@lucene.apache.org&gt;
&gt; Sent: Fri, December 4, 2009 3:09:19 PM
&gt; Subject: Question: Write to Solr but not via http, and still store date_format
&gt; 
&gt; 
&gt; Hi Solr team,
&gt; 
&gt; 
&gt; 
&gt; Has anyone been able to write to Solr, keeping things like 'date_format', but 
&gt; indexing directly, rather than via http?
&gt; 
&gt; 
&gt; 
&gt; I've been indexing using Lucene Java, and this works well and is very fast, 
&gt; except that any data indexed this way doesn't store date_format et al 
&gt; information (date.format results always return 0).
&gt; 
&gt; I like indexing directly into Lucene, rather than via http requests, as it is 
&gt; much faster, particularly at very high input rates.
&gt; 
&gt; 
&gt; 
&gt; Anyone encountered this and managed to solve it?
&gt; 
&gt; 
&gt; 
&gt; Many thanks,
&gt; 
&gt; peter
&gt; 
&gt; 
&gt;                           



</pre>
</div>
</content>
</entry>
<entry>
<title>Question: Write to Solr but not via http, and still store date_format</title>
<author><name>Peter 4U &lt;peter4u@hotmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cCOL119-W453AFA58DF3B7008BF01C5F8930@phx.gbl%3e"/>
<id>urn:uuid:%3cCOL119-W453AFA58DF3B7008BF01C5F8930@phx-gbl%3e</id>
<updated>2009-12-04T20:09:19Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Hi Solr team,

Has anyone been able to write to Solr, keeping things like 'date_format', but indexing directly,
rather than via http?

I've been indexing using Lucene Java, and this works well and is very fast, except that any
data indexed this way doesn't store date_format et al information (date.format results always
return 0).

I like indexing directly into Lucene, rather than via http requests, as it is much faster,
particularly at very high input rates.

Has anyone encountered this and managed to solve it?

Many thanks,

peter

 
 		 	   		  

</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Dumping solr requests for indexing</title>
<author><name>Teruhiko Kurosaka &lt;Kuro@basistech.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cCAE8639A6FAB3D4AA7973D5897B325185F2ACF19CC@MSG-BOX.basistech.net%3e"/>
<id>urn:uuid:%3cCAE8639A6FAB3D4AA7973D5897B325185F2ACF19CC@MSG-BOX-basistech-net%3e</id>
<updated>2009-12-04T20:05:57Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The log only gives a high-level description of what was done.
I'd like to capture the exact XML requests with data, so that
I can re-feed them to Solr to reproduce the issue my
customer is encountering.

-kuro  

&gt; -----Original Message-----
&gt; From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
&gt; Sent: Friday, December 04, 2009 11:41 AM
&gt; To: solr-user@lucene.apache.org
&gt; Subject: Re: Dumping solr requests for indexing
&gt; 
&gt; The solr log, as well as the servlet container log should 
&gt; have them all.
&gt; 
&gt; Otis
&gt; --
&gt; Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
&gt; 
&gt; 
&gt; 
&gt; ----- Original Message ----
&gt; &gt; From: Teruhiko Kurosaka &lt;Kuro@basistech.com&gt;
&gt; &gt; To: "solr-user@lucene.apache.org" &lt;solr-user@lucene.apache.org&gt;
&gt; &gt; Sent: Fri, December 4, 2009 2:23:17 PM
&gt; &gt; Subject: Dumping solr requests for indexing
&gt; &gt; 
&gt; &gt; Is there any way to dump all incoming requests to Solr into a file?
&gt; &gt; 
&gt; &gt; My customer is seeing a strange problem of disappearing docs from 
&gt; &gt; index and I'd like to ask them to capture all incoming requests.
&gt; &gt; 
&gt; &gt; Thanks.
&gt; &gt; 
&gt; &gt; -kuro
&gt; 
&gt; 

</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Best way to handle bitfields in solr...</title>
<author><name>Otis Gospodnetic &lt;otis_gospodnetic@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c280680.68292.qm@web50306.mail.re2.yahoo.com%3e"/>
<id>urn:uuid:%3c280680-68292-qm@web50306-mail-re2-yahoo-com%3e</id>
<updated>2009-12-04T20:03:07Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Would http://wiki.apache.org/solr/FunctionQuery#fieldvalue help?

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
&gt; From: William Pierce &lt;evalsinca@hotmail.com&gt;
&gt; To: solr-user@lucene.apache.org
&gt; Sent: Fri, December 4, 2009 2:43:25 PM
&gt; Subject: Best way to handle bitfields in solr...
&gt; 
&gt; Folks:
&gt; 
&gt; In my db I currently have fields that represent bitmasks. Thus, for example, a 
&gt; mask value of 48 represents both "undergraduate" (value = 16) and 
&gt; "graduate" (value = 32). Currently, the corresponding field in solr is a 
&gt; multi-valued string field called "EdLevel" which will have 
&gt; &lt;value&gt;Undergraduate&lt;/value&gt; and &lt;value&gt;Graduate&lt;/value&gt; as its two values (for 
&gt; this example). I do the conversion from the int into the list of values as I 
&gt; do the indexing.
&gt; 
&gt; Ideally, I'd like solr to have bitwise operations so that I could store the int 
&gt; value, and then simply search by using bit operations.  However, given that this 
&gt; is not possible,  and that there have been recent threads speaking to 
&gt; performance issues with multi-valued fields,  is there something better I could 
&gt; do?
&gt; 
&gt; TIA,
&gt; 
&gt; - Bill



</pre>
</div>
</content>
</entry>
<entry>
<title>Best way to handle bitfields in solr...</title>
<author><name>&quot;William Pierce&quot; &lt;evalsinca@hotmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cSNT102-DS23B2CE19A2A1BBB998A62A7930@phx.gbl%3e"/>
<id>urn:uuid:%3cSNT102-DS23B2CE19A2A1BBB998A62A7930@phx-gbl%3e</id>
<updated>2009-12-04T19:43:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Folks:

In my db I currently have fields that represent bitmasks. Thus, for example, a mask value
of 48 represents both "undergraduate" (value = 16) and "graduate" (value = 32).
Currently, the corresponding field in solr is a multi-valued string field called "EdLevel"
which will have &lt;value&gt;Undergraduate&lt;/value&gt; and &lt;value&gt;Graduate&lt;/value&gt;
as its two values (for this example). I do the conversion from the int into the list of
values as I do the indexing.
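
A minimal sketch of that int-to-values conversion, assuming Python and a
hypothetical ED_LEVELS table (only the values 16 and 32 come from the example
above; the function name is illustrative, not anything Solr provides):

```python
# Hypothetical mapping from bit value to label. Only 16 and 32 appear in
# the message; any further entries would be assumptions.
ED_LEVELS = {16: "Undergraduate", 32: "Graduate"}


def decode_mask(mask, labels=ED_LEVELS):
    """Return the labels whose bit is set in mask, in ascending bit order."""
    # Every key is a power of two, so integer division followed by a
    # parity check tests exactly that one bit of the mask.
    return [name for bit, name in sorted(labels.items()) if (mask // bit) % 2 == 1]


print(decode_mask(48))  # ['Undergraduate', 'Graduate']
```

The resulting list is what would be sent as the values of the multi-valued
"EdLevel" field at index time.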

Ideally, I'd like solr to have bitwise operations so that I could store the int value, and
then simply search by using bit operations.  However, given that this is not possible,  and
that there have been recent threads speaking to performance issues with multi-valued fields,
 is there something better I could do?

TIA,

- Bill

</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Dumping solr requests for indexing</title>
<author><name>Otis Gospodnetic &lt;otis_gospodnetic@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c299599.5933.qm@web50308.mail.re2.yahoo.com%3e"/>
<id>urn:uuid:%3c299599-5933-qm@web50308-mail-re2-yahoo-com%3e</id>
<updated>2009-12-04T19:40:30Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The solr log, as well as the servlet container log should have them all.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
&gt; From: Teruhiko Kurosaka &lt;Kuro@basistech.com&gt;
&gt; To: "solr-user@lucene.apache.org" &lt;solr-user@lucene.apache.org&gt;
&gt; Sent: Fri, December 4, 2009 2:23:17 PM
&gt; Subject: Dumping solr requests for indexing
&gt; 
&gt; Is there any way to dump all incoming requests to Solr
&gt; into a file?
&gt; 
&gt; My customer is seeing a strange problem of disappearing
&gt; docs from index and I'd like to ask them to capture all
&gt; incoming requests.
&gt; 
&gt; Thanks.
&gt; 
&gt; -kuro 



</pre>
</div>
</content>
</entry>
<entry>
<title>Dumping solr requests for indexing</title>
<author><name>Teruhiko Kurosaka &lt;Kuro@basistech.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cCAE8639A6FAB3D4AA7973D5897B325185F2ACF19BE@MSG-BOX.basistech.net%3e"/>
<id>urn:uuid:%3cCAE8639A6FAB3D4AA7973D5897B325185F2ACF19BE@MSG-BOX-basistech-net%3e</id>
<updated>2009-12-04T19:23:17Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Is there any way to dump all incoming requests to Solr
into a file?

My customer is seeing a strange problem of disappearing
docs from index and I'd like to ask them to capture all
incoming requests.

Thanks.

-kuro 


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: search on tomcat server</title>
<author><name>&quot;William Pierce&quot; &lt;evalsinca@hotmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3cSNT102-DS19E0DE6098DE4A52665957A7930@phx.gbl%3e"/>
<id>urn:uuid:%3cSNT102-DS19E0DE6098DE4A52665957A7930@phx-gbl%3e</id>
<updated>2009-12-04T18:55:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Have you gone through the solr tomcat wiki?

http://wiki.apache.org/solr/SolrTomcat

I found this very helpful when I did our solr installation on tomcat.

- Bill

--------------------------------------------------
From: "Jill Han" &lt;jill.han@alverno.edu&gt;
Sent: Friday, December 04, 2009 8:54 AM
To: &lt;solr-user@lucene.apache.org&gt;
Subject: RE: search on tomcat server

&gt; I went through all the links on 
&gt; http://wiki.apache.org/solr/#Search_and_Indexing
&gt; and still have no clue how to proceed.
&gt; 1. Do I have to implement something in order to get Solr to search documents 
&gt; on the Tomcat server?
&gt; 2. If I have files, such as .doc, .docx, .pdf, .jsp, .html, etc. under 
&gt; Windows XP, in c:/tomcat/webapps/test1, /webapps/test2,
&gt;   what should I do to make Solr search those directories?
&gt; 3. Since I am using Tomcat instead of Jetty, is there any demo that shows 
&gt; the Solr searching features and real search results?
&gt;
&gt; Thanks,
&gt; Jill
&gt;
&gt;
&gt; -----Original Message-----
&gt; From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com]
&gt; Sent: Monday, November 30, 2009 10:40 AM
&gt; To: solr-user@lucene.apache.org
&gt; Subject: Re: search on tomcat server
&gt;
&gt; On Mon, Nov 30, 2009 at 9:55 PM, Jill Han &lt;jill.han@alverno.edu&gt; wrote:
&gt;
&gt;&gt; I got solr running on the tomcat server,
&gt;&gt; http://localhost:8080/solr/admin/
&gt;&gt;
&gt;&gt; After I enter a search word, such as, solr, then hit Search button, it
&gt;&gt; will go to
&gt;&gt;
&gt;&gt; http://localhost:8080/solr/select/?q=solr&amp;version=2.2&amp;start=0&amp;rows=10&amp;indent=on
&gt;&gt;
&gt;&gt;  and display
&gt;&gt;
&gt;&gt; &lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&gt;&gt; &lt;response&gt;
&gt;&gt;   &lt;lst name="responseHeader"&gt;
&gt;&gt;     &lt;int name="status"&gt;0&lt;/int&gt;
&gt;&gt;     &lt;int name="QTime"&gt;0&lt;/int&gt;
&gt;&gt;     &lt;lst name="params"&gt;
&gt;&gt;       &lt;str name="rows"&gt;10&lt;/str&gt;
&gt;&gt;       &lt;str name="start"&gt;0&lt;/str&gt;
&gt;&gt;       &lt;str name="indent"&gt;on&lt;/str&gt;
&gt;&gt;       &lt;str name="q"&gt;solr&lt;/str&gt;
&gt;&gt;       &lt;str name="version"&gt;2.2&lt;/str&gt;
&gt;&gt;     &lt;/lst&gt;
&gt;&gt;   &lt;/lst&gt;
&gt;&gt;   &lt;result name="response" numFound="0" start="0" /&gt;
&gt;&gt; &lt;/response&gt;
&gt;&gt;
&gt;&gt;  My question is what is the next step to search files on tomcat server?
&gt;&gt;
&gt;&gt;
&gt;&gt;
&gt; Looks like you have not added any documents to Solr. See the "Indexing
&gt; Documents" section at http://wiki.apache.org/solr/#Search_and_Indexing
&gt;
&gt; -- 
&gt; Regards,
&gt; Shalin Shekhar Mangar.
&gt; 


</pre>
</div>
</content>
</entry>
<entry>
<title>how to do auto-suggest case-insensitive match and return original case field values</title>
<author><name>hermida &lt;leandro.hermida@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200912.mbox/%3c26636365.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c26636365-post@talk-nabble-com%3e</id>
<updated>2009-12-04T18:22:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Hi everyone,

New to forum and to Solr, doing my first major project with it and enjoying
it so far, great software.

In my web application I want to set up auto-suggest as you type
functionality which will search case-insensitively yet return the original
case terms.  It doesn't seem like TermsComponent can do this as it can only
return the lowercase indexed terms you are searching against, not the
original case terms.

There was one post on this forum 
http://old.nabble.com/Auto-suggest..-how-to-do-mixed-case-td24106666.html#a24143981
where someone asked the same question, and the reply was:

There is no way to do this right now using TermsComponent. You can index
lower case terms and store the mixed case terms. Then you can use a prefix
query which will return documents (and hence stored field values).

So this got me started, I set out to use Solr Query instead of
TermsComponent to try to do this.  I did the following as mentioned:

&lt;fieldType name="test" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.KeywordTokenizerFactory"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;

&lt;fieldType name="test_lc" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.KeywordTokenizerFactory"/&gt;
    &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;

&lt;field name="test" type="test" indexed="false" stored="true"
multiValued="true" /&gt;
&lt;field name="test_lc" type="test_lc" indexed="true"  stored="false"
multiValued="true" /&gt;

And used copyField to populate the test_lc field:

&lt;copyField source="test" dest="test_lc"/&gt;

This is the easy part (the forum user didn't explain the hard part!). It is
very hard to get the same information that TermsComponent returns using the
regular Solr Query functionality!  For example:

http://localhost:8983/solr/terms?terms.fl=test_lc&amp;terms.prefix=a&amp;terms.sort=count&amp;terms.limit=5&amp;omitHeader=true

&lt;int name="a-kinase anchor protein 13"&gt;15&lt;/int&gt;
&lt;int name="accn5"&gt;6&lt;/int&gt;
&lt;int name="actin-binding"&gt;3&lt;/int&gt;
&lt;int name="activator"&gt;1&lt;/int&gt;
&lt;int name="agie-bp1"&gt;1&lt;/int&gt;

which provides useful sorting by and returning of term frequency counts in
your index.  How does one get this same information with regular Solr Query? 
I set up the following prefix query, searching by the indexed lowercased
field and returning the other:

http://localhost:8983/solr/select?fl=test&amp;q=test_lc%3Aa*&amp;sort=score+desc&amp;rows=5&amp;omitHeader=true

&lt;doc&gt;
  &lt;arr name="test"&gt;
    &lt;str&gt;3D-structure&lt;/str&gt;
    &lt;str&gt;acetylation&lt;/str&gt;
    &lt;str&gt;alternative promoter usage&lt;/str&gt;
    &lt;str&gt;HLC-7&lt;/str&gt;
  &lt;/arr&gt;
&lt;/doc&gt;
&lt;doc&gt;
  &lt;arr name="test"&gt;
    &lt;str&gt;alternative splicing&lt;/str&gt;
    &lt;str&gt;complete proteome&lt;/str&gt;
    &lt;str&gt;DNA-binding&lt;/str&gt;
    &lt;str&gt;RACK1&lt;/str&gt;
  &lt;/arr&gt;
&lt;/doc&gt;
&lt;doc&gt;
  &lt;arr name="test"&gt;
    &lt;str&gt;acetylation&lt;/str&gt;
    &lt;str&gt;AIG21&lt;/str&gt;
    &lt;str&gt;WD repeat&lt;/str&gt;
    &lt;str&gt;GNB2L1&lt;/str&gt;
  &lt;/arr&gt;
&lt;/doc&gt;
&lt;doc&gt;
  &lt;arr name="test"&gt;
    &lt;str&gt;3D-structure&lt;/str&gt;
    &lt;str&gt;apoptosis&lt;/str&gt;
    &lt;str&gt;cathepsin G-like 1&lt;/str&gt;
    &lt;str&gt;ATSGL1&lt;/str&gt;
    &lt;str&gt;CTLA-1&lt;/str&gt;
  &lt;/arr&gt;
&lt;/doc&gt;
&lt;doc&gt;
  &lt;arr name="test"&gt;
    &lt;str&gt;autoantigen Ge-1&lt;/str&gt;
    &lt;str&gt;autoantigen RCD-8&lt;/str&gt;
    &lt;str&gt;HERV-H LTR-associating protein 3&lt;/str&gt;
    &lt;str&gt;HHLA3&lt;/str&gt;
  &lt;/arr&gt;
&lt;/doc&gt;

I can see how to process this in my front-end app to extract the original
terms starting with the prefix letter(s) used in the query, but there are
still some major problems when compared to TermsComponent:

- How do I make sure my auto-suggest list is at least a certain number of
terms long?  Using rows of course doesn't work like terms.limit, because
between returned docs there can be the same term and these will get
collapsed.
- How do I get term frequency counts like TermsComponent does?  I looked at
faceting but I don't understand how to get the TermsComponent behavior using
it.
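
The front-end processing mentioned above could look roughly like the sketch
below. It assumes Python and a hypothetical helper named suggest; the sample
data is the first three documents from the response shown earlier. Note that
the counts cover only the rows returned by the query, so they are not the
index-wide frequencies that TermsComponent reports.

```python
from collections import Counter


def suggest(docs, prefix, limit=5):
    """Count original-case terms that start with prefix, ignoring case.

    docs is a list of documents, each a list of stored "test" values as
    returned by the select query. A term's count is the number of returned
    documents that contain it.
    """
    wanted = prefix.lower()
    counts = Counter()
    for values in docs:
        # A set keeps a term repeated inside one document from counting twice.
        for term in set(values):
            if term.lower().startswith(wanted):
                counts[term] += 1
    # Highest count first, ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0].lower()))
    return ranked[:limit]


docs = [
    ["3D-structure", "acetylation", "alternative promoter usage", "HLC-7"],
    ["alternative splicing", "complete proteome", "DNA-binding", "RACK1"],
    ["acetylation", "AIG21", "WD repeat", "GNB2L1"],
]
print(suggest(docs, "a"))
```

Because duplicate terms collapse, the list can come back shorter than limit;
a caller would have to request more rows and retry to fill it.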

Sorry for the long message, just wanted to fully explain, thanks for any
help!

leandro

-- 
View this message in context: http://old.nabble.com/how-to-do-auto-suggest-case-insensitive-match-and-return-original-case-field-values-tp26636365p26636365.html
Sent from the Solr - User mailing list archive at Nabble.com.



</pre>
</div>
</content>
</entry>
</feed>
