lucene-solr-user mailing list archives

From Hugh Cayless <philomou...@gmail.com>
Subject Re: regex constructs allowed in queries
Date Thu, 29 Aug 2013 17:56:43 GMT
The context for this is that I'm migrating an application from Solr 3.5 to 4.4. We had regex
search working (in a somewhat hacky way), but since 4.x has regex search support built in,
I'm trying to switch to that. Some things work the way I'd expect; some clearly don't. So
my question was, in the first instance, "Is there full regex support?" Clearly, there's supposed
to be, so either something is wrong or I don't know the right escape syntax.

I think our definitions of "good documentation" differ :-). 

Nevertheless,

On Aug 29, 2013, at 13:05 , Chris Hostetter <hossman_lucene@fucit.org> wrote:

> It's impossible for us to guess what sort of problem you might be having, 
> since you haven't shown us...
> 
> * any actual examples of requests you are trying to send

Using the admin webapp query form, the following search works:

1)
untokenized_ia:/.*νιν.{1,20}ασκλα.*/

this corresponds to:

http://localhost:8083/solr/pn-search/select?q=untokenized_ia%3A%2F.*%CE%BD%CE%B9%CE%BD.%7B1%2C20%7D%CE%B1%CF%83%CE%BA%CE%BB%CE%B1.*%2F&wt=xml&indent=true

2) works as well
untokenized_ia:/.*νιν.{1,20} ασκλα.*/  (with a plain space before ασκλα)

http://localhost:8083/solr/pn-search/select?q=untokenized_ia%3A%2F.*%CE%BD%CE%B9%CE%BD.%7B1%2C20%7D+%CE%B1%CF%83%CE%BA%CE%BB%CE%B1.*%2F&wt=xml&indent=true

3) returns no results
untokenized_ia:/.*νιν.{1,20}\bασκλα.*/ (with a word boundary before ασκλα)

http://localhost:8083/solr/pn-search/select?q=untokenized_ia%3A%2F.*%CE%BD%CE%B9%CE%BD.%7B1%2C20%7D%5Cb%CE%B1%CF%83%CE%BA%CE%BB%CE%B1.*%2F&wt=xml&indent=true

untokenized_ia:/.*νιν.{1,20}\\bασκλα.*/ (with the backslash doubled) also fails, as does untokenized_ia:/.*νιν.{1,20}\sασκλα.*/ (with \s in place of \b).
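
For anyone who wants to reproduce the three requests outside the admin form, here is a minimal Python sketch of the percent-encoding. The helper name encode_query is made up for illustration, and safe="*" is only chosen because it yields the same strings as the URLs above; I am not claiming this is how the admin webapp encodes things internally.

```python
from urllib.parse import quote_plus

# The three regex queries from the examples above, as typed into the query form.
QUERIES = {
    1: r"untokenized_ia:/.*νιν.{1,20}ασκλα.*/",
    2: r"untokenized_ia:/.*νιν.{1,20} ασκλα.*/",
    3: r"untokenized_ia:/.*νιν.{1,20}\bασκλα.*/",
}


def encode_query(q: str) -> str:
    """Percent-encode a query string as UTF-8 (hypothetical helper)."""
    # safe="*" leaves '*' unescaped so the output lines up with the URLs
    # quoted in this message; a space becomes '+'.
    return quote_plus(q, safe="*")


for number, query in QUERIES.items():
    print(number, encode_query(query))
```

The only difference between the encoded forms of 1) and 3) is the %5Cb sequence, i.e. the backslash followed by a literal b.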

> * any logs showing how those requests are received by solr

Access log: 

1) (works)

0:0:0:0:0:0:0:1 - - [29/Aug/2013:13:25:14 -0400] "GET /solr/pn-search/select?q=untokenized_ia%3A%2F.*%CE%BD%CE%B9%CE%BD.%7B1%2C20%7D%CE%B1%CF%83%CE%BA%CE%BB%CE%B1.*%2F&wt=xml&indent=true&_=1377797113194 HTTP/1.1" 200 7899

2) (works)

0:0:0:0:0:0:0:1 - - [29/Aug/2013:13:27:12 -0400] "GET /solr/pn-search/select?q=untokenized_ia%3A%2F.*%CE%BD%CE%B9%CE%BD.%7B1%2C20%7D+%CE%B1%CF%83%CE%BA%CE%BB%CE%B1.*%2F&wt=xml&indent=true&_=1377797225482 HTTP/1.1" 200 7900

3) (fails to find anything)

0:0:0:0:0:0:0:1 - - [29/Aug/2013:13:28:13 -0400] "GET /solr/pn-search/select?q=untokenized_ia%3A%2F.*%CE%BD%CE%B9%CE%BD.%7B1%2C20%7D%5Cb%CE%B1%CF%83%CE%BA%CE%BB%CE%B1.*%2F&wt=xml&indent=true&_=1377797291413 HTTP/1.1" 200 431

unescaped:

0:0:0:0:0:0:0:1 - - [29/Aug/2013:13:28:13 -0400] "GET /solr/pn-search/select?q=untokenized_ia:/.*νιν.{1,20}\bασκλα.*/&wt=xml&indent=true&_=1377797291413 HTTP/1.1" 200 431
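
The unescaped form can be checked mechanically. The following is a rough Python sketch that decodes the request target from the log entry for query 3; decoded_params is a hypothetical helper, and the only input is the string copied from the access log above.

```python
from urllib.parse import parse_qs, urlsplit

# Request target copied from the access-log entry for query 3.
REQUEST_TARGET = (
    "/solr/pn-search/select?q=untokenized_ia%3A%2F.*%CE%BD%CE%B9%CE%BD."
    "%7B1%2C20%7D%5Cb%CE%B1%CF%83%CE%BA%CE%BB%CE%B1.*%2F"
    "&wt=xml&indent=true&_=1377797291413"
)


def decoded_params(target: str) -> dict:
    """Return the decoded query parameters of a request target (hypothetical helper)."""
    # parse_qs percent-decodes as UTF-8 and turns '+' into a space;
    # each parameter occurs once here, so keep only the first value.
    return {key: values[0] for key, values in parse_qs(urlsplit(target).query).items()}


params = decoded_params(REQUEST_TARGET)
print(params["q"])
```

The decoded q parameter still contains the single backslash followed by b, so the pattern is not being altered by URL encoding on the way to Solr.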

> * any debugQuery=true response showing how those queries got parsed

Gist for #2: https://gist.github.com/hcayless/6381100

Gist for #3: https://gist.github.com/hcayless/6381031

> * any example docs in your index.

Here's one that should be matched: https://gist.github.com/hcayless/6381169

> 
> https://wiki.apache.org/solr/UsingMailingLists#Information_useful_for_searching_problems
> 
> -Hoss

