lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr regex query help
Date Sat, 24 Jan 2015 05:57:27 GMT
Right. As I mentioned on the original JIRA, the regex match is happening
on _terms_. You are conflating the original input (the entire field) with
the individual terms that the regex is applied to.

I suggest that you look at the admin/analysis page. There you'll see the
terms that are indexed, and you'll see that the regex simply cannot work
since it assumes that the regex is applied to the entire input rather than
the results of the analysis chain.

I further suggest that you explore tokenization and how individual terms
are searched. The admin/analysis page is invaluable in this endeavor.

The root cause of your confusion is that, given you're using
ClassicTokenizer, you have a bunch of individual terms that are being
searched, _not_ the whole input. So the regex is bound to fail since you're
thinking in terms of the entire input rather than the result of your
analysis chain, i.e. tokenization + filters as defined in schema.xml.
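
To make the term-versus-input distinction concrete, here is a rough Python
sketch. It is not Solr code: crude_tokenize is a hypothetical stand-in that
lowercases and splits on non-alphanumeric characters, and it does not
reproduce what ClassicTokenizer actually emits (check the admin/analysis
page for the real terms). The one behavior it does mirror is that a regex
query has to match an individual term in full.

```python
import re


def crude_tokenize(text):
    """Hypothetical stand-in for an analysis chain (NOT ClassicTokenizer):
    lowercase the input and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]


def terms_matching(pattern, terms):
    """Return the terms that the pattern matches in full, mimicking a
    regex query that is evaluated against each indexed term separately."""
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t)]


# Sample field value taken from the original question.
field_value = (
    "1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: BMC "
    "Power/Reset action ::PIPE:: Delayed shutdown timer disabled "
    "::PIPE:: Asserted"
)

terms = crude_tokenize(field_value)

# A pattern that fits inside one term finds matches.
single_term_hits = terms_matching(r"[0-9]{2}", terms)

# A pattern that spans several words cannot match any single term.
multi_term_hits = terms_matching(r".*delayed shutdown.*asserted.*", terms)

# The same idea applied to the raw, untokenized input does match, which is
# the behavior the question was expecting from the index.
whole_input_hit = (
    re.fullmatch(r".*Delayed shutdown.*Asserted.*", field_value) is not None
)

print(terms)
print(single_term_hits)
print(multi_term_hits)
print(whole_input_hit)
```

With this crude splitter, the two-digit pattern finds terms while the
multi-word pattern finds none, even though it matches the raw string. That
is the same shape of result as msg:/[0-9]{2}/ working and the longer regex
failing.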

FWIW,
Erick

On Fri, Jan 23, 2015 at 8:58 PM, Arumugam, Suresh <Suresh.Arumugam@emc.com>
wrote:

> Hi All,
>
> We have indexed documents into Solr and are not able to query them using
> a regex.
>
> Our data looks like the following in a text field, which is indexed using
> the ClassicTokenizer.
>
> *                1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: BMC
> Power/Reset action  ::PIPE:: Delayed shutdown timer disabled ::PIPE::
> Asserted*
>
> We tried to look up this string with the following regex:
>
> *PIPE*[0-9]{2}\/[0-9}{2}\/[0-9]{4}*Delayed shutdown*Asserted*
>
> Since the analyzer tokenized the data, the regex match is happening on the
> terms, and it’s not working as we expect.
>
> Can you please help us find an equivalent way to query this in Solr?
>
> The following are the details about our environment.
>
> 1. Solr 4.10.3 as well as Solr 4.8
> 2. JDK 1.7_51
> 3. SolrConfig.xml & Schema.xml attached.
>
> The following regex query works:
>
> msg:/[0-9]{2}/
>
> But when we want to match more than one term, the regex doesn't seem to
> work.
>
> Please help us resolve this issue.
>
> Thanks in advance.
>
> Regards,
> Suresh.A
