lucene-solr-user mailing list archives

From David Hastings <hastings.recurs...@gmail.com>
Subject Re: Section symbol, ignore in some queries but not others?
Date Wed, 25 Jul 2018 19:31:53 GMT
Ah, so I could index the text including the § character as an alpha, use no
qs value when trying to ignore it, and for users add in a qs value, assuming
I use edismax, which I currently am.
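
For reference, a rough sketch of the two request shapes, not the exact
requests used here. The field name `text` and the helper
`build_edismax_params` are made up for illustration; `defType`, `q`, `qf`
and `qs` are the standard eDisMax parameters, and `qs` only affects phrases
that are quoted in the query string.

```python
from urllib.parse import urlencode


def build_edismax_params(phrase, phrase_slop=None, field="text"):
    """Build request parameters for a quoted eDisMax phrase query.

    `field` is a hypothetical field name; `phrase_slop` is sent as `qs`.
    """
    params = {
        "defType": "edismax",
        "q": f'"{phrase}"',  # quote the input so it is parsed as a phrase
        "qf": field,
    }
    if phrase_slop is not None:
        # qs: how many positions apart the phrase terms may be and still match
        params["qs"] = str(phrase_slop)
    return params


# No qs: the phrase terms must be adjacent in the indexed token stream.
strict = build_edismax_params("3 Reg. 1")

# qs=1: one intervening token (such as an indexed §) is tolerated.
loose = build_edismax_params("Servitudes 2.10", phrase_slop=1)

print(urlencode(strict))
print(urlencode(loose))
```

The slop of 1 is an assumption that fits a single § token between the
terms; a larger gap would need a larger value.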

Tested this method and it works as expected.  Thanks, saved me a lot of
time!
-David

On Wed, Jul 25, 2018 at 3:15 PM, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> If you copyField and don't store the copy, then it is only the indexed
> (term) representation for the copy that is much smaller. Just a
> thought.
>
> The other thing is that you seem to be saying that you want to do a
> match phrase but with a token gap, right? Like an eDisMax slop?
> http://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-parser.html
>
> Regards,
>    Alex.
>
> On 25 July 2018 at 14:47, David Hastings <hastings.recursive@gmail.com>
> wrote:
> > Hey all.  I have a situation that seems pretty rough.  Currently in our
> > data we have a lot of sentences like this:
> >
> > elements comprise the "stuff" of the tax. 3 Reg. § 1.901-2(a)(2). 4 Only
> > non-Saudis are subject to the
> > <https://heinonline.org/HOL/SearchVolumeSOLR?input=(((%223%20Regulation%201%22%20OR%20%223%20Regulation%201%22%20OR%20%223%20Reg.%201%22)%20AND%20NOT%20id:hein.journals/rcatorbg3.14))&div=13&handle=hein.journals/taxlr53&collection=journals>
> > By default the word delimiter is treating all punctuation as a space.  So
> > when you search for:
> > 3 Reg. 1, your results can include  3 Reg. § 1.901
> >
> > I have experimented with the WDF and added § => ALPHA and this works, and
> > treats the character as a letter.  However, during some queries, I still
> > need searches such as
> >
> > Servitudes 2.10
> >
> > to return results with:
> >
> >
> > Servitudes § 2.10
> >
> >
> > At the moment I cannot conceive of a way to do this aside from two
> > separate text fields, effectively doubling the size of my index,
> > which currently sits at 300 GB optimized, and 500 GB if left on its
> > own.
> >
> >
> > Thanks for any help or suggestions
>
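
The behaviour discussed in this thread can be illustrated with a toy model.
This is only a sketch: `tokenize` is a crude stand-in for the word delimiter
filter (not Solr's actual analysis chain), and `phrase_matches` is a
simplified in-order matcher, not Lucene's real slop algorithm. Both function
names are hypothetical.

```python
import re


def tokenize(text, keep_section_sign):
    """Split on non-alphanumerics; optionally treat § as a letter."""
    pattern = r"[0-9A-Za-z§]+" if keep_section_sign else r"[0-9A-Za-z]+"
    return [token.lower() for token in re.findall(pattern, text)]


def phrase_matches(doc_tokens, query_tokens, slop=0):
    """Return True if the query tokens occur in order in the document,
    with at most `slop` extra tokens in total between them."""
    if not query_tokens:
        return False

    def search(doc_pos, query_pos, budget):
        if query_pos == len(query_tokens):
            return True
        # The next query token may sit up to `budget` positions further on.
        for gap in range(budget + 1):
            i = doc_pos + gap
            if i < len(doc_tokens) and doc_tokens[i] == query_tokens[query_pos]:
                if search(i + 1, query_pos + 1, budget - gap):
                    return True
        return False

    return any(
        search(start + 1, 1, slop)
        for start, token in enumerate(doc_tokens)
        if token == query_tokens[0]
    )


doc = tokenize("Servitudes § 2.10", keep_section_sign=True)
query = tokenize("Servitudes 2.10", keep_section_sign=True)

print(phrase_matches(doc, query, slop=0))  # § sits between the terms
print(phrase_matches(doc, query, slop=1))  # one-token gap is allowed
```

With § kept as a token, the exact phrase no longer matches across it, while
a slop of 1 bridges the gap; with § dropped as punctuation, the exact phrase
matches again, which is the default behaviour described in the first message.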
