lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: RegexReplaceProcessorFactory pattern to detect multiple \n
Date Tue, 05 Mar 2019 16:29:16 GMT
Hi Paul,

Thank you for the reply.

I have tried to add the following configuration according to your
suggestion:

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">[ \t]*\r?\n}</str>
   <str name="replacement">&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(&lt;br&gt;&lt;br&gt;){3,}</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>

However, none of the \n is being removed this time round.
Is the order and/or the pattern correct?
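The first pattern above can be checked directly with java.util.regex. This is a minimal sketch that assumes (not verified against the Solr source here) that RegexReplaceProcessorFactory compiles the pattern value unchanged and replaces every match, inserting the replacement literally when literalReplacement is true; the class and the step helper are hypothetical. It suggests that the trailing } in [ \t]*\r?\n} is read as a literal character, so only a line break immediately followed by } could match, and the sample text contains none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrayBraceCheck {
    // Hypothetical stand-in for one RegexReplaceProcessorFactory step
    // with literalReplacement=true: quote the replacement, replace all matches.
    static String step(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "Dear Sir,  \n\n \n \n\n I am terminating";

        // As configured above: a '}' must follow the newline, and the text
        // contains no '}', so nothing is replaced.
        String withBrace = step(text, "[ \\t]*\\r?\\n}", "<br>");

        // Without the stray brace every line ending becomes one <br>.
        String withoutBrace = step(text, "[ \\t]*\\r?\\n", "<br>");

        System.out.println(withBrace.equals(text)); // true
        System.out.println(withoutBrace);           // Dear Sir,<br><br><br><br><br> I am terminating
    }
}
```

In solrconfig.xml the corresponding value would simply be [ \t]*\r?\n, with no closing brace.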

Regards,
Edwin

On Tue, 5 Mar 2019 at 19:54, <paul.dodd@ub.unibe.ch> wrote:

> Hi Edwin
>
> Try for the first pattern/replacement
>
> <str name="pattern">[ \t]*\r?\n</str>
> <str name="replacement">&lt;br&gt;</str>
>
> Now all line endings and preceding whitespace characters should be changed
> to ‘<br>’.
>
> The second pattern replacement should replace 3 or more ‘<br>’ sequences
> with 2 ‘<br>’ sequences:
>
> <str name="pattern">(&lt;br&gt;&lt;br&gt;){3,}</str>
> <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
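The second step counts pairs rather than single tags: (<br><br>){3,} needs three consecutive <br><br> pairs, i.e. six or more <br> in a row. The sketch below (hypothetical class and helper, plain java.util.regex, replacement inserted literally) applies it to the five <br> that the first step would make of Example 1, and compares a variant (<br>){3,} that collapses any run of three or more single tags. The variant is a suggestion only; it has not been tried in Solr in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollapseBrRuns {
    // Hypothetical helper: replace every match, inserting the replacement literally.
    static String step(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Example 1 after the first step ([ \t]*\r?\n -> <br>): five tags in a row.
        String afterStep1 = "Dear Sir,<br><br><br><br><br> I am terminating";

        // Needs three pairs (six tags), so a run of five is left unchanged.
        System.out.println(step(afterStep1, "(<br><br>){3,}", "<br><br>"));

        // Matches any run of three or more single tags.
        System.out.println(step(afterStep1, "(<br>){3,}", "<br><br>"));
        // Dear Sir,<br><br> I am terminating
    }
}
```

In solrconfig.xml the variant would be written (&lt;br&gt;){3,}.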
>
> Hope this approach works. Sorry for not replying earlier and best regards,
>
> Paul
>
> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
> Windows 10
>
> From: Zheng Lin Edwin Yeo<mailto:edwinyeozl@gmail.com>
> Sent: Tuesday, 5 March 2019 03:35
> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>
> Hi,
>
> For your info, this issue is occurring in the new Solr 7.7.1 as well.
>
> Regards,
> Edwin
>
> On Mon, 25 Feb 2019 at 10:28, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> wrote:
>
> > Hi,
> >
> > Does anyone else have other suggestions, or has anyone faced the same problem?
> >
> > Regards,
> > Edwin
> >
> > On Wed, 20 Feb 2019 at 16:58, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > wrote:
> >
> >> Hi Paul,
> >>
> >> If I execute the second step first, I only get a single <br> for
> >> those with 2 <br>.
> >> For those where we originally got 4 <br>, there will be 2 <br> with a
> >> space in between.
> >>
> >> This just changes the 2 <br> into a single <br>, since the second
> >> step replaces with a single <br>.
> >> But it has not solved the underlying problem yet.
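The effect of the processor order can be reproduced outside Solr. In this sketch (hypothetical class and helper names; it assumes each processor behaves like replaceAll with a literal replacement) the two patterns are ([ \t]*\r?\n){2,} replaced by <br><br> and ([ \t]*\r?\n){1,} replaced by <br>. Running the {1,} step first turns every run of line breaks into one <br>, so the {2,} step has nothing left to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StepOrder {
    static final String PARA = "([ \\t]*\\r?\\n){2,}"; // two or more line breaks
    static final String LINE = "([ \\t]*\\r?\\n){1,}"; // one or more line breaks

    static String step(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Paragraph breaks first, then the remaining single line breaks.
    static String paragraphsFirst(String s) {
        return step(step(s, PARA, "<br><br>"), LINE, "<br>");
    }

    // Reversed order: the {1,} step swallows whole runs before {2,} runs.
    static String linesFirst(String s) {
        return step(step(s, LINE, "<br>"), PARA, "<br><br>");
    }

    public static void main(String[] args) {
        String text = "line one\nline two  \n \n\n  line three";
        System.out.println(paragraphsFirst(text)); // line one<br>line two<br><br>  line three
        System.out.println(linesFirst(text));      // line one<br>line two<br>  line three
    }
}
```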
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On Wed, 20 Feb 2019 at 16:41, <paul.dodd@ub.unibe.ch> wrote:
> >>
> >>> If the second step is executed first, then you will get the unwanted 4
> >>> <br>
> >>>
> >>> From: Zheng Lin Edwin Yeo<mailto:edwinyeozl@gmail.com>
> >>> Sent: Wednesday, 20 February 2019 09:29
> >>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> >>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
> >>>
> >>>
> >>>
> >>> Hi Jörn,
> >>>
> >>> Do you mean the regex is not correct?
> >>>
> >>> We are already using two RegexReplaceProcessorFactory steps, like the
> >>> ones shown below. The output that we get is still the same.
> >>>
> >>> <processor class="solr.RegexReplaceProcessorFactory">
> >>>      <str name="fieldName">content</str>
> >>>      <str name="pattern">([ \t]*\r?\n){2,}</str>
> >>>      <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>>      <bool name="literalReplacement">true</bool>
> >>> </processor>
> >>>
> >>> <processor class="solr.RegexReplaceProcessorFactory">
> >>>      <str name="fieldName">content</str>
> >>>      <str name="pattern">([ \t]*\r?\n){1,}</str>
> >>>      <str name="replacement">&lt;br&gt;</str>
> >>>      <bool name="literalReplacement">true</bool>
> >>> </processor>
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>> On Wed, 20 Feb 2019 at 16:03, Jörn Franke <jornfranke@gmail.com> wrote:
> >>>
> >>> > Then you need two RegexReplaceProcessorFactory steps
> >>> >
> >>> > > On 20.02.2019 at 08:12, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >>> > >
> >>> > > Hi,
> >>> > >
> >>> > > Thanks for the reply.
> >>> > >
> >>> > > Do you know of any online regex tool that works correctly for Java
> >>> > > regex? I tried to find some, but they are not working properly.
> >>> > >
> >>> > > Yes, our plan is to replace more than one \n with <br><br>, and a
> >>> > > single \n with a single <br>.
> >>> > >
> >>> > > Regards,
> >>> > > Edwin
> >>> > >
> >>> > >> On Wed, 20 Feb 2019 at 14:59, Jörn Franke <jornfranke@gmail.com> wrote:
> >>> > >>
> >>> > >> Solr uses Java regex matching, so I doubt there is a bug; it would then
> >>> > >> be in the JDK. Try out your solution in an online regex tool that
> >>> > >> supports Java regex.
> >>> > >>
> >>> > >> I believe you want to have 2 regex processor factories:
> >>> > >> one that deals with a single \n and one that deals with more than one \n
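One concrete way in which Java regex differs from other flavours is the set of characters \s covers. In java.util.regex, \s means [ \t\n\x0B\f\r] only, so a no-break space (U+00A0, common in text extracted from HTML mail) is not whitespace to it, whereas JavaScript's \s (one of the flavours regex101.com offers) does include U+00A0. The check below uses only JDK classes; (?U) is the embedded form of Pattern.UNICODE_CHARACTER_CLASS, and \h is Java's horizontal-whitespace class (Java 8 and later). Whether the content in this thread actually contains U+00A0 is an assumption the sketch does not establish.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class WhitespaceFlavours {
    // Classify a single character with three different Java whitespace classes.
    static boolean[] classify(String s) {
        return new boolean[] {
            Pattern.matches("\\s", s),     // default \s: ASCII-only in Java
            Pattern.matches("(?U)\\s", s), // (?U): Unicode White_Space property
            Pattern.matches("\\h", s)      // \h: horizontal whitespace, Java 8+
        };
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // NO-BREAK SPACE
        System.out.println(Arrays.toString(classify(nbsp))); // [false, true, true]
    }
}
```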
> >>> > >>
> >>> > >>> On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >>> > >>>
> >>> > >>> Hi,
> >>> > >>>
> >>> > >>> We have tried the following pattern ([ \t]*\r?\n){2,} and
> >>> > >>> configuration:
> >>> > >>>
> >>> > >>> <processor class="solr.RegexReplaceProcessorFactory">
> >>> > >>>  <str name="fieldName">content</str>
> >>> > >>>  <str name="pattern">([ \t]*\r?\n){2,}</str>
> >>> > >>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>> > >>>  <bool name="literalReplacement">true</bool>
> >>> > >>> </processor>
> >>> > >>>
> >>> > >>> However, the issue is still occurring.
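If the content does contain a no-break space (U+00A0) between two runs of line breaks, [ \t]* cannot step over it and each run is replaced separately. This sketch assumes that, uses made-up sample text shaped like Example 2, and runs the two-step chain from this thread once with [ \t] and once with a hypothetical widened class [ \t\xA0]. The helper names are hypothetical, and it is assumed (not verified) that the factory passes the configured value to the regex engine unchanged, so the same class could be written as-is in solrconfig.xml.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NbspAwareChain {
    static String step(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Two-step chain: paragraph breaks first, then single line breaks.
    // 'ws' is the character class allowed in front of each line break.
    static String chain(String s, String ws) {
        String para = "(" + ws + "*\\r?\\n){2,}";
        String line = "(" + ws + "*\\r?\\n){1,}";
        return step(step(s, para, "<br><br>"), line, "<br>");
    }

    public static void main(String[] args) {
        // Made-up sample: a no-break space sits between two runs of breaks.
        String text = "Psalm 89:17   \n\n \u00A0\n\n  3 Choa Chu Kang Avenue 4";

        // [ \t] stops at U+00A0, so two separate <br><br> runs are produced.
        System.out.println(chain(text, "[ \\t]"));
        // Adding \xA0 lets one match cover the whole whitespace block.
        System.out.println(chain(text, "[ \\t\\xA0]"));
        // Psalm 89:17<br><br>  3 Choa Chu Kang Avenue 4
    }
}
```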
> >>> > >>>
> >>> > >>> Is anyone else able to help?
> >>> > >>>
> >>> > >>> Regards,
> >>> > >>> Edwin
> >>> > >>>
> >>> > >>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >>> > >>>
> >>> > >>>> Hi,
> >>> > >>>>
> >>> > >>>> For your info, this issue is occurring in Solr 7.7.0 as well.
> >>> > >>>>
> >>> > >>>> Regards,
> >>> > >>>> Edwin
> >>> > >>>>
> >>> > >>>> On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >>> > >>>>
> >>> > >>>>> Hi,
> >>> > >>>>>
> >>> > >>>>> Should we report this as a bug in Solr?
> >>> > >>>>>
> >>> > >>>>> Regards,
> >>> > >>>>> Edwin
> >>> > >>>>>
> >>> > >>>>> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >>> > >>>>>
> >>> > >>>>>> Hi Paul,
> >>> > >>>>>>
> >>> > >>>>>> Regarding the regex (\n\s*){2,} that we are using, when we try it on
> >>> > >>>>>> https://regex101.com/, it gives us the correct result for all the
> >>> > >>>>>> examples (i.e. all of them have only <br><br>, and not more than
> >>> > >>>>>> that, unlike what we are getting in Solr in our earlier examples).
> >>> > >>>>>>
> >>> > >>>>>> Could there be a possibility of a bug in Solr?
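One possible explanation (an assumption, not confirmed in this thread) is that the text pasted into regex101 and the text Solr receives differ in an invisible way, for example a U+00A0 no-break space where an ordinary space appears. The sketch below applies the same Java pattern (\n\s*){2,} to two made-up strings that look identical on screen; the class and method names are hypothetical. With the no-break space, Java's \s stops at it and the result has the <br><br> <br><br> shape seen in Example 2.

```java
import java.util.regex.Pattern;

public class SameRegexDifferentData {
    static String apply(String input) {
        // Pattern from this thread; "<br><br>" has no $ or \, so it is inserted as-is.
        return Pattern.compile("(\\n\\s*){2,}").matcher(input).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        String plainSpace = "Psalm 89:17   \n\n  \n\n  3";
        String noBreakSpace = "Psalm 89:17   \n\n \u00A0\n\n  3";

        System.out.println(apply(plainSpace));   // Psalm 89:17   <br><br>3
        System.out.println(apply(noBreakSpace)); // two <br><br> runs around the U+00A0
    }
}
```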
> >>> > >>>>>>
> >>> > >>>>>> Regards,
> >>> > >>>>>> Edwin
> >>> > >>>>>>
> >>> > >>>>>> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >>> > >>>>>>
> >>> > >>>>>>> Hi Paul,
> >>> > >>>>>>>
> >>> > >>>>>>> We have tried it with the space preceding the \n, i.e.
> >>> > >>>>>>> <str name="pattern">(\s*\n){2,}</str>, with the following
> >>> > >>>>>>> configuration:
> >>> > >>>>>>>
> >>> > >>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
> >>> > >>>>>>>  <str name="fieldName">content</str>
> >>> > >>>>>>>  <str name="pattern">(\s*\n){2,}</str>
> >>> > >>>>>>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>> > >>>>>>> </processor>
> >>> > >>>>>>>
> >>> > >>>>>>> However, we are also getting exactly the same results as in the
> >>> > >>>>>>> earlier Examples 1, 2 and 3.
> >>> > >>>>>>>
> >>> > >>>>>>> As for your point 2, that perhaps the data contains characters
> >>> > >>>>>>> other than \n (non-printing ones): we have found that there are no
> >>> > >>>>>>> non-printing characters. It is just a line break followed by a
> >>> > >>>>>>> space. You can refer to the original content in the same examples
> >>> > >>>>>>> below.
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>> Example 1: The sentence for which the above regex pattern works
> >>> > >>>>>>> correctly
> >>> > >>>>>>> *Original content in EML file:*
> >>> > >>>>>>> Dear Sir,
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>> I am terminating
> >>> > >>>>>>> *Original content:*    Dear Sir,  \n\n \n \n\n I am terminating
> >>> > >>>>>>> *Index content: *    Dear Sir,  <br><br>I am terminating
> >>> > >>>>>>>
> >>> > >>>>>>> Example 2: The sentence for which the above regex pattern only
> >>> > >>>>>>> partially works (as you can see, there are 4 <br> instead of 2)
> >>> > >>>>>>> *Original content in EML file:*
> >>> > >>>>>>>
> >>> > >>>>>>> *exalted*
> >>> > >>>>>>>
> >>> > >>>>>>> *Psalm 89:17*
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>> 3 Choa Chu Kang Avenue 4
> >>> > >>>>>>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n  \n\n  3 Choa Chu Kang Avenue 4, Singapore
> >>> > >>>>>>> *Index content: *exalted  <br><br>Psalm 89:17   <br><br> <br><br>3 Choa Chu Kang Avenue 4, Singapore
> >>> > >>>>>>>
> >>> > >>>>>>> Example 3: The sentence for which the above regex pattern only
> >>> > >>>>>>> partially works (as you can see, there are 4 <br> instead of 2)
> >>> > >>>>>>> *Original content in EML file:*
> >>> > >>>>>>>
> >>> > >>>>>>> http://www.concordpri.moe.edu.sg/
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>> On Tue, Dec 18, 2018 at 10:07 AM
> >>> > >>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/  \n\n  \n\n \n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018 at 10:07 AM
> >>> > >>>>>>> *Index content: *http://www.concordpri.moe.edu.sg/  <br><br> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
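Because a no-break space and an ordinary space render identically, "just a line break followed by a space" is hard to confirm by eye. A small helper like the hypothetical visible method below prints line breaks, tabs and every non-ASCII character as an escape or code point, so the stored value of the content field (fetched however is convenient, for example from a test document) can be inspected directly. It works per UTF-16 char, which is enough for spotting U+00A0.

```java
public class ShowWhitespace {
    // Make line breaks, tabs and non-ASCII characters visible.
    static String visible(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n') {
                out.append("\\n");
            } else if (c == '\r') {
                out.append("\\r");
            } else if (c == '\t') {
                out.append("\\t");
            } else if (c < 0x20 || c > 0x7E) {
                // Control or non-ASCII character: print its code point.
                out.append(String.format("<U+%04X>", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample; substitute the real field value when testing.
        String sample = "89:17 \n\n \u00A0\n\n 3";
        System.out.println(visible(sample)); // 89:17 \n\n <U+00A0>\n\n 3
    }
}
```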
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>> Appreciate any other ideas or suggestions
that you may have.
> >>> > >>>>>>>
> >>> > >>>>>>> Thank you.
> >>> > >>>>>>>
> >>> > >>>>>>> Regards,
> >>> > >>>>>>> Edwin
> >>> > >>>>>>>
> >>> > >>>>>>>> On Thu, 7 Feb 2019 at 22:49, <paul.dodd@ub.unibe.ch> wrote:
> >>> > >>>>>>>>
> >>> > >>>>>>>> Hi Edwin
> >>> > >>>>>>>>
> >>> > >>>>>>>> 1.  Sorry, the pattern was wrong; the space should precede the \n,
> >>> > >>>>>>>>     i.e. <str name="pattern">(\s*\n){2,}</str>
> >>> > >>>>>>>> 2.  Perhaps in the data you have other (non-printing) characters
> >>> > >>>>>>>>     than \n?
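Whether \s* comes before or after the \n decides which spaces end up inside the match. In this sketch (hypothetical class, made-up input, plain java.util.regex), (\n\s*){2,} leaves the spaces before the first line break and swallows those after the last one, while (\s*\n){2,} does the opposite.

```java
import java.util.regex.Pattern;

public class WhereSpacesLand {
    static String apply(String input, String regex) {
        // "<br><br>" contains no $ or \, so it is inserted as-is.
        return Pattern.compile(regex).matcher(input).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        String text = "a  \n \n  b";
        System.out.println(apply(text, "(\\n\\s*){2,}")); // a  <br><br>b
        System.out.println(apply(text, "(\\s*\\n){2,}")); // a<br><br>  b
    }
}
```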
> >>> > >>>>>>>>
> >>> > >>>>>>>>
> >>> > >>>>>>>>
> >>> > >>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinyeozl@gmail.com>
> >>> > >>>>>>>> Sent: Thursday, 7 February 2019 15:23
> >>> > >>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> >>> > >>>>>>>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
> >>> > >>>>>>>>
> >>> > >>>>>>>>
> >>> > >>>>>>>>
> >>> > >>>>>>>> Hi Paul,
> >>> > >>>>>>>>
> >>> > >>>>>>>> We have tried this suggested regex pattern as follows:
> >>> > >>>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
> >>> > >>>>>>>>  <str name="fieldName">content</str>
> >>> > >>>>>>>>  <str name="pattern">(\n\s*){2,}</str>
> >>> > >>>>>>>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>> > >>>>>>>> </processor>
> >>> > >>>>>>>>
> >>> > >>>>>>>> But we still have exactly the same problem as in Examples 1, 2 and 3
> >>> > >>>>>>>> below.
> >>> > >>>>>>>>
> >>> > >>>>>>>> Example 1: The sentence for which the above regex pattern works
> >>> > >>>>>>>> correctly
> >>> > >>>>>>>> *Original content:*    Dear Sir,  \n\n \n \n\n I am terminating
> >>> > >>>>>>>> *Index content: *    Dear Sir,  <br><br>I am terminating
> >>> > >>>>>>>>
> >>> > >>>>>>>> Example 2: The sentence for which the above regex pattern only
> >>> > >>>>>>>> partially works (as you can see, there are 4 <br> instead of 2)
> >>> > >>>>>>>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n  \n\n  3 Choa Chu Kang Avenue 4, Singapore
> >>> > >>>>>>>> *Index content: *exalted  <br><br>Psalm 89:17   <br><br> <br><br>3 Choa Chu Kang Avenue 4, Singapore
> >>> > >>>>>>>>
> >>> > >>>>>>>> Example 3: The sentence for which the above regex pattern only
> >>> > >>>>>>>> partially works (as you can see, there are 4 <br> instead of 2)
> >>> > >>>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/  \n\n  \n\n \n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018 at 10:07 AM
> >>> > >>>>>>>> *Index content: *http://www.concordpri.moe.edu.sg/  <br><br> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
> >>> > >>>>>>>>
> >>> > >>>>>>>> Any further suggestion?
> >>> > >>>>>>>>
> >>> > >>>>>>>> Thank you.
> >>> > >>>>>>>>
> >>> > >>>>>>>> Regards,
> >>> > >>>>>>>> Edwin
> >>> > >>>>>>>>
> >>> > >>>>>>>>> On Thu, 7 Feb 2019 at 22:20, <paul.dodd@ub.unibe.ch> wrote:
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> To avoid the «\n+\s*» matching too many \n and then failing on the
> >>> > >>>>>>>>> {2,} part you could try
> >>> > >>>>>>>>>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> <str name="pattern">(\n\s*){2,}</str>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> If you also want to match CRLF
then
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> <str name="pattern">(\r?\n\s*){2,}</str>
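The \r? matters for CRLF content because Java's \s also matches \r. With (\n\s*){2,} a match can only start at a \n, so the \r of the first CRLF is left in front of the replacement; (\r?\n\s*){2,} starts one character earlier. A minimal check with plain java.util.regex (the class name is hypothetical):

```java
import java.util.regex.Pattern;

public class CrlfRuns {
    static String apply(String input, String regex) {
        // "<br><br>" contains no $ or \, so it is inserted as-is.
        return Pattern.compile(regex).matcher(input).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        String crlf = "a\r\n\r\nb";
        // The leading \r survives: "a\r<br><br>b"
        System.out.println(apply(crlf, "(\\n\\s*){2,}").equals("a\r<br><br>b")); // true
        System.out.println(apply(crlf, "(\\r?\\n\\s*){2,}"));                     // a<br><br>b
    }
}
```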
> >>> > >>>>>>>>>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinyeozl@gmail.com>
> >>> > >>>>>>>>> Sent: Thursday, 7 February 2019 15:10
> >>> > >>>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> >>> > >>>>>>>>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
> >>> > >>>>>>>>>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> Hi Paul,
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> Thanks for your reply.
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> When I use this pattern:
> >>> > >>>>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
> >>> > >>>>>>>>>  <str name="fieldName">content</str>
> >>> > >>>>>>>>>  <str name="pattern">(\n+\s*){2,}</str>
> >>> > >>>>>>>>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>> > >>>>>>>>> </processor>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> It works for some sentences within the same content but not for
> >>> > >>>>>>>>> others. Please see below for one that works and another that does
> >>> > >>>>>>>>> not (it only partially works):
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> Example 1: The sentence for which the above regex pattern works
> >>> > >>>>>>>>> correctly
> >>> > >>>>>>>>> *Original content:*    Dear Sir,  \n\n \n \n\n I am terminating
> >>> > >>>>>>>>> *Index content: *    Dear Sir,  <br><br>I am terminating
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> Example 2: The sentence for which the above regex pattern only
> >>> > >>>>>>>>> partially works (as you can see, there are 4 <br> instead of 2)
> >>> > >>>>>>>>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n  \n\n  3 Choa Chu Kang Avenue 4, Singapore
> >>> > >>>>>>>>> *Index content: *exalted  <br><br>Psalm 89:17   <br><br> <br><br>3 Choa Chu Kang Avenue 4, Singapore
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> Example 3: The sentence for which the above regex pattern only
> >>> > >>>>>>>>> partially works (as you can see, there are 4 <br> instead of 2)
> >>> > >>>>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/  \n\n  \n\n \n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018 at 10:07 AM
> >>> > >>>>>>>>> *Index content: *http://www.concordpri.moe.edu.sg/  <br><br> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> We would appreciate your help in finding out what is wrong.
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> Thank you.
> >>> > >>>>>>>>>
> >>> > >>>>>>>>> Regards,
> >>> > >>>>>>>>> Edwin
> >>> > >>>>>>>>>
> >>> > >>>>>>>>>> On Thu, 7 Feb 2019 at 21:24, <paul.dodd@ub.unibe.ch> wrote:
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> You don’t say what happens, just that it is not working. I assume
> >>> > >>>>>>>>>> nothing is replaced? Perhaps the pattern should be
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>>  <str name="pattern">"(\n\s*){2,}"</str>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> ??
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinyeozl@gmail.com>
> >>> > >>>>>>>>>> Sent: Thursday, 7 February 2019 14:08
> >>> > >>>>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> >>> > >>>>>>>>>> Subject: RegexReplaceProcessorFactory pattern to detect multiple \n
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> Hi,
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> I am trying to use the RegexReplaceProcessorFactory to remove two or
> >>> > >>>>>>>>>> more \n with any number of spaces between them (e.g. \n\n, \n \n,
> >>> > >>>>>>>>>> \n \n \n \n), and replace them with two <br>.
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> I use the following regex pattern and it works when I test it in
> >>> > >>>>>>>>>> regex101.com. But it does not work when I put it inside the
> >>> > >>>>>>>>>> RegexReplaceProcessorFactory as below:
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> <updateRequestProcessorChain name="removeCode">
> >>> > >>>>>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
> >>> > >>>>>>>>>>  <str name="fieldName">content</str>
> >>> > >>>>>>>>>>  <str name="pattern">"(\\n\s*){2,}"</str>
> >>> > >>>>>>>>>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>> > >>>>>>>>>> </processor>
> >>> > >>>>>>>>>>         </updateRequestProcessorChain>
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> To explain my regex pattern further: \s* instructs the regex to match
> >>> > >>>>>>>>>> any \n that has spaces after it, and {2,} instructs the regex to match
> >>> > >>>>>>>>>> 2 or more occurrences of such a pattern (\n).
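As far as we understand it (an assumption, not checked against the Solr source here), the text inside the pattern element is handed to the regex engine as-is rather than parsed like a Java string literal, so the surrounding quotes and the doubled backslash become part of the pattern: "(\\n\s*){2,}" then looks for a quote character and a backslash followed by n. The sketch below (hypothetical class, plain java.util.regex) compiles both the configured value and the value without quotes and with a single backslash. In Java source each regex backslash has to be doubled again, which is why the strings look busier than the XML.

```java
import java.util.regex.Pattern;

public class QuotedPatternCheck {
    static String apply(String input, String regex) {
        // "<br><br>" contains no $ or \, so it is inserted as-is.
        return Pattern.compile(regex).matcher(input).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        String text = "Dear Sir,  \n\n \n \n\n I am terminating";

        // The XML value "(\\n\s*){2,}" with its quotes, as the regex engine would see it.
        String asConfigured = "\"(\\\\n\\s*){2,}\"";
        // The same idea without quotes and with a single backslash before n.
        String intended = "(\\n\\s*){2,}";

        System.out.println(apply(text, asConfigured).equals(text)); // true: no match
        System.out.println(apply(text, intended));                  // Dear Sir,  <br><br>I am terminating
    }
}
```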
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> Please kindly let me know what is wrong and how I should do it.
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> I am using Solr 7.6.0.
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>> Regards,
> >>> > >>>>>>>>>> Edwin
> >>> > >>>>>>>>>>
> >>> > >>>>>>>>>
> >>> > >>>>>>>>
> >>> > >>>>>>>
> >>> > >>
> >>> >
> >>>
> >>
>
