pdfbox-users mailing list archives

From srinath prathi <srinath.pra...@gmail.com>
Subject Re: Strange character
Date Mon, 02 Nov 2015 08:46:11 GMT
Thank you, I got it working. I just used String.replaceAll(). The only
thing is that in Java I need to pass the regex as "\uF0B7".
I think internally it does the same as your algorithm.
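
For reference, here is a minimal sketch of the two approaches mentioned in this thread, assuming plain Java with no PDFBox dependency. The class name PrivateUseStripper and the method names are hypothetical, chosen only for illustration. Note that the two are not quite equivalent: the block check drops every character in the Basic Multilingual Plane private use range (U+E000 to U+F8FF), while the regex removes only U+F0B7.

```java
class PrivateUseStripper {

    // Approach suggested in the quoted reply: iterate over each char and
    // omit those whose Unicode block is PRIVATE_USE_AREA (U+E000..U+F8FF).
    static String stripPrivateUse(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.UnicodeBlock.of(c) != Character.UnicodeBlock.PRIVATE_USE_AREA) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Approach used in this message: remove only the single code point U+F0B7.
    static String stripBullet(String text) {
        return text.replaceAll("\uF0B7", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample: two private use characters between plain letters.
        String sample = "a\uF0B7b\uE000c";
        System.out.println(stripPrivateUse(sample));      // prints "abc"
        System.out.println(stripBullet(sample).length()); // prints 4
    }
}
```

Since no regex features are needed for a single character, text.replace("\uF0B7", "") would probably do the same job as the replaceAll() call.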
On 2 Nov 2015 13:18, "John Hewson" <john@jahewson.com> wrote:

> Iterate over each character and check if Character.UnicodeBlock.of(char) is
> equal to Character.UnicodeBlock.PRIVATE_USE_AREA. If so, omit the
> character.
>
> — John
>
> > On 1 Nov 2015, at 21:10, srinath prathi <srinath.prathi@gmail.com> wrote:
> >
> > Thank you for the information. How can I remove it? When I replace it
> > with "", it does not work. I want it to be removed. Can you please help
> > me with it?
> >
> >
> >
> >
> > Yours Sincerely
> > Srinath
> >
> > On Mon, Nov 2, 2015 at 12:23 AM, John Hewson <john@jahewson.com> wrote:
> >
> >> Indeed it is. The character which you’ve pasted in the e-mail below is
> >> U+F0B7, which is a private use code point:
> >>
> >> https://codepoints.net/U+F0B7?lang=en
> >>
> >> This means that the PDF contains some private text encoding which, while
> >> you can recognise the characters on the screen, doesn’t correspond to any
> >> usable text as far as the encoded content goes. This is not uncommon for
> >> PDF.
> >>
> >> — John
> >>
> >>
> >>> On 1 Nov 2015, at 02:39, Olaf Drümmer <olaflist@callassoftware.com> wrote:
> >>>
> >>> Hi Srinath,
> >>>
> >>> this is the so-called “.notdef” replacement glyph you typically get when
> >>> rendering text with a font, where that font does not contain the glyph
> >>> needed to render a given character.
> >>>
> >>> Olaf
> >>>
> >>>
> >>>> On 01.11.2015, at 10:23, srinath prathi <srinath.prathi@gmail.com> wrote:
> >>>>
> >>>> Dear All
> >>>> What is this character  ? I get it while stripping text from a PDF.
> >>>> How should I treat it?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Yours Sincerely
> >>>> Srinath
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> >>> For additional commands, e-mail: users-help@pdfbox.apache.org
> >>>
> >>
> >>
