From: "Hesham Gneady"
Reply-To: users@pdfbox.apache.org
Subject: Extract bold text from a PDF file
Date: Mon, 18 Mar 2019 12:13:21 +0200
Message-ID:
<5d2d01d4dd73$38af0a00$aa0d1e00$@gmail.com>

Hello,

I am trying to extract the bold text from some PDF files, but it fails for some of them, such as this one:

https://www.dropbox.com/s/gh2zwdh3sl3isck/Bold%20Font%20Sample.pdf?dl=0

I am overriding the processTextPosition(...) method to do this. I have tried all of the following checks, but none of them worked for me:

1. if( text.getFont().getFontDescriptor().getFontName().toLowerCase().contains( "bold" ) ) {...} // returns false.
2. if( text.getFont().getName().toLowerCase().contains( "bold" ) ) {...} // returns false.
3. System.out.println( text.getFont().getFontDescriptor().getFontWeight() ); // returns 0.0.
4. System.out.println( getGraphicsState().getLineWidth() ); // returns 1.0.
5. System.out.println( getGraphicsState().getTextState().getRenderingMode() ); // returns FILL

Note: The font name for the bold text in this PDF file is "frutigernextlt-heavycn". It contains the word "heavy". I could detect bold text that way, but I do not think it is a reliable approach, because I have other PDF files whose font names contain "heavy" although the text is not bold.

Best regards,
Hesham
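For reference, the name and weight checks from options 1 to 3 can be folded into one small helper. The sketch below is only an illustration: `BoldHeuristic` and `looksBold` are hypothetical names, not part of PDFBox, and the helper takes the font name and weight as plain values (for example, the results of `getName()` and `getFontWeight()` as used above). It assumes that a FontWeight of 700 or more means bold, following the PDF font descriptor convention where 400 is normal and 700 is bold, and it strips a subset prefix such as "ABCDEF+" before looking at the name. "heavy" is deliberately not treated as bold, for the reason given in the note above, so this sketch still reports false for "frutigernextlt-heavycn" with weight 0.0.

```java
import java.util.Locale;

/**
 * Hypothetical helper, not part of PDFBox: combines the font-name and
 * font-weight checks into a single heuristic.
 */
final class BoldHeuristic {

    private BoldHeuristic() {
        // static helper only
    }

    /**
     * @param fontName   font name as reported for the text position; may be null
     * @param fontWeight FontWeight from the font descriptor; the post reports 0.0
     *                   for the sample file, which is treated here as "unknown"
     * @return true if the weight or the name suggests a bold font
     */
    static boolean looksBold(String fontName, float fontWeight) {
        // Assumption: 700 or more counts as bold (400 is normal, 700 is bold).
        if (fontWeight >= 700f) {
            return true;
        }
        if (fontName == null) {
            return false;
        }
        // Subset fonts carry a tag of six uppercase letters plus '+',
        // e.g. "ABCDEF+Arial-BoldMT"; drop it so the tag itself cannot match.
        String name = fontName.replaceFirst("^[A-Z]{6}\\+", "")
                              .toLowerCase(Locale.ROOT);
        // Only "bold" is checked; "heavy" is left out on purpose.
        return name.contains("bold");
    }
}
```

Inside an overridden processTextPosition(...), a call could look like `BoldHeuristic.looksBold(text.getFont().getName(), text.getFont().getFontDescriptor().getFontWeight())`, with a null check on the font descriptor added where needed.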