From: Ken Bowen
To: users@pdfbox.apache.org
Subject: Text stripper and table extraction
Date: Sat, 25 Feb 2017 10:20:25 -0700

Hello all,

This PDFBox-based open source project appeared on HN (https://news.ycombinator.com/news) this morning:

https://github.com/JonathanLink/PDFLayoutTextStripper

I haven’t tried it out, but given the number of queries concerning both text stripping and extraction from tables that have appeared here, I thought it was worth mentioning in case it was missed.

Regards,
Ken Bowen