From: Muhammad Ismail
Date: Fri, 11 Mar 2016 11:32:07 +0500
Subject: Re: Look for text in pdf file and then extract it
To: users@pdfbox.apache.org
In-Reply-To: <56E2598B.9060506@t-online.de>
References: <1457668245179.72912@concordia.ca> <56E2598B.9060506@t-online.de>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7bfcedfc3e9317052dc018a4

--047d7bfcedfc3e9317052dc018a4
Content-Type: text/plain; charset=UTF-8

Try extracting the text from the PDF, building a search index from that
text, and then running your searches against the index.

On Fri, Mar 11, 2016 at 10:37 AM, Tilman Hausherr wrote:

> On 11.03.2016 at 04:50, Najib Sahyoun wrote:
>
>> Hello,
>>
>> I am Najib Sahyoun, PhD student in accounting.
>>
>> I am looking for an application that looks for a specific term (i.e.
>> board of directors) and then will extract all the sentences that include
>> the term (board of directors).
>>
>> Does your application perform this?
>>
>
> No. You'll have to develop this on top of the text extraction or hire
> someone to do it. PDFBox just extracts the text.
>
> Tilman

-- 
Thanks
Muhammad Ismail

--047d7bfcedfc3e9317052dc018a4--
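As a rough illustration of what "developing this on top of the text extraction" could look like, here is a minimal sketch in Java. It assumes the text has already been pulled out of the PDF into a String (for example with PDFBox's PDFTextStripper.getText) and only covers the search step. The class name SentenceSearch and the method sentencesContaining are made up for this example and are not part of PDFBox. It does a plain linear scan with the JDK's BreakIterator rather than building a real search index, so treat it as a starting point only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: finds sentences that mention a term.
class SentenceSearch {

    // Returns every sentence of `text` that contains `term`, ignoring case.
    static List<String> sentencesContaining(String text, String term) {
        // Extracted PDF text is usually hard-wrapped, so collapse all
        // whitespace runs (including line breaks) into single spaces first.
        String normalized = text.replaceAll("\\s+", " ").trim();
        String needle = term.toLowerCase(Locale.ROOT);

        List<String> hits = new ArrayList<>();
        // Assumption: the document is in English; pick another Locale if not.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(normalized);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = normalized.substring(start, end).trim();
            if (sentence.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(sentence);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF, including a line break
        // in the middle of the search term.
        String text = "Profits rose. The board of\ndirectors approved the plan. Staff left.";
        for (String s : sentencesContaining(text, "board of directors")) {
            System.out.println(s);
        }
    }
}
```

Sentence detection with BreakIterator is heuristic, so abbreviations and unusual layouts in the extracted text may split or merge sentences. For a large collection of documents, a proper full-text index would likely be a better fit than scanning each string.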