From: "Baldwin, David"
To: "java-user@lucene.apache.org"
Date: Mon, 4 Jan 2010 18:34:46 -0600
Subject: much memory overhead does Tika generally require

I need to get a handle on how much memory Tika needs to tokenize different file types. In other words, I need to find information on the required overhead (including any copies of buffers that are made, if applicable) so that I can produce some kind of guidelines for the memory that may be needed by users of the product I am working on, which uses Lucene/Tika.

I realize that there is a lot of context that could be provided, but first I want to find out whether anyone knows of existing data or metrics on this.