From: Trejkaz
To: Lucene Users Mailing List
Date: Tue, 14 Oct 2014 16:29:32 +1100
Subject: Re: ArrayIndexOutOfBoundsException: -65536

Bit of thread necromancy here, but I figured it was relevant because
we get exactly the same error.

On Thu, Jan 19, 2012 at 12:47 AM, Michael McCandless wrote:
> Hmm, are you certain your RAM buffer is 3 MB?
>
> Is it possible you are indexing an absurdly enormous document...?

We're seeing a case here where the document certainly could qualify as
"absurdly enormous". The doc itself is 2GB in size and the
tokenisation is per-character, not per-word, so the number of
generated terms must be enormous. Probably enough to fill 2GB...

So I'm wondering if there is more info somewhere on why this is (or
was? We're still using 3.6.x) a limit and whether it can be detected
up-front. Some large amount of indexing time (~30 minutes) could be
avoided if we can detect that it would have failed ahead of time.

TX