From: Erick Erickson
Date: Thu, 18 Aug 2016 20:50:41 -0700
Subject: Re: docid is just a signed int32
To: java-user
Reply-To: java-user@lucene.apache.org
Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
archived-at: Fri, 19 Aug 2016 03:51:28 -0000

OK, I'm a little out of my league here, but I'll plow on anyway....

bq: There are use cases out there where >2^31 does make sense in a single index

Ok, let's put some definition to this and define the use-case
specifically rather than be vague. I've just run an experiment for
instance where I had 200M docs in a single shard (very small docs)
and tried to sort by a date on all of them. Performance on the order
of 5 seconds. 3B is what, 75 seconds? Does the use-case involve
sorting? Faceting? If so the performance will probably be poor.

This would be huge surgery I believe, and there hasn't been a
compelling use-case in the search world for it. Unless and until that
case is made I suspect this idea will meet with a lot of resistance.

That said, I do understand that this is somewhat akin to "Nobody will
ever need more than 64K of ram", meaning that some limits are assumed
and eventually become outmoded. But given Java's issues with memory
and GC I suspect that it'll be really hard to justify the work this
would take.

FWIW,
Erick

On Thu, Aug 18, 2016 at 6:31 PM, Trejkaz wrote:
> On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand wrote:
>> No, IndexWriter enforces that the number of documents cannot go over
>> IndexWriter.MAX_DOCS (which is a bit less than 2^31) and
>> BaseCompositeReader computes the number of documents in a long variable and
>> ensures it is less than 2^31, so you cannot have indexes that contain more
>> than 2^31 documents.
>>
>> Larger collections should be written to multiple shards and use
>> TopDocs.merge to merge results.
>
> But hang on:
> * TopDocs#merge still returns a TopDocs.
> * TopDocs still uses an array of ScoreDoc.
> * ScoreDoc still uses an int doc ID.
>
> Looks like you're still screwed.
>
> I wish IndexReader would use long IDs too, because one IndexReader can
> be across multiple shards too - it doesn't make much sense to me that
> this is restricted, although "it's hard to fix in a
> backwards-compatible way" is certainly a good reason. :D
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
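
On the quoted point that ScoreDoc still uses an int doc ID after
TopDocs.merge: as far as I know, ScoreDoc also has an int shardIndex
field that the merge uses to say which shard a hit came from, so the
pair (shardIndex, doc) can stay unique even when the total across
shards exceeds 2^31. A minimal sketch of packing that pair into one
long follows. GlobalDocId and its methods are hypothetical names made
up for illustration; they are not part of Lucene, and the sketch says
nothing about sorting or faceting cost across shards.

```java
// Hypothetical helper, NOT a Lucene class: packs a shard index and a
// per-shard int doc ID into a single long that is unique across shards.
final class GlobalDocId {
    private GlobalDocId() {}

    /** Shard index goes in the high 32 bits, per-shard doc ID in the low 32 bits. */
    static long pack(int shardIndex, int docId) {
        if (shardIndex < 0 || docId < 0) {
            throw new IllegalArgumentException("shardIndex and docId must be non-negative");
        }
        // docId is non-negative, so widening it to long adds no sign bits.
        return ((long) shardIndex << 32) | docId;
    }

    /** Recovers the shard index from the high 32 bits. */
    static int shardIndex(long globalId) {
        return (int) (globalId >>> 32);
    }

    /** Recovers the per-shard doc ID from the low 32 bits. */
    static int docId(long globalId) {
        return (int) globalId;
    }

    public static void main(String[] args) {
        // Assumed example values: shard 3, a doc ID near the int limit.
        long id = pack(3, Integer.MAX_VALUE - 1);
        System.out.println(shardIndex(id) + " / " + docId(id));
    }
}
```

In use, the caller would keep the per-shard searchers in an array and
use shardIndex(id) to pick the one that docId(id) belongs to.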