From: Erick Erickson
Date: Wed, 6 Jul 2016 18:19:01 -0700
Subject: Re: dv field is too large
To: java-user
Reply-To: java-user@lucene.apache.org
Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
archived-at:
Thu, 07 Jul 2016 01:19:48 -0000

Well, if you must sort on a 32K single value (although I think this is
extremely silly, _nobody_ will notice that two docs are out of order
because they were identical up until the 30,000th character but the
30,001st character isn't sorted correctly), do as Mike suggests and
chop it off before sending it to Lucene.

Best,
Erick

On Wed, Jul 6, 2016 at 3:53 PM, Sheng wrote:
> You misunderstand. I have many fields, and unfortunately a few of them are
> quite big, i.e. exceeding the 32k limit. In order to make these "big"
> fields sortable, they have to be stored as SortedDocValueField. Or that is
> wrong, one can actually sort the search result by a "big" field without
> indexing it to a SortedDocValueField. Suggestion ?
>
> On Wednesday, July 6, 2016, Erick Erickson wrote:
>
>> bq: In this case, we
>> have to index a particular data structure which has bunch of fields and
>> each of them is promised to be searchable and search-sortable to the user
>>
>> If I'm reading this right, you have some structure. You say
>> "each of them is promised to be searchable and search-sortable"
>>
>> It _sounds_ like what you want to do is break these fields out
>> into separate fields each of which is searchable and sortable
>> independently. But from what you've described, putting the entire
>> thing into a single DV field isn't useful.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Wed, Jul 6, 2016 at 3:10 PM, Sheng wrote:
>> > To be clear, the "field" is indeed tokenized, which is accompanied with a
>> > SortedDocValueField so that it is sortable too. Am I making the wrong
>> > assumption here ?
>> >
>> > On Wednesday, July 6, 2016, Sheng wrote:
>> >
>> >> Hi Eric,
>> >>
>> >> I am refactoring a legacy system. One of the most annoying things is I
>> >> have to keep the old feature even though it makes little sense.
>> >> In this case, we have to index a particular data structure which has bunch of
>> >> fields and each of them is promised to be searchable and search-sortable to
>> >> the user. Turns out one field is notoriously large. I think the old
>> >> implementation uses some quite clumsy way to make it happen. But since we
>> >> decide to refactor the system with all the goodies from Lucene, we want to
>> >> do the sorting right, and here we are at this issue... :-(
>> >>
>> >> On Wednesday, July 6, 2016, Erick Erickson wrote:
>> >>
>> >>> Is this an "XY" problem? Meaning, why do you need DV fields larger than
>> >>> 32K?
>> >>>
>> >>> You can't search it as text as it's not tokenized. Faceting and sorting
>> >>> by a 32K field doesn't seem very useful. You may have a perfectly valid
>> >>> reason, but it's not obvious what use-case you're serving from this
>> >>> thread so far....
>> >>>
>> >>> Nobody has yet put forth a compelling use-case for such large fields,
>> >>> perhaps this would be one.
>> >>>
>> >>> Best,
>> >>> Erick
>> >>>
>> >>> On Wed, Jul 6, 2016 at 2:24 PM, Sheng wrote:
>> >>> > Mike - Thanks for the prompt response. Is there a way to bypass this
>> >>> > constraint for SortedDocValueField ? Or we have to live with it,
>> >>> > meaning no fix even in future release?
>> >>> >
>> >>> > On Wednesday, July 6, 2016, Michael McCandless <lucene@mikemccandless.com>
>> >>> > wrote:
>> >>> >
>> >>> >> I believe only binary DVs can be larger than 32K bytes.
>> >>> >>
>> >>> >> Mike McCandless
>> >>> >>
>> >>> >> http://blog.mikemccandless.com
>> >>> >>
>> >>> >> On Wed, Jul 6, 2016 at 10:31 AM, Sheng wrote:
>> >>> >>
>> >>> >> > Hi,
>> >>> >> >
>> >>> >> > I am getting an IAE indicating one of the SortedDocValueField is too
>> >>> >> > large, > 32k
>> >>> >> >
>> >>> >> > I googled a bit, and it seems like #Lucene-4583 has addressed this
>> >>> >> > issue in 4.5 and 6.0, while I am currently using Lucene 6.1. Do I miss or
>> >>> >> > misunderstand anything ?
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> >
>> >>> >>
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
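
For anyone landing on this thread later: the "chop it off before sending it to Lucene" advice at the top could look roughly like the sketch below. It is not code from anyone in the thread. The class name SortKeyTruncator, the method truncateUtf8, and the constant MAX_SORT_BYTES are made up for illustration, and the 32766-byte value is an assumption about the per-value limit (the thread only says "32K"), so check it against the exception message your Lucene version produces. The helper uses only the JDK and cuts on a UTF-8 character boundary so the truncated sort key is still valid text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: shortens a value so it fits a sorted doc-values field. */
public final class SortKeyTruncator {

    // Assumed limit in bytes; the thread only says "32K". Verify for your version.
    public static final int MAX_SORT_BYTES = 32766;

    private SortKeyTruncator() {}

    /**
     * Returns the UTF-8 bytes of value, cut to at most maxBytes bytes
     * without splitting a multi-byte character.
     */
    public static byte[] truncateUtf8(String value, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= maxBytes) {
            return utf8; // already small enough, nothing to chop
        }
        int end = maxBytes;
        // If the first dropped byte is a continuation byte (10xxxxxx), the cut
        // would land inside a character, so back up to that character's start.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(utf8, end);
    }
}
```

The returned bytes would then be wrapped in a BytesRef and used only for the sort field (the Lucene class is spelled SortedDocValuesField), while the full, untruncated text still goes into the tokenized field for searching. Documents that share the whole truncated prefix will sort as equal, which is the trade-off Erick describes above.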