Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
Date: Wed, 6 Jul 2016 18:53:49 -0400
Subject: Re: dv field is too large
From: Sheng
To: "java-user@lucene.apache.org"
Content-Type: multipart/alternative;
boundary=001a114044da4ef6e40536ff7165
archived-at: Wed, 06 Jul 2016 22:53:56 -0000

--001a114044da4ef6e40536ff7165
Content-Type: text/plain; charset=UTF-8

You misunderstand. I have many fields, and unfortunately a few of them are
quite big, i.e. they exceed the 32k limit. To make these "big" fields
sortable, they have to be indexed as SortedDocValuesField. Or is that wrong,
and one can actually sort the search results by a "big" field without
indexing it as a SortedDocValuesField? Any suggestions?

On Wednesday, July 6, 2016, Erick Erickson wrote:

> bq: In this case, we
> have to index a particular data structure which has a bunch of fields and
> each of them is promised to be searchable and search-sortable to the user
>
> If I'm reading this right, you have some structure. You say
> "each of them is promised to be searchable and search-sortable"
>
> It _sounds_ like what you want to do is break these fields out
> into separate fields each of which is searchable and sortable
> independently. But from what you've described, putting the entire
> thing into a single DV field isn't useful.
>
> Best,
> Erick
>
> On Wed, Jul 6, 2016 at 3:10 PM, Sheng wrote:
> > To be clear, the "field" is indeed tokenized, and it is accompanied by a
> > SortedDocValuesField so that it is sortable too. Am I making the wrong
> > assumption here?
> >
> > On Wednesday, July 6, 2016, Sheng wrote:
> >
> >> Hi Erick,
> >>
> >> I am refactoring a legacy system. One of the most annoying things is I
> >> have to keep the old feature even though it makes little sense. In this
> >> case, we have to index a particular data structure which has a bunch of
> >> fields and each of them is promised to be searchable and search-sortable
> >> to the user. Turns out one field is notoriously large. I think the old
> >> implementation uses some quite clumsy way to make it happen.
> >> But since we decided to refactor the system with all the goodies from
> >> Lucene, we want to do the sorting right, and here we are at this
> >> issue... :-(
> >>
> >> On Wednesday, July 6, 2016, Erick Erickson wrote:
> >>
> >>> Is this an "XY" problem? Meaning, why do you need DV fields larger than
> >>> 32K?
> >>>
> >>> You can't search it as text as it's not tokenized. Faceting and sorting
> >>> by a 32K field doesn't seem very useful. You may have a perfectly valid
> >>> reason, but it's not obvious what use-case you're serving from this
> >>> thread so far....
> >>>
> >>> Nobody has yet put forth a compelling use-case for such large fields,
> >>> perhaps this would be one.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Jul 6, 2016 at 2:24 PM, Sheng wrote:
> >>> > Mike - Thanks for the prompt response. Is there a way to bypass this
> >>> > constraint for SortedDocValuesField? Or do we have to live with it,
> >>> > meaning no fix even in a future release?
> >>> >
> >>> > On Wednesday, July 6, 2016, Michael McCandless
> >>> > <lucene@mikemccandless.com> wrote:
> >>> >
> >>> >> I believe only binary DVs can be larger than 32K bytes.
> >>> >>
> >>> >> Mike McCandless
> >>> >>
> >>> >> http://blog.mikemccandless.com
> >>> >>
> >>> >> On Wed, Jul 6, 2016 at 10:31 AM, Sheng wrote:
> >>> >>
> >>> >> > Hi,
> >>> >> >
> >>> >> > I am getting an IAE indicating that one of the SortedDocValuesField
> >>> >> > values is too large, > 32k
> >>> >> >
> >>> >> > I googled a bit, and it seems like LUCENE-4583 has addressed this
> >>> >> > issue in 4.5 and 6.0, while I am currently using Lucene 6.1. Am I
> >>> >> > missing or misunderstanding anything?
> >>> >> >
> >>> >> > Thanks,
> >>> >> >
> >>> >>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>

--001a114044da4ef6e40536ff7165--
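
One possible workaround for the question at the top of this message, which nobody in the thread proposed: index only a bounded prefix of the large value as the sort key, and keep the full text in the tokenized field. The sketch below is plain Java with no Lucene dependency. The class name SortKeys and the helper truncateUtf8 are hypothetical, and the 32766-byte constant is my assumption about the exact limit behind the "32k" error. Values that share the same first 32766 bytes would sort as equal, which may or may not be acceptable for your use case.

```java
public final class SortKeys {

    // Assumption: sorted doc values are rejected above 32766 bytes,
    // which is the "32k" limit discussed in this thread.
    public static final int MAX_SORT_KEY_BYTES = 32766;

    private SortKeys() {}

    /**
     * Returns the longest prefix of value whose UTF-8 encoding fits in
     * maxBytes, without ever splitting a code point.
     */
    public static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            // UTF-8 length of this code point (1 to 4 bytes).
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break; // the next code point would exceed the budget
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }
}
```

At index time the truncated string could then be wrapped in a BytesRef and added as the SortedDocValuesField for the document, e.g. new SortedDocValuesField("body_sort", new BytesRef(SortKeys.truncateUtf8(text, SortKeys.MAX_SORT_KEY_BYTES))), where "body_sort" is a made-up field name.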