From: Chris Bamford
To: java-user@lucene.apache.org
Subject: Rewriting an index without losing 'hidden' data
Date: Fri, 08 Apr 2011 10:44:09 -0400

Hi,

I recently discovered that I need to add a single field to every document in an existing (very large) index. Reindexing from scratch is not an option I want to consider right now, so I wrote a utility to add the field by rewriting the index - but this seemed to lose some of the fields (indexed, but not stored?). In fact, it shrank a 12 GB index down to 4.2 GB - clearly not what I wanted. :-)
What am I doing wrong?

My technique was:

  Analyzer analyser = new StandardAnalyzer();
  IndexSearcher searcher = new IndexSearcher(indexPath);
  IndexWriter indexWriter = new IndexWriter(indexPath, analyser);
  Hits hits = matchAllDocumentsFromIndex(searcher);

  for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      String id = doc.get("unique-id");
      doc.add(new Field("newField", newValue, Field.Store.YES, Field.Index.UN_TOKENIZED));
      indexWriter.updateDocument(new Term("unique-id", id), doc);
  }

  searcher.close();
  indexWriter.optimize();
  indexWriter.close();

Note that my matchAllDocumentsFromIndex() does get the right number of hits from the index - i.e. the same number as held in the index.

Thanks for any ideas!
BTW I am using Lucene 2.3.2.

- Chris
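
To make the suspicion in the message concrete (indexed-but-not-stored fields going missing), here is a toy model in plain Java. It is only a sketch under one assumption: that a document read back from the index carries its stored fields and nothing else. ToyField, ToyDocument, readBack and fieldNames are hypothetical names made up for illustration; none of them are Lucene classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyField, ToyDocument and readBack are hypothetical
// illustration names, not part of the Lucene API.
final class StoredFieldsDemo {

    /** A field that can be indexed (searchable), stored (retrievable), or both. */
    static final class ToyField {
        final String name;
        final String value;
        final boolean stored;
        final boolean indexed;

        ToyField(String name, String value, boolean stored, boolean indexed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    /** The document as handed to a writer: every field is present. */
    static final class ToyDocument {
        final List<ToyField> fields = new ArrayList<>();

        ToyDocument add(ToyField field) {
            fields.add(field);
            return this;
        }
    }

    /** Models reading a document back. Assumption: only stored fields return. */
    static ToyDocument readBack(ToyDocument original) {
        ToyDocument copy = new ToyDocument();
        for (ToyField field : original.fields) {
            if (field.stored) {
                copy.add(field);
            }
        }
        return copy;
    }

    /** Lists field names in insertion order, for easy comparison. */
    static List<String> fieldNames(ToyDocument doc) {
        List<String> names = new ArrayList<>();
        for (ToyField field : doc.fields) {
            names.add(field.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Originally written with one stored field and one indexed-only field.
        ToyDocument written = new ToyDocument()
                .add(new ToyField("unique-id", "42", true, true))
                .add(new ToyField("body", "some long text", false, true));

        // The rewrite loop: read back, add the new field, write again.
        ToyDocument retrieved = readBack(written);
        retrieved.add(new ToyField("newField", "x", true, true));

        System.out.println("written:   " + fieldNames(written));
        System.out.println("rewritten: " + fieldNames(retrieved));
    }
}
```

In this model the rewritten document keeps "unique-id" and gains "newField", but "body" is gone because it was never stored. If the real index behaves the same way, that would be consistent with the index getting smaller after the rewrite.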