Date: Thu, 28 May 2009 15:52:24 +0530
Message-ID: <894870ec0905280322r11a49321i52495a61e530e7d4@mail.gmail.com>
Subject: Help Needed...
From: Gaurav Kumar
To: java-user@lucene.apache.org

Hi everyone,

I am working on a project using Lucene in which I need to index HTML files. I am using Tika to parse the HTML files. However, I need to index the files according to their tags: the text inside each kind of HTML tag should be stored in a separate field. Can I do that? If so, how?

Also, can I assign different weights to the tokens in different fields? If so, how?