Date: Sat, 6 Oct 2012 18:28:28 -0500
Subject: TREC document Parser questions..
From: Sachin Kulkarni
To: java-user@lucene.apache.org

Hi,

I am using the TRECParserByPath in Lucene to index the TREC disc 4-5 data. It covers all the file types except the CR collection. Is Lucene using the default Gov2parser to parse the CR collection? Is there a parser that can be used for the CR collection directly?

Thank you.

Regards,
Sachin