From: Sachin Kulkarni
To: java-user@lucene.apache.org
Date: Sun, 14 Sep 2014 15:06:01 -0500
Subject: Can lucene index tokenized files?

Hi,

I have a dataset whose files consist of tokens: the original data has already been tokenized, stemmed, and had its stop words removed. Is it possible to skip the Lucene analyzers and index this dataset in Lucene? So far, the datasets I have dealt with were raw, and I used Lucene's tokenization and stemming schemes.

Thank you.

Regards,
Sachin