From: KK
Date: Wed, 20 May 2009 18:59:04 +0530
Subject: Re: How to create a new index
To: java-user@lucene.apache.org
Message-ID: <8db6d74a0905200629g5cc50f93r1e8325cd08b0ec64@mail.gmail.com>
In-Reply-To: <4A14048B.9080802@propylon.com>

Thank you again, John. This is even better. So I don't have to bother about
the 3rd argument, right? I'll use the same constructor every time, both for
registering a new core and for adding docs to an existing one.

Thanks,
KK.

On Wed, May 20, 2009 at 6:54 PM, John Byrne wrote:

> Hi KK,
>
> You're welcome!
>
> BTW, I had a quick look at the Javadoc for IndexWriter and noticed this
> constructor:
>
> public IndexWriter(Directory d, Analyzer a)
> "Constructs an IndexWriter for the index in d, first creating it if it does
> not already exist."
>
> I think that might solve your problem and simplify the code a little - I
> think you could just use that constructor every time, because it will only
> create the index if it does not already exist.
>
> -John
>
>
> KK wrote:
>
>> Thanks a lot @John. That solved the problem and the other advice is really
>> helpful.
>> I'd have stumbled over that otherwise.
>> This clears up my doubt: every time I have to create a new index, I just
>> call the IndexWriter with "true", thereby creating the directory, then start
>> adding docs with "false" as the 3rd argument instead of "true", right?
>> Lucene is pretty simple and gives you full control over whatever you are
>> doing. I've been trying to automate the creation of new Solr cores for the
>> last two days without any luck. Finally today I moved to Lucene and it fixed
>> my problem very quickly. Thank you all, and special thanks to the Lucene
>> guys.
>>
>> Thanks,
>> KK.
>>
>> On Wed, May 20, 2009 at 6:28 PM, John Byrne wrote:
>>
>>> I think the problem is that you are creating a new index every time you
>>> add a document:
>>>
>>> IndexWriter writer = new IndexWriter(trueIndexPath, new
>>> StandardAnalyzer(), true);
>>>
>>> The last argument, the boolean 'true', tells IndexWriter to overwrite any
>>> existing index in that directory. If you set that to false, it will not
>>> overwrite the previous index, but will add to it.
>>>
>>> How, then, do you create it in the first place? You call the IndexWriter
>>> constructor once with 'true' as the 3rd argument, creating the index, then
>>> subsequently use 'false'. You could do this in your main method, right
>>> after you create an instance of SimpleIndexer, but before you call
>>> createIndex.
>>>
>>> -John
>>>
>>> KK wrote:
>>>
>>>> Thank you very much.
>>>> I'm using the one mentioned by @Anshum, but the problem is that after
>>>> indexing some number of docs, all I see is the last one indexed, which
>>>> clearly indicates that the index is getting overwritten. I'm posting my
>>>> simple indexer and searcher herewith. I'm trying to crawl web pages and
>>>> add each page's content under a field called "content" against a field
>>>> called "id", and for this id I'm using the page URL. Here is the code.
These are the codes >>>> >>>> The indexer: >>>> -------------------------------------------- >>>> package solrSearch; >>>> >>>> import org.apache.lucene.analysis.SimpleAnalyzer; >>>> import org.apache.lucene.analysis.standard.StandardAnalyzer; >>>> import org.apache.lucene.document.Document; >>>> import org.apache.lucene.document.Field; >>>> import org.apache.lucene.index.IndexWriter; >>>> >>>> public class SimpleIndexer { >>>> >>>> // Base Path to the index directory >>>> private static final String baseIndexPath = "/opt/lucene/index/"; >>>> >>>> >>>> public void createIndex(String pageContent, String pageId, String >>>> coreId) >>>> throws Exception { >>>> String trueIndexPath = baseIndexPath + coreId ; >>>> String contentField = "content"; >>>> String contentId = "id"; >>>> >>>> // Create a writer >>>> IndexWriter writer = new IndexWriter(trueIndexPath, new >>>> StandardAnalyzer(), true); >>>> >>>> System.out.println("Adding page to lucene " + pageId); >>>> Document doc = new Document(); >>>> doc.add(new Field(contentField, pageContent, Field.Store.YES, >>>> Field.Index.TOKENIZED)); >>>> doc.add(new Field(contentId, pageId, Field.Store.YES, >>>> Field.Index.TOKENIZED)); >>>> >>>> // Add documents to the index >>>> writer.addDocument(doc); >>>> >>>> // Lucene recommends calling optimize upon completion of indexing >>>> writer.optimize(); >>>> >>>> // clean up >>>> writer.close(); >>>> } >>>> >>>> public static void main(String args[]) throws Exception{ >>>> SimpleIndexer empIndex = new SimpleIndexer(); >>>> empIndex.createIndex("this is sample test content", "test0", "core0"); >>>> System.out.println("Data indexed by lucene"); >>>> } >>>> >>>> } >>>> >>>> and the searcher: >>>> --------------------------------------- >>>> package solrSearch; >>>> >>>> import java.io.FileReader; >>>> import java.io.IOException; >>>> import java.io.InputStreamReader; >>>> import java.util.Date; >>>> >>>> import org.apache.lucene.analysis.Analyzer; >>>> import 
org.apache.lucene.analysis.standard.StandardAnalyzer; >>>> import org.apache.lucene.document.Document; >>>> import org.apache.lucene.index.FilterIndexReader; >>>> import org.apache.lucene.index.IndexReader; >>>> import org.apache.lucene.queryParser.QueryParser; >>>> import org.apache.lucene.search.HitCollector; >>>> import org.apache.lucene.search.Hits; >>>> import org.apache.lucene.search.IndexSearcher; >>>> import org.apache.lucene.search.Query; >>>> import org.apache.lucene.search.ScoreDoc; >>>> import org.apache.lucene.search.Searcher; >>>> import org.apache.lucene.search.TopDocCollector; >>>> >>>> /** Simple command-line based search demo. */ >>>> public class SimpleSearcher { >>>> private static final String baseIndexPath = "/opt/lucene/index/" ; >>>> >>>> private void searchIndex(String queryString, String coreId) throws >>>> Exception{ >>>> String trueIndexPath = baseIndexPath + coreId; >>>> String searchField = "content"; >>>> IndexSearcher searcher = new IndexSearcher(trueIndexPath); >>>> QueryParser queryParser = null; >>>> try { >>>> queryParser = new QueryParser(searchField, new >>>> StandardAnalyzer()); >>>> } catch (Exception ex) { >>>> ex.printStackTrace(); >>>> } >>>> >>>> Query query = queryParser.parse(queryString); >>>> >>>> Hits hits = null; >>>> try { >>>> hits = searcher.search(query); >>>> } catch (Exception ex) { >>>> ex.printStackTrace(); >>>> } >>>> >>>> int hitCount = hits.length(); >>>> System.out.println("Results found :" + hitCount); >>>> >>>> for (int ix=0; (ix>>> Document doc = hits.doc(ix); >>>> System.out.println(doc.get("id")); >>>> System.out.println(doc.get("content")); >>>> } >>>> } >>>> >>>> public static void main(String args[]) throws Exception{ >>>> SimpleSearcher searcher = new SimpleSearcher(); >>>> String queryString = args[0]; >>>> System.out.println("Quering for :" + queryString); >>>> searcher.searchIndex(queryString, "core0"); >>>> } >>>> >>>> } >>>> >>>> --------------- >>>> When I tried intially without having the 
core0 directory, it >>>> automatically >>>> created that. Its fine, but I'm not able to figure what is the issue, >>>> why >>>> the data is getting overwritten. Some silly mistakes some where. Can >>>> some >>>> one point me that? >>>> And this is the code snip that I'm using to post to lucene index. >>>> >>>> public void postToSolr(String rawText, String pageId) throws Exception{ >>>> // Which solr core are we posting to??? >>>> //String solrCoreId = getCoreId(pageId); >>>> String coreId = "core0"; >>>> SimpleIndexer indexer = new SimpleIndexer(); >>>> indexer.createIndex(rawText, pageId, coreId); >>>> >>>> } >>>> >>>> NB: I din't pay attention to change the names , so you might find the >>>> word >>>> "solr" here and there. I was using that earlier, but bcoz of lack of >>>> facility of creating new separate indexes I moved to lucene today only. >>>> I >>>> guess trying to crete a new index with non-existing directory will >>>> automatically create it, which is what i want. Correct me if i'm wrong. >>>> As >>>> I >>>> mentioned earlier for each domain [say www.bcd.co.uk] I want to have a >>>> separate index and coreId is a map of this URL to a unique number. Do >>>> let >>>> me >>>> know if i'm going wrong anywhere of if you feel it can be done in any >>>> other >>>> better way. >>>> >>>> >>>> Thanks, >>>> KK. >>>> >>>> >>>> On Wed, May 20, 2009 at 4:10 PM, Anshum wrote: >>>> >>>> >>>> >>>> >>>> >>>>> Hi KK, >>>>> >>>>> Easier still, you could just open the indexwriter with the last (3rd) >>>>> arguement as true, this way the indexwriter would create a new index as >>>>> soon >>>>> as you start indexing. Also, if you just leave the indexWriter without >>>>> the >>>>> 3rd arguement, it'd conditionally create a new directory i.e. only if >>>>> the >>>>> index dir doesn't exist at that location would it create a new index >>>>> else >>>>> it >>>>> would append to the already existing index at that location. 
>>>>> Coming to the 2nd point, if you are talking about the index name, as
>>>>> mentioned by John you could simply use the timestamp as the index name.
>>>>>
>>>>> --
>>>>> Anshum Gupta
>>>>> Naukri Labs!
>>>>> http://ai-cafe.blogspot.com
>>>>>
>>>>> The facts expressed here belong to everybody, the opinions to me. The
>>>>> distinction is yours to draw............
>>>>>
>>>>> On Wed, May 20, 2009 at 3:23 PM, John Byrne wrote:
>>>>>
>>>>>> You can do this with pure Java. Create a File object with the path you
>>>>>> want, check if it exists, and if not, create it:
>>>>>>
>>>>>> File newIndexDir = new File("/foo/bar");
>>>>>>
>>>>>> if (!newIndexDir.exists()) {
>>>>>>     newIndexDir.mkdirs();
>>>>>> }
>>>>>>
>>>>>> The 'mkdirs()' method creates any necessary parent directories.
>>>>>>
>>>>>> If you want to automate the generation of the path itself, then there
>>>>>> are several ways to do it, but the best way really depends on *why*
>>>>>> you're generating a new index. For instance, you could just create a
>>>>>> timestamped name, but that name might not be very meaningful.
>>>>>>
>>>>>> Hope that helps!
>>>>>>
>>>>>> -John
>>>>>>
>>>>>> KK wrote:
>>>>>>
>>>>>>> How to create a new index? Every time I need to do so, I have to
>>>>>>> create a new directory and put the path to that, right? How do I
>>>>>>> automate the creation of the new directory?
>>>>>>> I'm a new user of Lucene. Please help me out.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> KK.
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> No virus found in this incoming message.
>>>>>> Checked by AVG - www.avg.com Version: 8.5.339 / Virus Database:
>>>>>> 270.12.35/2123 - Release Date: 05/19/09 17:59:00
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
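
For reference, the directory handling discussed in this thread could be sketched roughly as below, using only the JDK and no Lucene calls. It assumes one index directory per coreId under a base path, as in KK's indexer. The class name IndexDirs and the method ensureIndexDir are hypothetical names chosen for illustration, and treating "the directory was absent" as "the index does not exist yet" is an assumption, not something Lucene itself checks this way.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene: makes sure the per-core index
// directory exists and reports whether it had to be created.
class IndexDirs {

    /**
     * Ensures baseDir/coreId exists, creating parent directories as needed.
     * Returns true only when the directory was absent before this call,
     * which a caller might use as the 'create' flag for a new index.
     */
    static boolean ensureIndexDir(Path baseDir, String coreId) throws IOException {
        Path dir = baseDir.resolve(coreId);
        boolean isNew = Files.notExists(dir);
        // createDirectories does nothing if the directory already exists.
        Files.createDirectories(dir);
        return isNew;
    }

    public static void main(String[] args) throws IOException {
        // A temporary base directory stands in for /opt/lucene/index/.
        Path base = Files.createTempDirectory("lucene-index-demo");
        boolean first = ensureIndexDir(base, "core0");   // directory is created
        boolean second = ensureIndexDir(base, "core0");  // directory already there
        System.out.println(first + " " + second);        // prints: true false
    }
}
```

With the two-argument IndexWriter constructor John quotes from the Javadoc, the returned flag would not be needed at all; it only matters when the constructor with the explicit boolean is used.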