From: daveg0 <bagel10002000@googlemail.com>
Date: Fri, 3 Apr 2009 02:56:30 -0700 (PDT)
To: users@jackrabbit.apache.org
Subject: Help with performance again

Hi,

I am getting some really poor search performance out of our repository and need some help to try to understand what I
am doing wrong. I will try to give as much detail as possible:

1) We are trying to implement an Atom repository (see the cnd file):
http://www.nabble.com/file/p22865321/atom_node_types.cnd atom_node_types.cnd

2) Search performance is VERY slow using the test below. It appears that the problem is not the searches themselves, but the loading of the nodes from the returned NodeIterator, which is extremely slow. Does this not load lazily?

    @Test
    public final void testSearchPerformance() throws Exception {
        // Test jcr:like
        SpringJCRNodeDAO nodeQuery = (SpringJCRNodeDAO) daoFactory.getDAO(Node.class);
        StopWatch watch = new StopWatch();
        String pathPrefix = "//portal/portal/test-collection//element(*,atom:Entry)";
        String searchPrefix = "//element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)";
        String rootPrefix = "//element(*,atom:Entry)";
        Limits limits = new Limits();
        limits.setMaxRows(10000);
        String query = null;
        QueryResultWrapper<Node> results = null;

        query = pathPrefix + "[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]";
        watch.start();
        results = nodeQuery.get(query, limits.getMaxRows(), limits.getOffset());
        displayTime(query, watch);
        displayResults(query, results);
    }

results:

09:43:06,257 [main] INFO SearchPerformanceTest : Query: //portal/portal/test-collection//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')] time: 4598
09:43:06,257 [main] INFO SearchPerformanceTest : Results for: //portal/portal/test-collection//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')] size: 1110
09:43:07,972 [main] INFO SearchPerformanceTest : Query: //portal/portal/test-collection//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')] time: 1715
09:43:07,972 [main] INFO SearchPerformanceTest : Results for: //portal/portal/test-collection//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')] size: 1110
09:43:09,639 [main] INFO SearchPerformanceTest : Query: //portal/portal/test-collection//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')] time: 1667
09:43:09,639 [main] INFO SearchPerformanceTest : Results for: //portal/portal/test-collection//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')] size: 1110
09:43:11,260 [main] INFO SearchPerformanceTest : Query: //element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')] time: 1605
09:43:11,260 [main] INFO SearchPerformanceTest : Results for: //element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')] size: 1110
09:43:12,881 [main] INFO SearchPerformanceTest : Query: //element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')] time: 1621
09:43:12,881 [main] INFO SearchPerformanceTest : Results for: //element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')] size: 1110
09:43:14,518 [main] INFO SearchPerformanceTest : Query: //element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')] time: 1637
09:43:14,518 [main] INFO SearchPerformanceTest : Results for: //element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')] size: 1110
09:43:16,186 [main] INFO SearchPerformanceTest : Query: //element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')] time: 1668
09:43:16,186 [main] INFO SearchPerformanceTest : Results for:
//element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')] size: 1110

The node query does something like:

    public QueryResultWrapper<Node> get(String queryString, long limit, long offset, String userId) throws DAOException {
        // check user privs code etc removed
        List<Node> returnList = new ArrayList<Node>();
        try {
            QueryManager queryManager = session.getWorkspace().getQueryManager();
            // This code is tied to Jackrabbit as it allows limits
            // and offsets on the queries. The commented-out lines are JCR
            // implementation agnostic.
            Query query = queryManager.createQuery(queryString, Query.XPATH);
            // QueryResult queryResult = query.execute();

            // ========= Jackrabbit-specific code
            QueryImpl jackrabbitQuery = (QueryImpl) query;
            jackrabbitQuery.setLimit(limit);
            jackrabbitQuery.setOffset(offset);
            QueryResultImpl jackrabbitQueryResult = (QueryResultImpl) jackrabbitQuery.execute();
            // ===== End of Jackrabbit-specific code

            // NodeIterator nodes = queryResult.getNodes();
            NodeIterator nodes = jackrabbitQueryResult.getNodes();
            while (nodes.hasNext()) {
                returnList.add(nodes.nextNode());
            }
        } catch (Exception e) {
            LOG.error(e.getMessage(), e);
            throw new DAOException(e.getMessage());
        }
        // wrapping and returning of results elided
    }

The equivalent Lucene search through all of the index subdirectories in workspaces/default/index, say, is:

    @Test
    public void testTitle() throws Exception {
        File[] indexes = new File(indexDir).listFiles(new FileFilter() {
            @Override
            public boolean accept(File f) {
                return f.isDirectory();
            }
        });
        StopWatch watch = new StopWatch();
        for (File file : indexes) {
            Directory directory = FSDirectory.getDirectory(file);
            IndexSearcher searcher = new IndexSearcher(directory);
            try {
                System.out.println("Directory: " + directory);
                Term t = new Term("12:FULL:titletext", "*");
                Query query = new WildcardQuery(t);
                watch.start();
                Hits hits = searcher.search(query);
                //showDocs(hits);
                watch.stop();
                System.out.println("Hits: " + hits.length() + " query: " + query + " time: " + watch.getTime());
                watch.reset();
            } finally {
                searcher.close();
                directory.close();
            }
        }
    }

    private void showDocs(Hits hits) throws CorruptIndexException, IOException {
        Document doc;
        for (int i = 0; i < hits.length(); i++) {
            doc = hits.doc(i);
            System.out.println("doc: " + doc.getField("_:UUID").stringValue());
        }
    }

This returns very quickly:

Directory: org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_5
Hits: 601 query: 12:FULL:titletext:* time: 47
Directory: org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_b
Hits: 699 query: 12:FULL:titletext:* time: 16
Directory: org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_h
Hits: 811 query: 12:FULL:titletext:* time: 47
Directory: org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_y
Hits: 199 query: 12:FULL:titletext:* time: 15
Directory: org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_z
Hits: 1000 query: 12:FULL:titletext:* time: 31

I have inserted some debug statements into the code and it appears that the NodeIterator is the problem: the first call to NodeIterator.hasNext() seems to take seconds with only 1000 nodes. Is there something that can be done about this? The searches themselves are quick, but the loading of the nodes is VERY slow.

regards,
Dave
--
View this message in context: http://www.nabble.com/Help-with-performace-again-tp22865321p22865321.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
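[Editor's note] To pin down where the time goes, it helps to time query execution, the first call to hasNext(), and the full iteration separately, since a lazy iterator defers its load cost to the first touch. The sketch below is a plain-JDK stand-in (no real JCR session; `lazyResults` and its load delay are hypothetical placeholders for a NodeIterator whose first hasNext() materializes results), showing the measurement split rather than any Jackrabbit API:

```java
import java.util.Iterator;

// Plain-JDK sketch: split "execute" time from "first touch" time from
// full-iteration time. The lazy iterator stands in for a NodeIterator
// whose first hasNext() pays a one-off loading cost.
public class TimingSplit {

    // Stand-in result iterator: first hasNext() simulates node loading.
    static Iterator<String> lazyResults(final int size, final long loadMillis) {
        return new Iterator<String>() {
            int next = 0;
            boolean loaded = false;

            public boolean hasNext() {
                if (!loaded) {            // first touch materializes results
                    sleep(loadMillis);    // stands in for node loading
                    loaded = true;
                }
                return next < size;
            }

            public String next() { return "node-" + next++; }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        Iterator<String> it = lazyResults(1110, 200);    // "execute" is cheap here
        long executeMs = System.currentTimeMillis() - t0;

        t0 = System.currentTimeMillis();
        it.hasNext();                                    // first touch pays the load
        long firstTouchMs = System.currentTimeMillis() - t0;

        int count = 0;
        while (it.hasNext()) { it.next(); count++; }

        System.out.println("execute: " + executeMs + " ms, first hasNext(): "
                + firstTouchMs + " ms, nodes: " + count);
    }
}
```

If the same three timestamps are taken around query.execute(), the first nodes.hasNext(), and the copy loop in the DAO above, the slow phase should show up unambiguously.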
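[Editor's note] Since the DAO above copies every node out of the NodeIterator, all 1110 nodes are loaded even when the caller needs only a page. With a genuinely lazy iterator, consuming just one page bounds the loading cost, and the setLimit/setOffset calls already in the Jackrabbit-specific code bound it on the query side too. A plain-JDK sketch (no repository; the counting iterator is a hypothetical stand-in for a NodeIterator) of taking only the first page:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-JDK sketch: pull only one page out of a lazy iterator, so only
// that page's items are ever materialized.
public class PageFetch {

    // Pull at most pageSize items from a (possibly huge) lazy iterator.
    static <T> List<T> takePage(Iterator<T> it, int pageSize) {
        List<T> page = new ArrayList<T>();
        while (page.size() < pageSize && it.hasNext()) {
            page.add(it.next());
        }
        return page;
    }

    public static void main(String[] args) {
        // loads[0] counts how many "nodes" were actually materialized.
        final int[] loads = {0};
        Iterator<Integer> results = new Iterator<Integer>() {
            int i = 0;
            public boolean hasNext() { return i < 1110; }
            public Integer next() { loads[0]++; return i++; }
            public void remove() { throw new UnsupportedOperationException(); }
        };

        List<Integer> page = takePage(results, 10);
        System.out.println("page size: " + page.size() + ", nodes loaded: " + loads[0]);
        // → page size: 10, nodes loaded: 10
    }
}
```

The design point is simply to keep the while-loop from draining the iterator: copy out only what the current request needs and let limit/offset do the rest server-side.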