jackrabbit-users mailing list archives

From Kisu San <Kishore....@gmail.com>
Subject Re: Problem with NodeIterator
Date Wed, 14 Nov 2007 14:25:21 GMT

Dear All,

I have a big question, one I should have asked in the first place.

I am trying to assess the suitability of Jackrabbit for a very large data
model, for an automotive company.

I have a lot of entities, such as Models, Variants, Countries of Sale, Fault
Codes, Manuals, Parts, Dealers, and so on. The whole purpose of this new
implementation is to store large documents in different languages. Searches
will be performed on complex relations (as in an RDBMS, with several joins)
to retrieve the relevant document.

I was trying to implement all of these entities as nodes (including
reference or standard data) and to define relations between these nodes,
which could be one-to-many, many-to-many or one-to-one.

Is Jackrabbit suitable for this kind of implementation? What I am finding
particularly difficult is resolving the references, or relations, between
nodes. I will end up writing code to resolve these references and going
through a lot of iterations.

Can anyone advise me whether it is a good idea to implement this model
entirely in Jackrabbit? Or would it be better to use an RDBMS as the data
store and Jackrabbit as the document store?
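
To make the second option concrete, here is a minimal sketch in plain Java,
not Jackrabbit or JDBC code. The class name HybridStoreSketch and both maps
are hypothetical stand-ins: one map plays the role of an RDBMS table that
answers the relational question (which documents belong to a given model and
bulletin type), the other plays the role of the document store that only
returns content by id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a relational-style index in front of a document store.
class HybridStoreSketch {
    // Stand-in for an RDBMS table: (model, bulletin type) -> document ids.
    private final Map<String, List<String>> index = new HashMap<>();
    // Stand-in for the document store: document id -> content.
    private final Map<String, String> documents = new HashMap<>();

    private static String key(String model, String bulletinType) {
        return model + "|" + bulletinType;
    }

    // Stores the content once and records its relations in the index.
    void store(String docId, String model, String bulletinType, String content) {
        documents.put(docId, content);
        index.computeIfAbsent(key(model, bulletinType), k -> new ArrayList<>()).add(docId);
    }

    // Resolves the relation in the index first, then fetches only the matching documents.
    List<String> find(String model, String bulletinType) {
        List<String> result = new ArrayList<>();
        List<String> ids = index.getOrDefault(key(model, bulletinType), Collections.<String>emptyList());
        for (String id : ids) {
            result.add(documents.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        HybridStoreSketch store = new HybridStoreSketch();
        store.store("d1", "ModelA", "recall", "Recall notice");
        store.store("d2", "ModelA", "service", "Service note");
        System.out.println(store.find("ModelA", "recall")); // prints [Recall notice]
    }
}
```

In such a split, the joins would be handled by the relational side, and the
repository would only be asked for documents whose ids are already known.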

Thanks in Advance
Kishore





Kisu San wrote:
> 
> Hi Marcel,
> 
>>>please note that your query will not work as expected because it is
>>>invalid and jackrabbit should actually reject it. see: 
>>>https://issues.apache.org/jira/browse/JCR-1211
> 
> Why can't we use deref as a predicate? What is the alternative for
> resolving references in XPath?
> I looked at your issue, JCR-1211, but could not find details. 
> 
> 
> 
> 
> 
> 
> Marcel Reutegger wrote:
>> 
>> Kisu San wrote:
>>> Below is my method. I am using DerbyPersistenceManager. 
>>> 
>>> Is there any way we can bypass the persistence manager? 
>>> What are the trade-offs of not using a persistence manager?
>> 
>> the persistence manager is an integral part of jackrabbit, you cannot
>> just bypass it.
>> 
>>> public ArrayList getBulletinList(String bulletinType, String model)
>>> 		throws RepositoryException {
>>> 		
>>> 		log.info("Entering .....");
>>> 		long startTime = System.currentTimeMillis();
>>> 		//String query = "//" + PRIMARY_NODE + "/child::*";
>>> 		//String query = "//element(*, bulletin)";
>>> 		//String query = "//element(*, " + NODE_TYPE + " )";
>>> 		String query = "/jcr:root/BULLETIN//element(*, " + NODE_TYPE + ")"
>>> 				+ "[jcr:deref(@btnmodel='" + model + "') and"
>>> 				+ " jcr:deref(@btnbulletin_type='" + bulletinType + "')]"
>>> 				+ " order by jcr:score() descending";
>> 
>> please note that your query will not work as expected because it is
>> invalid and jackrabbit should actually reject it. see: 
>> https://issues.apache.org/jira/browse/JCR-1211
>> 
>>> 		ArrayList list = new ArrayList();
>>> 		
>>> 		long getResultsStartTime = System.currentTimeMillis();
>>> 		
>>> 		QueryResult results = getQueryResults(query);
>>> 		
>>> 		long getResultsEndTime = System.currentTimeMillis();
>>> 		log.info("getting nodes from Resultset took "
>>> 				+ (getResultsEndTime - getResultsStartTime) + " ms");
>>> 		
>>> 		NodeIterator it =  results.getNodes();
>>> 		
>>> 
>>> 		log.debug("Size is  " + it.getSize());
>>> 		BulletinDTO dto = null;
>>> 		//while (it.hasNext()) {
>>> 		long loopStartTime = System.currentTimeMillis();
>>> 		for (int i= 0; i < it.getSize(); i++){
>>> 			
>>> 			Node n = (Node) it.next();
>>> 			
>>> 			//log.debug("Node name is  " + n.getName());
>>> 			
>>> 			dto = new BulletinDTO();
>>> 			dto.setName(n.getName());
>>> 			dto.setUuid(n.getUUID());
>>> 			dto.setBulletinId(n.getProperty(ID).getLong());
>>> 			dto.setBulletinTypeRef(n.getProperty(REF_BULLETIN_TYPE).getNode().getUUID());
>>> 			dto.setModelRef(n.getProperty(REF_MODEL).getNode().getUUID());
>>> 			dto.setContent(n.getProperty(REF_TOPIC).getStream());
>>> 			list.add(dto);
>>> 			
>>> 		}
>>> 		long loopEndTime = System.currentTimeMillis();
>>> 		log.info("To loop through and populate 1000 dtos it took"
>>> 				+ (loopEndTime - loopStartTime) + " ms");
>>> 		
>>> 		long finishTime = System.currentTimeMillis();
>>> 		log.debug("Finished in " + (finishTime - startTime) + "ms");
>>> 		log.info("Finished");
>>> 		return list;
>>> 
>>> 		}
>>> 
>>> metrics from log are 
>>> 13-43-2007 11:43:16:726 - INFO - com.entity.data.daoimpl.GenericDAOImpl.getQueryResults() :Finished in 500 ms
>>> 13-43-2007 11:43:16:726 - INFO - com.entity.data.daoimpl.BulletinDAOImpl.getBulletinList() :getting nodes from Resultset took 500 ms
>>> 13-43-2007 11:43:16:742 - DEBUG - com.entity.data.daoimpl.BulletinDAOImpl.getBulletinList() :Size is  10000
>>> 13-43-2007 11:43:32:226 - INFO - com.entity.data.daoimpl.BulletinDAOImpl.getBulletinList() :To loop through and populate 1000 dtos it took15484 ms
>>> 13-43-2007 11:43:32:226 - DEBUG - com.entity.data.daoimpl.BulletinDAOImpl.getBulletinList() :Finished in 16000ms
>> 
>> please note that the size of the query result is 10'000 (!) and not one 
>> thousand. the timing quite clearly shows that the majority of the time is
>> spent iterating over the result and retrieving the nodes.
>> 
>>> I made another call within the same session; its logs are
>>> 
>>> 13-43-2007 11:43:32:414 - INFO - com.entity.data.daoimpl.GenericDAOImpl.getQueryResults() :Finished in 188 ms
>>> 13-43-2007 11:43:32:414 - INFO - com.entity.data.daoimpl.BulletinDAOImpl.getBulletinList() :getting nodes from Resultset took 188 ms
>>> 13-43-2007 11:43:32:430 - DEBUG - com.entity.data.daoimpl.BulletinDAOImpl.getBulletinList() :Size is  10000
>>> 13-43-2007 11:43:33:305 - INFO - com.entity.data.daoimpl.BulletinDAOImpl.getBulletinList() :To loop through and populate 1000 dtos it took875 ms
>>> 13-43-2007 11:43:33:305 - DEBUG - com.entity.data.daoimpl.BulletinDAOImpl.getBulletinList() :Finished in 1079ms
>> 
>> note that the overall time to execute your code dropped significantly
>> because of the caches in jackrabbit. most nodes are now already cached and
>> the time to loop over the nodes dropped to 875 ms.
>> 
>> regards
>>   marcel
>> 
>> 
> 
> 
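
Since the quoted timings show that most of the time goes into iterating over
all 10'000 results, one possible mitigation is to read only as many results
as are needed at a time. The sketch below is plain Java over
java.util.Iterator (which NodeIterator extends), not Jackrabbit code; the
class PagedIterationSketch and the method readPage are hypothetical names.
It stops on hasNext() instead of counting up to getSize(), which also avoids
depending on a reported size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: consume an iterator one bounded page at a time.
class PagedIterationSketch {
    // Reads at most pageSize items; relies on hasNext(), not on a reported size.
    static <T> List<T> readPage(Iterator<T> it, int pageSize) {
        List<T> page = new ArrayList<>();
        while (page.size() < pageSize && it.hasNext()) {
            page.add(it.next());
        }
        return page;
    }

    public static void main(String[] args) {
        // Stand-in for query results; a real caller would pass the result iterator.
        Iterator<String> results = Arrays.asList("b1", "b2", "b3", "b4", "b5").iterator();
        System.out.println(readPage(results, 2)); // prints [b1, b2]
        System.out.println(readPage(results, 2)); // prints [b3, b4]
        System.out.println(readPage(results, 2)); // prints [b5]
    }
}
```

Whether this helps in practice depends on how many results the caller really
needs; it only limits how many nodes are loaded per call.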

-- 
View this message in context: http://www.nabble.com/Problem-with-NodeIterator-tf4791277.html#a13747262
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

