jackrabbit-users mailing list archives

From rokham <somebodyik...@gmail.com>
Subject jackrabbit's xml import overhead
Date Thu, 10 Jul 2008 17:47:40 GMT

Hi all,

I'm trying to decide between the following two options and haven't been able
to find an answer through Google searches.

I am writing an application that requires importing many large XML files (I'm
not yet sure exactly how many or how big) into Jackrabbit. My concerns are
twofold:

1. I need Lucene to index these XML files, and I want to be able to run XPath
queries over the data (the hierarchy of my content is important).

2. I want the import process (which will happen frequently) to be as fast as
possible.
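To make concern 1 concrete, the kind of query I'd want to run looks roughly like this (the element names and path here are invented for illustration, not from my actual data):

```java
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

public class QuerySketch {
    // Runs an XPath query that depends on the imported element hierarchy.
    // "session" is an already-logged-in JCR Session; "book", "chapter",
    // and the /imports path are placeholder names.
    static QueryResult findChapters(Session session) throws Exception {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query q = qm.createQuery(
                "/jcr:root/imports//book/chapter[@title = 'Intro']",
                Query.XPATH);
        return q.execute();
    }
}
```

A query like this only makes sense if the element hierarchy actually exists as nodes in the repository, which is what the choice below comes down to.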

I am not sure which of the following two approaches would work better:


1. Import all the XML files using Jackrabbit's XML import API.
  - This preserves the structure of the XML content, but it's presumably slow.
I'm not sure what the overhead is. Has anyone done any profiling on Jackrabbit
1.4? Are there tweaks that can make this process faster?
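For reference, what I mean by option 1 is something like the following sketch using the standard JCR import call (the parent path and file handling are placeholders):

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.jcr.ImportUUIDBehavior;
import javax.jcr.Session;

public class XmlImportSketch {
    // Imports one XML file as a node tree under /imports.
    // Each XML element becomes a JCR node, so the document hierarchy
    // is preserved and XPath queries can address individual elements.
    // "session" is an already-logged-in JCR Session; "/imports" is
    // assumed to exist.
    static void importFile(Session session, String file) throws Exception {
        InputStream in = new FileInputStream(file);
        try {
            session.importXML("/imports", in,
                    ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);
            session.save();
        } finally {
            in.close();
        }
    }
}
```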

2. Import each XML file's content as a plain string.
  - I believe this would prevent Lucene/Jackrabbit from being aware of the
data's hierarchy, but I'm NOT sure. Would the imports be faster in this case?
Would they be a lot faster? Would searching the content be as accurate as in
the first scenario?
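And option 2 would amount to something like this (again, the node and property names are made up):

```java
import javax.jcr.Node;
import javax.jcr.Session;

public class StringImportSketch {
    // Stores the whole XML document as a single STRING property, so
    // Lucene indexes it as flat text: full-text search (jcr:contains)
    // would still work, but XPath could no longer address elements
    // inside the document. Assumes a /imports node already exists.
    static void storeAsString(Session session, String name, String xmlText)
            throws Exception {
        Node parent = session.getRootNode().getNode("imports");
        Node doc = parent.addNode(name);
        doc.setProperty("content", xmlText);
        session.save();
    }
}
```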

Any help is very much appreciated.

Rokham S.  

