jackrabbit-users mailing list archives

From Stefan Guggisberg <stefan.guggisb...@gmail.com>
Subject Re: Using Jackrabbit/JCR as IDE workspace data backend
Date Mon, 26 Sep 2011 12:45:52 GMT
hi marcel,

On Sun, Sep 25, 2011 at 3:40 PM, Marcel Bruch <marcel.bruch@gmail.com> wrote:
> Hi,
>
> I'm looking for some advice on whether Jackrabbit might be a good choice for my problem.
> Any comments on this are greatly appreciated.
>
>
> = Short description of the challenge =
>
> We've built an Eclipse-based tool that analyzes Java source files and stores its analysis
> results in additional files. The workspace potentially has hundreds of projects, and each
> project may have up to a few thousand files. Say there are 200 projects and 1,000 Java
> source files per project in a single workspace; that makes 200 * 1,000 = 200,000 files.
>
> On a full workspace build, all 200,000 files have to be compiled (by the IDE) and
> analyzed (by our tool) at once, and the analysis results have to be written to disk
> rather quickly. The most common use case, however, is that a single file is changed
> several times per minute and is therefore analyzed frequently.
>
> At the moment, the analysis results are dumped to disk as plain JSON files, one JSON
> file per Java class. Each JSON file is around 5 to 100 KB in size; some grow to several
> megabytes (< 10 MB), and these contain a few hundred complex JSON nodes (which might
> map perfectly to nodes in JCR).
>
> = Question =
>
> We would like to replace the simple file-system approach with a more sophisticated one,
> and I wonder whether Jackrabbit might be a suitable backend for this use case. Since we
> already map all our data to JSON, Jackrabbit/JCR looks like a perfect fit, but I can't
> say for sure.
>
> What's your suggestion? Is Jackrabbit capable of quickly loading and storing JSON-like
> data, even if 200,000 files (nodes plus their sub-nodes) have to be updated in a very
> short time?

absolutely. if the data is reasonably structured/organized, jackrabbit
should be a perfect fit.
i suggest leveraging the java package hierarchy for organizing the data
(i.e. org.apache.jackrabbit.core.TransientRepository ->
/org/apache/jackrabbit/core/TransientRepository).
for further data modeling recommendations see [0].
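
here's a minimal, untested sketch of what that could look like against
the jcr api (the class name AnalysisStore and the property names
analyzedAt/someMetric are just placeholders, not part of any existing
api):

    import javax.jcr.Node;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;

    import org.apache.jackrabbit.core.TransientRepository;

    public class AnalysisStore {

        public static void main(String[] args) throws Exception {
            Repository repository = new TransientRepository();
            Session session = repository.login(
                    new SimpleCredentials("admin", "admin".toCharArray()));
            try {
                // map a fully qualified class name to a jcr path,
                // e.g. org.example.Foo -> /org/example/Foo
                String className = "org.example.Foo"; // placeholder
                Node node = session.getRootNode();
                for (String segment : className.split("\\.")) {
                    node = node.hasNode(segment)
                            ? node.getNode(segment)
                            : node.addNode(segment);
                }

                // store the analysis result as properties and child
                // nodes instead of one opaque json blob
                // (property names are placeholders)
                node.setProperty("analyzedAt",
                        java.util.Calendar.getInstance());
                node.setProperty("someMetric", 42L);

                // nothing is persisted until save() is called
                session.save();
            } finally {
                session.logout();
            }
        }
    }

note that pending changes are transient until session.save() is called;
for a full workspace build you would batch many file updates into one
save (e.g. one save per project) rather than saving after every file.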

cheers
stefan

[0] http://wiki.apache.org/jackrabbit/DavidsModel

>
>
> Thanks for your suggestions. If you need more details on what operations are performed
> or what the data looks like, I'd be glad to answer your questions.
>
> Marcel
>
> --
> Eclipse Code Recommenders:
>  w www.eclipse.org/recommenders
>  tw www.twitter.com/marcelbruch
>  g+ www.gplus.to/marcelbruch
>
>
