jackrabbit-users mailing list archives

From Alexander Wallace <...@rwmotloc.com>
Subject Re: Losing documents from DATASTORE.
Date Wed, 24 Jun 2009 17:23:07 GMT
Thank you very much for your response...

I will answer all your questions...  In order below.

First let me say that I just did an import on a clean db, from an XML 
document, which brought in 170 documents. No errors anywhere. I did a 
count(*) from DATASTORE and could see 170 docs there... Then I caused my 
GC job to run, and now there are only 5 records in DATASTORE... So 
evidently it is the GC job doing it.
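
For reference, the before/after count can be scripted with plain JDBC. A 
minimal sketch, with placeholder connection details and the DATASTORE table 
name from the config below (the helper class and names here are mine, not 
part of Jackrabbit):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DataStoreCount {

    // Builds the row-count query for a given data store table name.
    static String countSql(String table) {
        return "SELECT COUNT(*) FROM " + table;
    }

    // Runs the count against an already-open connection
    // (obtain it via DriverManager with your own JDBC URL/credentials).
    static long count(Connection con, String table) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(countSql(table))) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```

Running `count(con, "DATASTORE")` immediately before and after the GC job 
makes the deletion window easy to pin down.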

It is worth saying that I have 2 nodes on each of the clusters I'm testing.

Yesterday I did an import to one of the clusters (150 docs). To the 
other cluster I manually uploaded 3 documents through my app, which were 
added to the DATASTORE as well... I let both clusters run overnight and 
in both cases, this morning, most of the records in DATASTORE are gone...

In both cases there are a few documents (5) that remain there. I don't 
know why.

It would seem that all new documents are deleted by GC, but that's not 
true... I just re-uploaded one of the documents from yesterday, and 
after running GC it remained. However, all the ones I imported are gone, 
except for the mysterious 5.

Now I will answer your questions below:


Thomas Müller wrote:
> What version of Jackrabbit do you use? 
jackrabbit-api.jar - 1.4.0
jackrabbit-core.jar - 1.4.1
jackrabbit-jcr-commons.jar - 1.4.0
jackrabbit-spi.jar - 1.4.0
jackrabbit-spi-commons.jar - 1.4.0
jackrabbit-text-extractors.jar - 1.4.0

It seems wrong that I have core at 1.4.1 and 1.4.0 for the rest of the 
stuff... But that's how the framework I use (Liferay 5.1.2) shipped.

> How did you find out
> you are missing data (could you post the exception stack trace)? 
No errors... Basically a user reported that some docs he added a day 
before were not found (using our app's UI), and I went straight to the DB 
and noticed that most of the docs were gone... I've gone through database 
backups and realized that this has been going on for a while.

> What
> does your repository.xml look like, and did you change it recently?
>   
I will paste the repository.xml file now... I changed it on May 16th, 
which is when I moved documents from the FS to the DB, and it looks like 
the problem has been there since then... It is a shame that I did not 
realize this before, but that's the way it is... I have a backup from 
May 17th, and it is already missing most of the docs I imported on the 
16th... Evidently the GC job ran before the backup.

<?xml version="1.0"?>
<Repository>

  <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" 
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
    <param name="user" value="username" />
    <param name="password" value="userPassword" />
    <param name="schema" value="mysql"/>
    <param name="databaseType" value="mysql"/>
    <param name="minRecordLength" value="1024"/>
    <param name="maxConnections" value="3"/>
    <param name="copyWhenReading" value="true"/>
    <!-- The prefix can NOT be used here other than to specify a schema.
         This seems to be due to an inconsistency in Jackrabbit: it uses
         the prefix when creating the table but not when reading from it.
         So, in MySQL we use no prefix; the table name is then DATASTORE,
         which is effectively reserved by Jackrabbit.
      -->
    <param name="tablePrefix" value=""/>
  </DataStore>

  <!-- The FS should not be shared across nodes in the cluster,
       so this should either be local, or prefixed for each node in the
       db -->
  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" 
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
    <param name="user" value="username" />
    <param name="password" value="userPassword" />
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="JCR_NODE1_FS_"/>
  </FileSystem>
  <Security appName="Jackrabbit">
    <AccessManager 
class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
    <LoginModule 
class="org.apache.jackrabbit.core.security.SimpleLoginModule">
      <param name="anonymousId" value="anonymous" />
    </LoginModule>
  </Security>
  <Workspaces rootPath="${rep.home}/workspaces" 
defaultWorkspace="liferay" />
  <Workspace name="${wsp.name}">
    <!-- The FS should not be shared across nodes in the cluster,
         so this should either be local, or prefixed for each node in
         the db -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" 
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="JCR_NODE1_${wsp.name}_FS_"/>
    </FileSystem>
    <!-- The PM needs to be shared across the cluster -->
    <PersistenceManager 
class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="url" 
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schemaObjectPrefix" value="JCR_${wsp.name}_PM_"/>
      <param name="externalBLOBs" value="false"/>
    </PersistenceManager>
  </Workspace>
  <Versioning rootPath="${rep.home}/version">
    <!-- The FS should not be shared across nodes in the cluster,
         so this should either be local, or prefixed for each node in
         the db -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" 
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="JCR_NODE1_V_FS_"/>
    </FileSystem>
    <!-- The PM needs to be shared across the cluster -->
    <PersistenceManager 
class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="url" 
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schemaObjectPrefix" value="JCR_V_PM_"/>
      <param name="externalBLOBs" value="false"/>
    </PersistenceManager>
  </Versioning>
 
  <!-- Each cluster node needs to have a unique node id -->
  <Cluster id="NODE1" syncDelay="5">
    <!-- The journal needs to be shared across the cluster -->
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <param name="revision" value="${rep.home}/revision"/>
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" 
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="JCR_JOURNAL_"/>
    </Journal>
  </Cluster>
</Repository>


> Did you migrate data recently using XML import? 
Not recently, it was just back then (May 16th).
> How exactly do you run
> the data store garbage collection?
>
>   
I have a Quartz job that runs every night... I have changed the config 
to run it every 5 minutes for testing... Here is the code:

GarbageCollector gc;
SessionImpl si = (SessionImpl) JCRFactoryUtil.createSession();
gc = si.createDataStoreGarbageCollector();

// optional (if you want to report progress sometime):
// gc.setScanEventListener(this);

// scan must be called to find unused elements
gc.scan();
gc.stopScan();

// delete old data
gc.deleteUnused();

It seems that for now I could just disable GC... But I think it would be 
better to fix it, so that I don't end up with a lot of space being taken 
up by unused documents...
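
If it helps to reason about the failure: my understanding (please correct 
me if this is off) is that the data store GC is a mark-and-sweep. scan() 
touches the modified time of every record it can still reach through the 
session, and deleteUnused() then removes records whose modified time 
predates the scan start. If that is right, it would explain the symptom: 
records referenced only by content the scanning session never traverses 
are never touched, so they get swept. A hypothetical illustration of the 
sweep rule, not Jackrabbit's actual code:

```java
public class GcSweepRule {

    // A record is swept iff it was not touched (marked) since the scan began.
    static boolean isSwept(long lastModifiedMillis, long scanStartMillis) {
        return lastModifiedMillis < scanStartMillis;
    }

    public static void main(String[] args) {
        long scanStart = 1000L;
        // Touched during the scan -> kept.
        System.out.println(GcSweepRule.isSwept(1500L, scanStart)); // false
        // Never touched (e.g. only referenced by content the scanning
        // session did not see) -> deleted.
        System.out.println(GcSweepRule.isSwept(500L, scanStart));  // true
    }
}
```

Which would mean the fix is making sure the GC's session can see every 
workspace and every cluster node's content before it sweeps.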
> Regards,
> Thomas
>
>   
Again, I really appreciate your response and hope to hear your input on 
my answers soon.

Best Regards!
Alex.

> On Tue, Jun 23, 2009 at 9:03 PM, Alexander Wallace<aw@rwmotloc.com> wrote:
>   
>> Hi all.. I've no idea where the problem exists, and I am researching...
>>
>> I am using the db for storage, and using DATASTORE as well...
>>
>> The first time I rolled this out I migrated all documents to the db and ended
>> up with 300+ rows in DATASTORE...
>>
>> I'm going through db backups to find out when it first happened, but right
>> now, when I count(*) from DATASTORE, I see only 10 rows... If I go back a
>> few days I see a few more...
>>
>> Any idea of what could be happening?
>>
>> I know I run GC every night...
>>
>> Any clues?
>>
>> Thanks!
>>
>>     
>
>
>   
