lucene-solr-dev mailing list archives

From "Noble Paul (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-828) A RequestProcessor to support updates
Date Wed, 29 Oct 2008 07:12:44 GMT

     [ https://issues.apache.org/jira/browse/SOLR-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Noble Paul updated SOLR-828:
----------------------------

    Description: 
This is the same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can focus on that solution.


The new {{UpdateProcessor}}, called {{UpdateableIndexProcessor}}, must be inserted before {{RunUpdateProcessor}}.

* The {{UpdateProcessor}} API must add an {{update}} method.
* The {{AddUpdateCommand}} gets a new boolean field {{append}}. If {{append=true}}, values of multivalued fields are appended; otherwise the old values are removed and the new ones are added.
* The schema must have a {{<uniqueKey>}}.
* {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.
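Such a chain could be wired up along these lines in {{solrconfig.xml}} (the chain name and the {{UpdateableIndexProcessorFactory}} class are illustrative, not existing Solr classes):

```xml
<updateRequestProcessorChain name="updateable" default="true">
  <!-- hypothetical factory implementing this proposal; must run first -->
  <processor class="solr.UpdateableIndexProcessorFactory"/>
  <!-- the stock factory that actually writes to the main index -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```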

h1.Implementation
{{UpdateableIndexProcessor}} maintains two separate Lucene indexes as backup storage:
 * *temp.backup.index* : stores (without indexing) all the fields of the document, except the uniqueKey, which is both stored and indexed
 * *backup.index* : stores (without indexing) the fields which are not stored in the main index, as well as the fields which are targets of copyField; the uniqueKey is again stored and indexed
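The routing rule for *backup.index* can be sketched in plain Java (the class and method names here are hypothetical; the real implementation would consult the {{IndexSchema}}):

```java
import java.util.Set;

// Decides whether a field's value must be kept in backup.index:
// fields the main index does not store, and targets of copyField,
// cannot be reconstructed from the main index alone.
public class BackupFieldRouter {
    private final Set<String> storedInMainIndex;
    private final Set<String> copyFieldTargets;

    public BackupFieldRouter(Set<String> storedInMainIndex,
                             Set<String> copyFieldTargets) {
        this.storedInMainIndex = storedInMainIndex;
        this.copyFieldTargets = copyFieldTargets;
    }

    public boolean belongsInBackupIndex(String fieldName) {
        return !storedInMainIndex.contains(fieldName)
                || copyFieldTargets.contains(fieldName);
    }
}
```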
h1.Implementation of various methods

h2.{{processAdd()}}
{{UpdateableIndexProcessor}} writes the document to *temp.backup.index*, then calls the next {{UpdateProcessor}}.

h2.{{processDelete()}}
{{UpdateableIndexProcessor}} gets a Searcher from the core, finds the documents which match the query, and deletes them from *backup.index*. If it is a delete-by-id, the document with that id is deleted from *temp.backup.index*. Then the next {{UpdateProcessor}} is called.

h2.{{processCommit()}}
Calls the next {{UpdateProcessor}}.

h2.on {{postCommit/postOptimize}}
{{UpdateableIndexProcessor}} commits *temp.backup.index*, then reads its documents one by one. If a document is still present in the main index, it is copied to *backup.index*; otherwise it is thrown away, because a delete-by-query must have deleted it. Finally *backup.index* is committed and *temp.backup.index* is destroyed. A new *temp.backup.index* is created when new documents are added.
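The synchronisation step can be modelled with plain Java maps standing in for the three indexes (a deliberate simplification with hypothetical names; the real code would iterate a Lucene IndexReader over *temp.backup.index*):

```java
import java.util.Map;
import java.util.Set;

// Simplified model of the postCommit/postOptimize hook: every document
// buffered in temp.backup.index is either promoted to backup.index
// (still alive in the main index) or dropped (deleted by a query).
public class BackupSynchronizer {
    public static void sync(Map<String, Map<String, String>> tempBackup,
                            Set<String> idsInMainIndex,
                            Map<String, Map<String, String>> backup) {
        for (Map.Entry<String, Map<String, String>> e : tempBackup.entrySet()) {
            if (idsInMainIndex.contains(e.getKey())) {
                backup.put(e.getKey(), e.getValue()); // copy to backup.index
            }                                         // else: discard
        }
        tempBackup.clear(); // temp.backup.index is destroyed after the sync
    }
}
```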

h2.{{processUpdate()}}
{{UpdateableIndexProcessor}} commits *temp.backup.index* and checks for the document, first in *temp.backup.index*. If it is present, the document is read from there; if not, *backup.index* is checked. If it is present there, a searcher is obtained from the main index, all the missing fields are read from it, and the backup document is prepared.

Single-valued fields are taken from the incoming document where present; the others are filled in from the backup document. If {{append=true}}, all the multivalued field values from the backup document are added to the incoming document; otherwise the backup document's values are ignored for any field that the incoming document also carries.
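The merge rule above can be sketched as a standalone helper (hypothetical names; a document is treated as a map from field name to value list, and the set of single-valued fields would come from the schema):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Merges an incoming partial document with the backup document,
// honouring the single-valued rule and the append flag.
public class DocumentMerger {
    public static Map<String, List<String>> merge(
            Map<String, List<String>> incoming,
            Map<String, List<String>> backup,
            Set<String> singleValued,
            boolean append) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        // Start from the backup document...
        for (Map.Entry<String, List<String>> e : backup.entrySet()) {
            String field = e.getKey();
            if (incoming.containsKey(field)) {
                if (singleValued.contains(field)) {
                    continue; // single-valued: incoming value wins
                }
                if (!append) {
                    continue; // multivalued, append=false: old values dropped
                }
            }
            merged.put(field, new ArrayList<>(e.getValue()));
        }
        // ...then add every value from the incoming document.
        for (Map.Entry<String, List<String>> e : incoming.entrySet()) {
            merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                  .addAll(e.getValue());
        }
        return merged;
    }
}
```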

h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
This exposes the data present in the backup indexes. The user must be able to fetch any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent, e.g. {{id=1&id=2&id=4}}). This lets the user query the backup index and construct the new document if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index*, then searches *temp.backup.index* for the id; if the document is not found there, it searches *backup.index*. The document(s) found are returned.
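The handler's lookup order can be sketched in a few lines (hypothetical helper; the real handler would run term queries on the uniqueKey against the two Lucene indexes):

```java
import java.util.Map;

// temp.backup.index is consulted first because it holds the most
// recent, not-yet-synchronised version of a document.
public class BackupLookup {
    public static Map<String, String> fetch(String id,
            Map<String, Map<String, String>> tempBackup,
            Map<String, Map<String, String>> backup) {
        Map<String, String> doc = tempBackup.get(id);
        return doc != null ? doc : backup.get(id);
    }
}
```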




  was:
This is the same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can focus on that solution.


The new {{UpdateProcessor}}, called {{UpdateableIndexProcessor}}, must be inserted before {{RunUpdateProcessor}}.

* The {{UpdateProcessor}} API must add an {{update}} method.
* The {{AddUpdateCommand}} gets a new boolean field {{append}}. If {{append=true}}, values of multivalued fields are appended; otherwise the old values are removed and the new ones are added.
* The schema must have a {{<uniquekeyField>}}.
* {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

h1.Implementation
{{UpdateableIndexProcessor}} maintains two separate Lucene indexes as backup storage:
 * *temp.backup.index* : stores (without indexing) all the fields of the document, except the uniquekey, which is both stored and indexed
 * *backup.index* : stores (without indexing) the fields which are not stored in the actual schema, as well as the fields which are targets of copyField; the uniquekey is again stored and indexed
h1.Implementation of various methods

h2.{{processAdd()}}
{{UpdateableIndexProcessor}} writes the document to *temp.backup.index*, then calls the next {{UpdateProcessor}}.

h2.{{processDelete()}}
{{UpdateableIndexProcessor}} gets a Searcher from the core, finds the documents which match the query, and deletes them from *backup.index*. If it is a delete-by-id, the document with that id is deleted from *temp.backup.index*. Then the next {{UpdateProcessor}} is called.

h2.{{processCommit()}}
Calls the next {{UpdateProcessor}}.

h2.on {{postCommit/postOptimize}}
{{UpdateableIndexProcessor}} commits *temp.backup.index*, then reads its documents one by one. If a document is still present in the main index, it is copied to *backup.index*; otherwise it is thrown away, because a delete-by-query must have deleted it. Finally *backup.index* is committed and *temp.backup.index* is destroyed after that.

h2.{{processUpdate()}}
{{UpdateableIndexProcessor}} commits *temp.backup.index* and checks for the document, first in *temp.backup.index*. If it is present, the document is read from there; if not, *backup.index* is checked. If it is present there, a searcher is obtained from the main index, all the missing fields are read from it, and the backup document is prepared.

Single-valued fields are taken from the incoming document where present; the others are filled in from the backup document. If {{append=true}}, all the multivalued field values from the backup document are added to the incoming document; otherwise the backup document's values are ignored for any field that the incoming document also carries.

h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
This exposes the data present in the backup indexes. The user must be able to fetch any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent, e.g. {{id=1&id=2&id=4}}). This lets the user query the backup index and construct the new document if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index* and searches *temp.backup.index* first for the id; if the document is absent there, it checks *backup.index* and returns the document.





> A RequestProcessor to support updates
> -------------------------------------
>
>                 Key: SOLR-828
>                 URL: https://issues.apache.org/jira/browse/SOLR-828
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>             Fix For: 1.4
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

