lucene-java-user mailing list archives

From "Will Allen" <wal...@Cyveillance.com>
Subject RE: modifying existing index
Date Tue, 23 Nov 2004 19:56:52 GMT
To update a document you need to insert the modified document, then delete the old one.

Here is some code that I use, to get you going in the right direction. It won't compile as-is,
but if you follow it closely you will see how I take an array of Lucene documents with new
properties, add them, and then delete the old ones:


	public void  updateDocuments( Document[] documentsToUpdate )
	{
		if ( documentsToUpdate.length > 0 )
		{
			String updateDate = Dates.formatDate( new Date(), "yyyyMMddHHmm" );
			HashSet failedToAdd = new HashSet();
			//  wait on some other modification to finish
			waitToModify();
			synchronized(directory)
			{
				IndexWriter indexWriter = null;
				try
				{
					indexWriter = getWriter();
					//this seems to be needed to accommodate a lucene (ver 1.4.2) bug,
					//otherwise the index does not accurately reflect the change
					indexWriter.mergeFactor = 2;
					//load data from new document into old document
					for ( int i = 0; i < documentsToUpdate.length; i++ )
					{
						try
						{
							Document newDoc = modifyDocument( documentsToUpdate[i], updateDate );
							if ( newDoc != null )
							{
								documentsToUpdate[i] = newDoc;
								indexWriter.addDocument( newDoc );
							}
							else
							{
								failedToAdd.add( documentsToUpdate[i].get( "messageid" ) );
							}
						}
						catch ( IOException addDocException )
						{
							//if we fail to add, make a note and don't delete it
							logger.error( " ["+getContext().getID()+"] error updating message:" + documentsToUpdate[i].get("messageid"), addDocException );
							failedToAdd.add( documentsToUpdate[i].get( "messageid" ) );
						}
						catch ( java.lang.IllegalStateException ise )
						{
							//if we fail to add, make a note and don't delete it
							logger.error( " ["+getContext().getID()+"] error updating message:" + documentsToUpdate[i].get("messageid"), ise );
							failedToAdd.add( documentsToUpdate[i].get( "messageid" ) );
						}
					}
					//if we fail to close the writer, we don't want to continue
					closeWriter();
					searcherVersion = -1; //establish that the searcher needs to update
					IndexReader reader = IndexReader.open( indexPath );
					for ( int i = 0; i < documentsToUpdate.length; i++ )
					{
						Document newDoc = documentsToUpdate[i];
						//skip documents that failed to add, so the old copy stays in the index
						if ( failedToAdd.contains( newDoc.get( "messageid" ) ) )
						{
							continue;
						}
						try
						{
							logger.debug( "delete id:" + newDoc.get( "deleteid" ) + " messageid: "
								+ newDoc.get( "messageid" ) );
							reader.delete( Integer.parseInt( newDoc.get( "deleteid" ) ) );
						}
						catch ( NumberFormatException nfe )
						{
							logger.warn( "unable to parse the deleteid:" + newDoc.get( "deleteid" ) );
						}
					}
					reader.close();
				}
				catch ( IOException ioe )
				{
					logger.error( "Unable to update messages  ["+getContext().getID()+"]",  ioe );
				}
				finally
				{
					searcherVersion = -15;
					stateMask -= stateMask & STATE_MODIFYING;
					logUpdate( documentsToUpdate, failedToAdd );
				}				
			}
		}
		//optimizeIndex();
	}
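
The two-phase idea in the method above (add every new version first, remember which adds failed, then delete only the old copies whose replacement made it in) can be tried without an index at all. The following is only a rough sketch under stated assumptions: `AddThenDeleteSketch`, `Update`, and the in-memory map are hypothetical stand-ins invented for illustration, not Lucene classes, and a null body is used to simulate a document that could not be modified.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for an index; none of this is Lucene API.
class AddThenDeleteSketch {

    /** One pending update: the id of the stale copy plus its replacement. */
    static final class Update {
        final int oldId;         // internal id of the document being replaced
        final String messageId;  // application-level key, like "messageid" above
        final String newBody;    // replacement content; null simulates a failed modification

        Update(int oldId, String messageId, String newBody) {
            this.oldId = oldId;
            this.messageId = messageId;
            this.newBody = newBody;
        }
    }

    final Map<Integer, String> docs = new LinkedHashMap<>(); // internal id -> content
    int nextId = 0;

    /** Stores a document and returns its new internal id. */
    int add(String body) {
        if (body == null) {
            throw new IllegalStateException("nothing to add");
        }
        docs.put(nextId, body);
        return nextId++;
    }

    void delete(int id) {
        docs.remove(id);
    }

    /** Runs both phases and returns the message ids whose new version was not added. */
    Set<String> update(List<Update> updates) {
        Set<String> failedToAdd = new HashSet<>();
        // Phase 1: add every new version, noting the ones that fail.
        for (Update u : updates) {
            try {
                add(u.newBody);
            } catch (IllegalStateException e) {
                failedToAdd.add(u.messageId);
            }
        }
        // Phase 2: delete an old copy only if its replacement was added.
        for (Update u : updates) {
            if (!failedToAdd.contains(u.messageId)) {
                delete(u.oldId);
            }
        }
        return failedToAdd;
    }

    public static void main(String[] args) {
        AddThenDeleteSketch index = new AddThenDeleteSketch();
        int a = index.add("first draft");
        int b = index.add("second draft");

        List<Update> updates = new ArrayList<>();
        updates.add(new Update(a, "msg-a", "first draft, revised"));
        updates.add(new Update(b, "msg-b", null)); // this one fails to add

        System.out.println("failed: " + index.update(updates));
        System.out.println("docs:   " + index.docs);
    }
}
```

Doing the adds before the deletes means a failure part-way through leaves a duplicate at worst, never a missing document.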

-----Original Message-----
From: Santosh [mailto:santosh.s@softprosys.com]
Sent: Tuesday, November 23, 2004 2:59 PM
To: Lucene Users List
Subject: modifying existing index


I am using Lucene for indexing. When I create the index, the documents are added. But when
I modify a single existing document and reindex it, it is treated as a new document
and added one more time, so I get the same document twice in the results.
To overcome this I am deleting the existing index and recreating the whole index. Is it
possible to index the modified document again and overwrite the existing document without deleting
and recreating the index? If so, how?

And one more question:
can Lucene do stemming?
If I search for "roam", I know that a fuzzy query can return results for "foam".
But my requirement is: if I search for "roam", can I get the list of similar words as output, so that
I can show the end user, in a column, a suggestion such as: do you mean "foam"?
How can I get a list of similar words in the given content?




