lucene-solr-user mailing list archives

From Nicolas Bouillon <nico2000...@yahoo.com.INVALID>
Subject SOLR Atomic update of custom stored metadata clears full-text index! How to add metadata without losing full-text search
Date Wed, 08 Mar 2017 15:46:49 GMT
Dear SOLR friends,

I have developed a small ERP that produces PDF documents linked to its objects: invoices, timesheets,
contracts, etc.
I can also attach documents to a particular object; when I view an invoice, for instance,
I can see its attached documents.

Until now, I stored references to these documents in my DB and kept the files themselves on the server.

However, I found this cumbersome and not flexible enough, so I removed the documents table from
my DB and decided to use SOLR to attach metadata to the documents in the index.

Currently, I have the following custom fields:
- ktype (string): invoice, contract, etc.
- kattachment (int): 0 or 1
- kref (int): DB reference of the linked object, e.g. 10 (for contract 10 in the DB)
- ktags (strings, multivalued): free tags, e.g. customerX, consulting, development
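For illustration, these four fields could be declared through Solr's Schema API, assuming the core uses a managed schema. This is a minimal sketch: the helper name is hypothetical, and the integer type name (`pint` here) is an assumption, since it depends on the Solr version and configset (older configsets use `int`). Every field is marked stored, because atomic updates rebuild a document from its stored fields.

```python
import json

# The custom metadata fields listed above. The "pint" type name is an
# assumption: it depends on the Solr version/configset in use.
CUSTOM_FIELDS = [
    {"name": "ktype", "type": "string", "multiValued": False},
    {"name": "kattachment", "type": "pint", "multiValued": False},
    {"name": "kref", "type": "pint", "multiValued": False},
    {"name": "ktags", "type": "string", "multiValued": True},
]


def build_add_field_command(fields):
    """Return a JSON-serialisable body for POST /solr/<core>/schema (hypothetical helper)."""
    # stored=True on every field so an atomic update can rebuild the document.
    return {
        "add-field": [
            {**field, "indexed": True, "stored": True} for field in fields
        ]
    }


if __name__ == "__main__":
    body = build_add_field_command(CUSTOM_FIELDS)
    print(json.dumps(body, indent=2))
    # Not executed here: POST this body with Content-Type: application/json
    # to an assumed URL such as http://localhost:8983/solr/katalyst/schema
```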

Each time I upload a document, I store it on the server and then add it to SOLR using "extract",
passing the metadata at the same time. This works fine.
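For reference, an extract call that carries metadata can be sketched as below. The ExtractingRequestHandler accepts `literal.<field>` parameters, and a multivalued field such as ktags gets one `literal.ktags` parameter per value. The base URL, the document id and the helper name are assumptions; the file itself would be sent as the request body (for example as a multipart upload), which this sketch leaves out.

```python
from urllib.parse import urlencode

# Assumed base URL; "katalyst" is the core name given in this message.
SOLR_CORE_URL = "http://localhost:8983/solr/katalyst"


def build_extract_url(doc_id, ktype, kattachment, kref, ktags):
    """Build an /update/extract URL that indexes a file together with its metadata."""
    params = [
        ("literal.id", doc_id),
        ("literal.ktype", ktype),
        ("literal.kattachment", str(kattachment)),
        ("literal.kref", str(kref)),
    ]
    # A multivalued field takes one literal.<field> parameter per value.
    params += [("literal.ktags", tag) for tag in ktags]
    params.append(("commit", "true"))
    return f"{SOLR_CORE_URL}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical document id for invoice-related upload linked to object 10.
    print(build_extract_url("invoice-42.pdf", "invoice", 0, 10, ["customerX", "consulting"]))
```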

Now I would like to achieve 3 things:

- For existing documents that were not extracted together with their metadata at upload time (documents
uploaded before I developed this functionality), I'd like to add the proper metadata
without losing full-text search.
- I want to be able to add tags to the ktags field at any time after upload while keeping full-text search.
- If I ever have to re-index, I want to be sure I don't have to start everything from scratch.
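Solr's atomic update syntax looks like a fit for the first two points: a "set" modifier replaces a field's value, and an "add" modifier appends to a multivalued field such as ktags. One caveat may explain the symptom in the subject line: Solr applies an atomic update by rebuilding the whole document from its stored fields, so a field that is indexed but not stored loses its value (for example `_text_`, which is not stored in Solr's default data-driven schema, if the extracted text lands there). The sketch below only builds the JSON body for an assumed `POST /solr/katalyst/update`; the helper name and sample id are hypothetical, and it assumes the extracted text already lives in a stored field.

```python
import json


def build_atomic_update(doc_id, set_fields=None, add_tags=None):
    """Return a JSON body for an atomic update (hypothetical helper)."""
    doc = {"id": doc_id}
    # "set" replaces the current value of a single-valued field.
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}
    # "add" appends values to the multivalued ktags field.
    if add_tags:
        doc["ktags"] = {"add": list(add_tags)}
    # Solr's JSON update format accepts a list of documents.
    return json.dumps([doc])


if __name__ == "__main__":
    print(build_atomic_update("invoice-42.pdf", {"ktype": "invoice", "kref": 10}, ["development"]))
```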

In a few months, I expect to have thousands of documents in my system... and then I'll add emails.

I have very little experience with SOLR. I know I could re-run an extract instead of an update
whenever I modify a field, but I'm pretty sure that's not the right approach, and it could cause
performance problems.

What would you suggest?

I have thought about storing each document's metadata separately (in the DB, in one XML file per
document, or in a single XML file for all of them), but I'm pretty sure that would become very slow
after a while.
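If the metadata does live in the DB as well, it can serve as the source of truth for a full re-index: each document's metadata is a keyed lookup, which should stay cheap at a scale of thousands of rows. The sketch below uses an in-memory SQLite database as a stand-in for MySQL; the `doc_meta` and `doc_tags` tables and both helper names are hypothetical. It turns one document's stored metadata back into the `literal.*` parameters of an extract call.

```python
import sqlite3


def create_demo_db():
    """Build an in-memory stand-in for the ERP's metadata tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE doc_meta (
            doc_id TEXT PRIMARY KEY,
            ktype TEXT, kattachment INTEGER, kref INTEGER
        );
        CREATE TABLE doc_tags (doc_id TEXT, tag TEXT, PRIMARY KEY (doc_id, tag));
        INSERT INTO doc_meta VALUES ('invoice-42.pdf', 'invoice', 0, 10);
        INSERT INTO doc_tags VALUES ('invoice-42.pdf', 'customerX'),
                                    ('invoice-42.pdf', 'consulting');
        """
    )
    return conn


def load_literals(conn, doc_id):
    """Return (param, value) pairs for /update/extract, or [] if doc_id is unknown."""
    row = conn.execute(
        "SELECT ktype, kattachment, kref FROM doc_meta WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return []
    ktype, kattachment, kref = row
    params = [
        ("literal.id", doc_id),
        ("literal.ktype", ktype),
        ("literal.kattachment", str(kattachment)),
        ("literal.kref", str(kref)),
    ]
    # One literal.ktags pair per tag, in a stable order.
    tags = conn.execute(
        "SELECT tag FROM doc_tags WHERE doc_id = ? ORDER BY tag", (doc_id,)
    )
    params += [("literal.ktags", tag) for (tag,) in tags]
    return params


if __name__ == "__main__":
    print(load_literals(create_demo_db(), "invoice-42.pdf"))
```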

Thanks a lot in advance for your valuable help.
This is my first message to the user list, so please excuse anything I may have done wrong... I
learn fast, don't worry.

Regards

Nico

My configuration:

Synology 1511 running DSM 6.1
Docker container running the latest stable version of SOLR
1 core called “katalyst” containing the index of all documents

The ERP is written in PHP/MySQL for the back end and jQuery/Bootstrap for the front end

I have a test environment on OSX Sierra running Docker, and a production environment on the Synology


