lucene-solr-user mailing list archives

From Fatima Issawi <issa...@qu.edu.qa>
Subject RE: How to use Solr in my project
Date Mon, 30 Dec 2013 08:58:40 GMT
I think we may have up to 100,000 books, but I don't think the site will have a lot of traffic.

Thank you for your help. I think it is a little clearer now, and I will try to implement it.

> -----Original Message-----
> From: Gora Mohanty [mailto:gora@mimirtech.com]
> Sent: Monday, December 30, 2013 11:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to use Solr in my project
> 
> On 30 December 2013 11:27, Fatima Issawi <issawif@qu.edu.qa> wrote:
> > Hi again,
> >
> > We have another program that will be extracting the text, and it will be
> > extracting the top right and bottom left corners of the words. You are right, I
> > do expect to have a lot of data.
> >
> > When would solr start experiencing issues in performance? Is it better to:
> >
> > INDEX:
> > - document metadata
> > - words
> >
> > STORE:
> > - document metadata
> > - words
> > - coordinates
> >
> > in Solr rather than in the database? How would I set up the schema in order
> > to store the coordinates?
> 
> You do not mention the number of documents, but for a few tens of
> thousands of documents, your problem should be tractable in Solr. Not sure
> what document metadata you have, and if you need to search through it, but
> what I would do is index the words, and store the coordinates in Solr, the
> assumption being that words are searched but not retrieved from Solr, while
> coordinates are retrieved but never searched.
> 
> Off the top of my head, each record can be:
> <doc1> <pg1> <word1> <coord_x1> <coord_y1> <coord_x2> <coord_y2>
> <doc1> <pg1> <word2> ....
> ...
> <doc1> <pg2> ...
> ...
> <doc2> ...
> 
> * <doc_id> and <pg_id> from Solr search results let you retrieve the image
>   from the filesystem
> * The coordinates allow post-processing to highlight the word in the image
> 
> As always, set up a prototype system with a subset of the records in order to
> measure performance.
> 
> > If storing the coordinates in solr is not recommended, what would be the
> > best process to get the coordinates after indexing the words and metadata?
> > Do I search in solr and then use the documentID to then search the database
> > for the words and coordinates?
> 
> You could do that, but Solr by itself should be fine.
> 
> Regards,
> Gora
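
The record layout suggested above (one record per word, with its page and bounding-box corners) could be sketched roughly as follows. This is only an illustration, not a tested Solr setup: the field names (doc_id, pg_id, word, coord_x1 and so on), the composite "id" key, and both helper functions are assumptions made for the example. In the schema, the word field would be indexed and the coordinate fields stored but not indexed, as described in the reply. The only Solr specifics relied on are the standard q, fl and wt request parameters.

```python
from urllib.parse import urlencode


def word_records_to_solr_docs(words):
    """Turn (doc_id, page, word, x1, y1, x2, y2) tuples into one dict per word.

    The field names are hypothetical; they mirror the record layout in the
    thread. The "id" value is an assumed unique key built from document,
    page and position in the input list.
    """
    docs = []
    for i, (doc_id, page, word, x1, y1, x2, y2) in enumerate(words):
        docs.append({
            "id": f"{doc_id}_{page}_{i}",
            "doc_id": doc_id,
            "pg_id": page,
            "word": word,      # searched, so it would be indexed
            "coord_x1": x1,    # retrieved only, so stored but not indexed
            "coord_y1": y1,
            "coord_x2": x2,
            "coord_y2": y2,
        })
    return docs


def build_query(word):
    """Build a query string that searches on the word and returns only the
    identifiers and coordinates needed to highlight it in the page image.

    Note: the word is not escaped for Solr query syntax here; a real
    application would need to handle special characters.
    """
    params = {
        "q": f"word:{word}",
        "fl": "doc_id,pg_id,coord_x1,coord_y1,coord_x2,coord_y2",
        "wt": "json",
    }
    return urlencode(params)


if __name__ == "__main__":
    sample = [("doc1", 1, "solr", 10, 20, 50, 35)]
    print(word_records_to_solr_docs(sample))
    print(build_query("solr"))
```

The dicts would be sent to the Solr update handler with whatever client the project uses, and the doc_id and pg_id values from each hit identify the page image on the filesystem to which the coordinates apply.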
