lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emir Arnautovic <emir.arnauto...@sematext.com>
Subject Re: Indexing books, chapters and pages
Date Tue, 01 Mar 2016 16:30:47 GMT
Hi,
 From the top of my head - probably does not solve problem completely, 
but may trigger brainstorming: Index chapters and include page break 
tokens. Use highlighting to return matches and make sure fragment size 
is large enough to get page break token. In such scenario you should use 
slop for phrase searches...

More I write it, less I like it, but will not delete...

Regards,
Emir

On 01.03.2016 12:56, Zaccheo Bagnati wrote:
> Hi all,
> I'm searching for ideas on how to define schema and how to perform queries
> in this use case: we have to index books, each book is split into chapters
> and chapters are split into pages (pages represent original page cutting in
> printed version). We should show the result grouped by books and chapters
> (for the same book) and pages (for the same chapter). As far as I know, we
> have 2 options:
>
> 1. index pages as SOLR documents. In this way we could theoretically
> retrieve chapters (and books?)  using grouping but
>      a. we will miss matches across two contiguous pages (page cutting is
> only due to typographical needs so concepts could be split... as in printed
> books)
>      b. I don't know if it is possible in SOLR to group results on two
> different levels (books and chapters)
>
> 2. index chapters as SOLR documents. In this case we will have the right
> matches but how to obtain the matching pages? (we need pages because the
> client can only display pages)
>
> we have been struggling on this problem for a lot of time and we're  not
> able to find a suitable solution so I'm looking if someone has ideas or has
> already solved a similar issue.
> Thanks
>

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Mime
View raw message