lucene-solr-user mailing list archives

From Zaccheo Bagnati <zacch...@gmail.com>
Subject Re: Indexing books, chapters and pages
Date Wed, 02 Mar 2016 08:20:57 GMT
Thanks Emir,
a similar solution had already come to my mind too: search on chapters,
highlight the results, and retrieve the matching pages by parsing the
highlighted output. It is surely not a very efficient approach, but it could
work. However, I think I'll try other approaches before this one.
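
A minimal sketch of the "parse the highlighted result" idea, under several assumptions that are not in the thread: the page break token is written as `[[PAGE:n]]` (a made-up format), the chapter text lives in a hypothetical field called `text`, the highlighter wraps hits in `<em>...</em>`, and `hl.fragsize=0` is used so that the whole field comes back as a single fragment and no page break token is cut off. Slop is added to the phrase query because a page break token may sit between the words of a phrase.

```python
import re

# Hypothetical page-break token (assumption: indexed inline in the chapter text)
# and the <em> tags assumed to wrap each highlighted hit.
TOKEN_OR_HIT = re.compile(r"\[\[PAGE:(\d+)\]\]|<em>(.*?)</em>", re.DOTALL)


def build_params(phrase, slop):
    """Build query parameters for a sloppy phrase search with highlighting.

    The field name `text` is hypothetical; adjust it to the real schema.
    """
    return {
        "q": f'text:"{phrase}"~{slop}',
        "hl": "true",
        "hl.fl": "text",
        "hl.fragsize": "0",  # return the whole field as one fragment
    }


def pages_for_hits(highlighted, first_page):
    """Return (page, matched_text) pairs for every highlighted hit.

    `highlighted` is the highlighted chapter text; `first_page` is the page
    on which the chapter starts, used until the first page-break token.
    """
    results = []
    page = first_page
    # Walk page-break tokens and hits in document order, tracking the page.
    for match in TOKEN_OR_HIT.finditer(highlighted):
        if match.group(1) is not None:
            page = int(match.group(1))
        else:
            results.append((page, match.group(2)))
    return results
```

A phrase that straddles a page break would come back as hits on two consecutive pages, which the client could then display together.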

On Tue, 1 Mar 2016 at 17:30, Emir Arnautovic <
emir.arnautovic@sematext.com> wrote:

> Hi,
> Off the top of my head (this probably does not solve the problem completely,
> but it may trigger some brainstorming): index chapters and include page break
> tokens. Use highlighting to return the matches, and make sure the fragment size
> is large enough to include a page break token. In such a scenario you should use
> slop for phrase searches...
>
> The more I write about it, the less I like it, but I will not delete it...
>
> Regards,
> Emir
>
> On 01.03.2016 12:56, Zaccheo Bagnati wrote:
> > Hi all,
> > I'm looking for ideas on how to define the schema and how to perform queries
> > for this use case: we have to index books; each book is split into chapters,
> > and chapters are split into pages (pages reflect the original page breaks of
> > the printed version). We should show the results grouped by book, then by
> > chapter (within the same book), then by page (within the same chapter). As far
> > as I know, we have 2 options:
> >
> > 1. Index pages as Solr documents. This way we could theoretically
> > retrieve chapters (and books?) using grouping, but
> >      a. we will miss matches that span two contiguous pages (page breaks are
> > only due to typographical needs, so concepts can be split across pages, as in
> > printed books)
> >      b. I don't know whether it is possible in Solr to group results on two
> > different levels (books and chapters)
> >
> > 2. Index chapters as Solr documents. In this case we will get the right
> > matches, but how do we obtain the matching pages? (We need pages because the
> > client can only display pages.)
> >
> > We have been struggling with this problem for a long time and have not been
> > able to find a suitable solution, so I'm asking whether someone has ideas or
> > has already solved a similar issue.
> > Thanks
> >
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
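
Regarding point 1b of the original question (grouping on two levels), one way to sidestep the issue is to nest the hits on the client side after a flat query over page documents. The sketch below assumes each page document carries hypothetical fields `book_id`, `chapter_id` and `page_no`; none of these names come from the thread, and this does not address the cross-page matches lost in point 1a.

```python
from collections import defaultdict


def nest_hits(docs):
    """Nest flat page-level hits as book -> chapter -> sorted page numbers.

    `docs` is a list of dicts with the hypothetical keys `book_id`,
    `chapter_id` and `page_no` (assumed field names, not from the thread).
    """
    nested = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        nested[doc["book_id"]][doc["chapter_id"]].append(doc["page_no"])
    # Convert to plain dicts and sort the pages within each chapter.
    return {
        book: {chapter: sorted(pages) for chapter, pages in chapters.items()}
        for book, chapters in nested.items()
    }


if __name__ == "__main__":
    hits = [
        {"book_id": "b1", "chapter_id": "c2", "page_no": 41},
        {"book_id": "b1", "chapter_id": "c1", "page_no": 7},
        {"book_id": "b1", "chapter_id": "c2", "page_no": 40},
        {"book_id": "b2", "chapter_id": "c1", "page_no": 3},
    ]
    print(nest_hits(hits))
```

The trade-off is that the client must fetch enough page hits to fill every group it wants to show, since the nesting happens after retrieval rather than inside the search engine.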
