annotator-dev mailing list archives

From Randall Leeds <rand...@apache.org>
Subject DOM Iteration (was Re: Just a simple example?)
Date Thu, 11 May 2017 19:34:24 GMT
Great to see you here, Sasha!

On Wed, May 10, 2017 at 5:39 PM Sasha Goodman <email@sashagoodman.com>
wrote:

>
> P.S. This afternoon I streamlined the TextQuoteSelector and
> TextPositionSelector to work (in principle) consistently with Randall
> Leeds's implementation that used NodeIterator and textContent.
>
>
Neat :).

My takeaway from the simple example thread, which many of us were likely
already well aware of, is that there's a desire for a good highlighter
implementation. A way to highlight text is often the first example people
want to see.

While I hope to see experimentation with implementations that try to limit
the impact on the DOM, I think <mark> or <span> wrapping of text nodes is
still the easiest to understand. In this approach, the actual wrapping is
easy. The difficult part is iteration.

Now, some quick background on node iteration.

I chose to use NodeIterator rather than TreeWalker for my dom-seek library
because it meant that the seek function could be stateless, support seeking
forward and backward, and still be able to return the number of characters
consumed by a seek. The desire to know whether to include the current
node's content in the seek count is fulfilled by NodeIterator's
"pointerBeforeReferenceNode". Essentially, a NodeIterator stores a point
before or after a node, rather than simply a current node.
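
To make the before/after point concrete, here is a minimal sketch of
seeking over whole text nodes. It is not the dom-seek algorithm:
seekForward and seekBackward are hypothetical names, and the iterator is
assumed to have been created with NodeFilter.SHOW_TEXT so that every node
it returns has a `data` string.

```javascript
// Hypothetical helpers, not the dom-seek API. Both assume `iter` is a
// NodeIterator created with NodeFilter.SHOW_TEXT, so every node it
// returns is a Text node with a `data` string.

// Advance over whole text nodes until at least `count` characters have
// been passed (or the nodes run out). Returns the characters consumed.
function seekForward(iter, count) {
  let consumed = 0;
  while (consumed < count) {
    const node = iter.nextNode();
    if (node === null) break;
    consumed += node.data.length;
  }
  return consumed;
}

// The mirror image: walk backward and return the characters consumed.
function seekBackward(iter, count) {
  let consumed = 0;
  while (consumed < count) {
    const node = iter.previousNode();
    if (node === null) break;
    consumed += node.data.length;
  }
  return consumed;
}
```

Neither function keeps state of its own. After nextNode() returns a node
the pointer sits after it, so an immediate previousNode() returns that
same node and moves the pointer back in front of it. Seeking forward and
then backward by the returned count should land on an equivalent position.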

However, using NodeIterator to traverse a Range is not ideal. Since its
referenceNode is read-only, the best that can be done is to start with
the commonAncestorContainer of the Range and test each node against the
Range. Range has comparePoint, isPointInRange, and intersectsNode for this
(compareNode is non-standard). I have no idea how expensive these are.
Iterating all the nodes under the commonAncestorContainer doesn't feel
great to begin with. TreeWalker might be more appropriate since its
currentNode could be set to startContainer directly. TreeWalker also
appears to have consistent platform support.
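
A rough sketch of the TreeWalker idea, under some assumptions:
textNodesInRange is a hypothetical name, the end of the Range is detected
with the standard Range intersectsNode method (one option among several,
with unmeasured cost), and boundary text nodes are yielded whole even when
the Range covers only part of them.

```javascript
// Hypothetical helper: yield the text nodes that a Range touches, in
// document order, starting the walk at startContainer.
function* textNodesInRange(range) {
  const TEXT_NODE = 3; // same value as Node.TEXT_NODE
  const SHOW_TEXT = 4; // same value as NodeFilter.SHOW_TEXT
  const root = range.commonAncestorContainer;
  const doc = root.ownerDocument || root; // root may be the Document itself
  const walker = doc.createTreeWalker(root, SHOW_TEXT);

  // Unlike NodeIterator, TreeWalker lets us jump straight to the start.
  walker.currentNode = range.startContainer;
  let node = walker.currentNode.nodeType === TEXT_NODE
    ? walker.currentNode
    : walker.nextNode();

  let inside = false;
  while (node !== null) {
    if (range.intersectsNode(node)) {
      inside = true;
      yield node;
    } else if (inside) {
      break; // walked past the end of the Range
    }
    node = walker.nextNode();
  }
}
```

When startContainer is an element, text in children that precede
startOffset is visited but skipped, since it does not intersect the Range.
A text node that the Range merely touches at a boundary, with zero
characters selected, can still be reported by intersectsNode.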

All of this is complicated by the Range being able to point to offsets
within text nodes. For the purposes of highlighting with wrapper elements
it's necessary to split the boundary nodes. I think there are probably a
number of libraries for this, but I propose we write one under our repo.
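
As a starting point for discussion, boundary splitting might look roughly
like the sketch below. splitRangeBoundaries is a hypothetical name; the
only DOM calls it relies on are the standard Text splitText and Range
setStart methods. It does not handle a start offset that sits at the very
end of a text node or an end offset of zero.

```javascript
// Hypothetical helper: split the text nodes at a Range's boundaries so
// that the partially selected text ends up in nodes of its own.
function splitRangeBoundaries(range) {
  const TEXT_NODE = 3; // same value as Node.TEXT_NODE
  // Read the boundaries up front: a live Range adjusts itself as nodes
  // are split.
  const { startContainer, startOffset, endContainer, endOffset } = range;

  // Split the end first so that startOffset is still valid when both
  // boundaries are in the same text node.
  if (endContainer.nodeType === TEXT_NODE &&
      endOffset > 0 && endOffset < endContainer.length) {
    endContainer.splitText(endOffset);
  }

  if (startContainer.nodeType === TEXT_NODE &&
      startOffset > 0 && startOffset < startContainer.length) {
    const head = startContainer.splitText(startOffset);
    // Move the start onto the new node so the unselected left-hand
    // piece no longer touches the Range.
    range.setStart(head, 0);
  }
}
```

After this runs, the selected part of each boundary text node lives in its
own node, so a wrapper element can be placed around whole nodes only.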

We might also find that normalizing the endpoints of a Range in some
fashion is a helpful prerequisite. I found a library that does this, but
its algorithm was terribly confusing to me. I put time into rewriting it
without dependencies. Despite some initial excitement, the author never
fully vetted and accepted my pull request:
https://github.com/webmodules/range-normalize/pull/2

In conclusion, I think there'd be value in bringing some functional
utilities into Apache Annotator for dealing with iteration, range
splitting, and range normalization, with the goal of providing a very
succinct and simple highlighter that looks like this:

```
for (const node of textNodes(range)) {
  const mark = document.createElement('mark');
  node.replaceWith(mark);
  mark.appendChild(node);
}
```

Some care needs to be taken that whatever iteration we use is not
invalidated by the replacement of the text node with its wrapper.
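
One simple way to sidestep invalidation, sketched here under the
assumption that holding all the nodes in memory is acceptable, is to
finish iterating before touching the DOM. The highlight function is a
hypothetical name, and `nodes` can be any iterable of text nodes, such as
the result of the textNodes(range) call in the snippet above.

```javascript
// Hypothetical helper: wrap each text node in its own <mark> element.
function highlight(nodes) {
  // Copy to an array first so that iteration has finished before the
  // DOM is modified.
  const snapshot = Array.from(nodes);
  const marks = [];
  for (const node of snapshot) {
    const mark = node.ownerDocument.createElement('mark');
    node.replaceWith(mark);
    mark.appendChild(node);
    marks.push(mark);
  }
  // Returning the wrappers makes it possible to remove them later.
  return marks;
}
```

The trade-off is one extra pass and an array of references, which seems
acceptable for typical selections but has not been measured.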

The fact that a simple example like this is hard to produce is evidence of
the underlying complexity described in the above paragraphs. When I see
people wanting a simple highlighter what I hear is that they actually need
simple abstractions upon which to build a highlighter. The highlighter
itself should be easy. Often, the highlighters that projects provide are
not shipped standalone or don't do exactly what the author needs (use
spans instead of marks, add a particular class, coalesce overlapping
highlights or not, etc.). There is lots of room to do different things,
but being able to simply get the nodes to be highlighted is the
prerequisite task that contains most of the complexity.

That's all (and probably way too much) for now. Finding all the tools for
all these things is enough of a pain that I think we should have a
comprehensive set of such utilities in Apache Annotator, even if that risks
looking like a bit of NIH syndrome.

Unless anyone objects, I think I'll aim to ship libraries for these:
- Node iteration (https://github.com/tilgovi/dom-node-iterator)
- Tree walking (might not need a library if support is good)
- Range splitting
- Range normalization (see my pull request, referenced above)
- Range iterating
- Text distance (https://github.com/tilgovi/dom-seek)

If anyone wants to start on any of the above, you're welcome to depend on
libraries that are outside Apache Annotator. In the case of libraries that
I've written, there is value to bringing them into Apache Annotator because
they are all written in ES6 but not packaged to be consumed as ES6.
Bringing them inside our repo means better code deduplication by tree
shaking in tools like rollup and webpack. They could be packaged as ES6
where they are, but if I'm going to spend time improving the packaging I
would rather just toss out the packaging and get the benefits of the
monorepo having all that build/test boilerplate done once for all of them.
