Today I made rectangles; tomorrow I turn those rectangles into magical digital clipped newsprint.
Imagine scissors that can only cut perfect rectangles: you click and drag to define the bounds of an article, and that region is then clipped at render time.
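Here is a rough sketch of how the click-and-drag step might turn into a rectangle. The `Rect` type and `rect_from_drag` function are names I am making up for illustration, and I am assuming coordinates are in page pixels with the origin at the top left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """An axis-aligned rectangle in page coordinates (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def rect_from_drag(start, end):
    """Build a rectangle from the drag's start and end points.

    The drag may go in any direction, so the top-left corner is the
    minimum of each coordinate and the size is the absolute difference.
    """
    (x0, y0), (x1, y1) = start, end
    return Rect(min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))
```

Normalizing this way means a drag from bottom right to top left yields the same rectangle as the reverse drag.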
Each rectangle is associated with a particular page of the newsprint as a whole, so one can walk up the chain from rectangle to page to paper. Text extraction also becomes simpler for machines, since we draw their attention only to the parts that matter in context.
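One way the chain could be modeled, as a sketch only: the `Issue`, `Page`, and `Clipping` classes below are hypothetical, and I am assuming the extracted text arrives as words tagged with the page coordinates of their centers.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """The newsprint as a whole (hypothetical)."""
    title: str


@dataclass
class Page:
    """One page, pointing up to its issue (hypothetical)."""
    number: int
    issue: Issue


@dataclass
class Clipping:
    """A rectangle on a page, pointing up to that page (hypothetical)."""
    x: float
    y: float
    width: float
    height: float
    page: Page


def words_in_clipping(words, clip):
    """Keep only the words whose center falls inside the clipping.

    `words` is assumed to be a list of (text, x, y) tuples in the same
    page coordinates as the clipping.
    """
    return [
        text
        for text, x, y in words
        if clip.x <= x <= clip.x + clip.width
        and clip.y <= y <= clip.y + clip.height
    ]
```

Walking up the chain is then just `clip.page.issue`, and the extractor only ever sees the words inside the rectangle.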
When viewing search results, we could reconstruct the rectangles in miniature, to scale and in place, and highlight the matching one in a bright color to draw the eye to the area of the page where one might find what they are looking for. The other, irrelevant rectangles could be shown as a dim lattice behind the focused rectangle, indicating the overall page structure as well.
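The miniature could be little more than scaling every rectangle by the ratio of thumbnail size to page size. A minimal sketch, assuming rectangles are plain `(x, y, width, height)` tuples; the function names and the `"highlight"` / `"dim"` style labels are invented for illustration.

```python
def scale_rect(rect, page_size, thumb_size):
    """Scale an (x, y, width, height) rectangle from page to thumbnail space."""
    x, y, w, h = rect
    sx = thumb_size[0] / page_size[0]
    sy = thumb_size[1] / page_size[1]
    return (x * sx, y * sy, w * sx, h * sy)


def thumbnail_overlay(rects, focused_index, page_size, thumb_size):
    """Return the rectangles to draw on the thumbnail, back to front.

    Dim rectangles come first so the highlighted one is drawn on top of
    the lattice.
    """
    scaled = [
        {
            "rect": scale_rect(r, page_size, thumb_size),
            "style": "highlight" if i == focused_index else "dim",
        }
        for i, r in enumerate(rects)
    ]
    # Stable sort: dim entries (False) before the highlighted one (True).
    return sorted(scaled, key=lambda item: item["style"] == "highlight")
```

Because the same scale is applied to every rectangle, the lattice keeps the page's proportions and the focused rectangle stays in place.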
Such are my thoughts, a little after midnight on July 17th.