Mirror of https://github.com/RSS-Bridge/rss-bridge.git, synced 2025-08-22 10:18:38 +02:00
[GolemBridge] Add multi-page headings
Some checks failed
Tests / phpunit8 (8.1) (push) Has been cancelled
Tests / phpunit8 (8.2) (push) Has been cancelled
Tests / phpunit8 (8.3) (push) Has been cancelled
Tests / phpunit8 (8.4) (push) Has been cancelled
Build Image on Commit and Release / bake (push) Has been cancelled
Lint / phpcs (7.4) (push) Has been cancelled
Lint / phpcompatibility (7.4) (push) Has been cancelled
Lint / executable_php_files_check (push) Has been cancelled
Tests / phpunit8 (7.4) (push) Has been cancelled
Tests / phpunit8 (8.0) (push) Has been cancelled
On multi-page articles like [1], some paragraph headers were missing
because they are the headers of the individual article pages.
These headers were previously removed in c5f586497f for being redundant
with the original header. The article at [1] proves us wrong, so I
restored them and added logic to ignore truly duplicate headers.
[1] https://www.golem.de/news/es-muss-nicht-immer-apple-sein-fuenf-ueberzeugende-airpods-pro-alternativen-im-test-2508-195000.html
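
The duplicate check described above can be sketched in isolation. This is only an illustration under stated assumptions: the helper name shouldAddPageHeader() and its parameters are hypothetical and not part of RSS-Bridge, and plain strings stand in for the simple_html_dom nodes the bridge actually works with. It assumes the first table-of-contents entry carries the article title in HTML-entity-encoded form, as the html_entity_decode() call in the diff suggests.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of RSS-Bridge): decide whether the header
 * of an article page should be added to the feed item content.
 *
 * $pageHeaderText  plain text of the page's own header, or null if absent
 * $firstTocTitle   entity-encoded title of the first table-of-contents
 *                  entry, or null if the page has no table of contents
 */
function shouldAddPageHeader(?string $pageHeaderText, ?string $firstTocTitle): bool
{
    // Nothing to add when the page has no header of its own.
    if ($pageHeaderText === null) {
        return false;
    }

    // Decode entities so that "&amp;" in the title compares equal to "&".
    $articleTitle = $firstTocTitle === null
        ? null
        : html_entity_decode($firstTocTitle);

    // Add the header only if it differs from the article header.
    return $pageHeaderText !== $articleTitle;
}
```

A header that merely repeats the article title is skipped, while the header of a later page is kept. A page without a table of contents keeps its header, because there is nothing to compare it against.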
parent 876d3c8ae7
commit e30698f12f
1 changed file with 10 additions and 1 deletion
@@ -132,13 +132,22 @@ class GolemBridge extends FeedExpander
         // delete known bad elements
         foreach (
             $article->find('div[id*="adtile"], #job-market, #seminars, iframe,
-            .gbox_affiliate, div.toc') as $bad
+            .gbox_affiliate, div.toc') as $bad
         ) {
             $bad->remove();
         }
         // reload html, as remove() is buggy
         $article = str_get_html($article->outertext);
 
+        // Add multipage headers, but only if they are different to the article header
+        $firstHeader = $page->find('.table-jtoc td', 0);
+        if (isset($firstHeader)) {
+            $firstHeader = html_entity_decode($firstHeader->title);
+        }
+        $multipageHeader = $article->find('header.paged-cluster-header h1', 0);
+        if (isset($multipageHeader) && $multipageHeader->plaintext !== $firstHeader) {
+            $item .= $multipageHeader;
+        }
 
         $header = $article->find('header', 0);
         foreach ($header->find('p, figure') as $element) {