Describe the bug
(drafted by claude, reviewed by me -gpshead)
Four search include templates render the indexed description with |safe:
templates/search/includes/jobs.job.html:6
templates/search/includes/events.event.html:28
templates/search/includes/events.calendar.html:2
templates/search/includes/downloads.release.html:4
<p>{{ result.description|safe }}</p>
The indexed value is produced by striptags(truncatewords_html(..., 50)) in the corresponding prepare_description methods:
apps/jobs/search_indexes.py:104 (obj.description.rendered — user-submitted job markup)
apps/events/search_indexes.py:68 (obj.description.rendered — event descriptions, which can originate from imported external calendar feeds)
apps/events/search_indexes.py:31 (obj.description, calendar model)
apps/pages/search_indexes.py:30 and apps/downloads/search_indexes.py:46 (staff-authored content)
Django's docs warn explicitly: "striptags doesn't provide any guarantee about its output being HTML safe … you should NEVER apply the safe filter to striptags output."
To be fair about actual risk (and hence why this is a public issue rather than a GHSA), I tested the pinned Django (5.2.11) rather than just citing the docs: the classic nested-tag bypass (<sc<script>ript>) is neutralized by strip_tags' strip-until-stable loop, HTML entities stay encoded, and truncatewords_html runs before striptags (the safe order). No working bypass today. However, literal < and > characters that don't parse as tags do pass through (strip_tags("a < b and c > d") returns it unchanged), so the pattern's safety ultimately rests on Python's HTMLParser and every browser tokenizing identically, a parser-differential class of bug with historical precedent. Since the indexed value is fed by user-submitted and externally imported content, that's a fragile invariant to bet on.
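The pass-through behavior can be illustrated without Django. The following is a minimal stdlib-only sketch, not Django's actual implementation: TagStripper, strip_once, and strip_tags_sketch are hypothetical names, and the sketch only assumes that a tag stripper built on Python's HTMLParser drops whatever the parser recognizes as a tag and keeps everything else.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text and entities, drops parsed tags."""

    def __init__(self):
        # convert_charrefs=False leaves entities such as &amp; encoded.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_once(value):
    parser = TagStripper()
    parser.feed(value)
    parser.close()
    return "".join(parser.parts)


def strip_tags_sketch(value):
    # Re-strip until the number of "<" characters stops shrinking.
    while "<" in value and ">" in value:
        stripped = strip_once(value)
        if stripped.count("<") == value.count("<"):
            break
        value = stripped
    return value


tags_removed = strip_tags_sketch("<b>bold</b> text")
literal_brackets = strip_tags_sketch("a < b and c > d")
entities_kept = strip_tags_sketch("<i>Tom &amp; Jerry</i>")
print(tags_removed, literal_brackets, entities_kept, sep="\n")
```

Anything the parser does not classify as a tag comes out untouched, so the output is only as safe as the agreement between this parser and the browser's tokenizer.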
One wrinkle: |safe is currently load-bearing for display. strip_tags leaves entities encoded (&amp; stays &amp;), so simply dropping |safe would double-escape them and render a visible &amp; in search results. One possible fix is to make the indexed value genuinely plain text at index time and let autoescaping do its job:
import html
return html.unescape(strip_tags(truncatewords_html(obj.description.rendered, 50)))
in the five prepare_description methods, then drop |safe from the four templates.
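To make the double-escaping concern concrete, here is a small stdlib-only sketch. It assumes html.escape is a reasonable stand-in for template autoescaping; autoescape is a hypothetical helper and the sample strings are made up for illustration.

```python
import html


def autoescape(value):
    # Stand-in (assumption) for what template autoescaping does to a plain string.
    return html.escape(value)


# Value as indexed today: tags stripped, entities still encoded.
indexed_today = "Cats &amp; dogs"

# Dropping |safe alone escapes the already-encoded entity a second time.
double_escaped = autoescape(indexed_today)

# Decoding at index time yields plain text, which autoescaping encodes exactly once.
indexed_plain = html.unescape(indexed_today)
rendered = autoescape(indexed_plain)

# Entity-encoded markup turns back into literal markup after unescape,
# so the decoded value must go through autoescaping, never |safe.
decoded_markup = html.unescape("&lt;script&gt;alert(1)&lt;/script&gt;")
rendered_markup = autoescape(decoded_markup)

print(double_escaped, rendered, rendered_markup, sep="\n")
```

The last case is why the two halves of the fix belong in the same change: once the index stores decoded text, leaving |safe in any of the templates would render that text as markup.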