
feat: add first_name_prefixes for Arabic given-name prefix joining (#150) #186

Merged
derek73 merged 9 commits into master from feature/first-name-prefix-join on Jun 30, 2026

derek73 commented Jun 30, 2026

Summary

  • Add first_name_prefixes set to Constants (default: {'abdul', 'abdel', 'abdal', 'abu', 'abou', 'umm'}) so bound Arabic given-name prefixes join forward to the next word, e.g. "abdul salam ahmed salem" → first="abdul salam", middle="ahmed", last="salem"
  • Add SetManager.clear() method (enables the documented opt-out pattern and cleans up an existing workaround in customize.rst)
  • Handles both the no-comma and lastname-comma parse paths; a guard prevents the join from consuming the last name in 2-token names or when a suffix trails
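A minimal, self-contained sketch of the joining rule and its guard, assuming a plain whitespace tokenizer and no titles, conjunctions, or comma handling. `join_first_name_prefix`, `parse`, and the tiny `SUFFIXES` set are hypothetical illustrations, not nameparser's API; only the default prefix set and the `next_i` name come from this PR.

```python
from typing import Iterable

# Default prefix set as listed in this PR.
FIRST_NAME_PREFIXES = {"abdul", "abdel", "abdal", "abu", "abou", "umm"}
# Hypothetical, deliberately tiny suffix set; nameparser's real list is far larger.
SUFFIXES = {"jr", "sr", "ii", "iii"}


def lc(piece: str) -> str:
    """Lowercase and strip leading/trailing periods before set lookups."""
    return piece.lower().strip(".")


def join_first_name_prefix(
    pieces: list[str],
    prefixes: Iterable[str] = FIRST_NAME_PREFIXES,
    suffixes: Iterable[str] = SUFFIXES,
) -> list[str]:
    """Join a leading given-name prefix to the piece that follows it."""
    prefix_set, suffix_set = set(prefixes), set(suffixes)
    if not pieces or lc(pieces[0]) not in prefix_set:
        return list(pieces)
    next_i = 1
    # Guard: from next_i onward we need one non-suffix piece to absorb into
    # the first name plus at least one more to remain as the last name.
    non_suffix = sum(1 for p in pieces[next_i:] if lc(p) not in suffix_set)
    if non_suffix < 2:
        return list(pieces)
    return [f"{pieces[0]} {pieces[next_i]}", *pieces[next_i + 1:]]


def parse(
    full_name: str,
    prefixes: Iterable[str] = FIRST_NAME_PREFIXES,
    suffixes: Iterable[str] = SUFFIXES,
) -> dict[str, str]:
    """Toy no-comma parser returning first / middle / last / suffix."""
    suffix_set = set(suffixes)
    pieces = join_first_name_prefix(full_name.split(), prefixes, suffix_set)
    trailing: list[str] = []
    # Peel trailing suffixes off the end, always keeping at least one piece.
    while len(pieces) > 1 and lc(pieces[-1]) in suffix_set:
        trailing.insert(0, pieces.pop())
    return {
        "first": pieces[0] if pieces else "",
        "middle": " ".join(pieces[1:-1]),
        "last": pieces[-1] if len(pieces) > 1 else "",
        "suffix": " ".join(trailing),
    }


print(parse("abdul salam ahmed salem"))
# {'first': 'abdul salam', 'middle': 'ahmed', 'last': 'salem', 'suffix': ''}
print(parse("abdul salam jr"))
# {'first': 'abdul', 'middle': '', 'last': 'salam', 'suffix': 'jr'}
```

Counting only non-suffix pieces is what keeps "abdul salam jr" from losing its last name, and passing an empty prefix set reproduces the opt-out.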

Behavior change

Default-on: any name with at least 3 non-suffix tokens whose first given-name token is in first_name_prefixes will parse differently. Opt out globally with CONSTANTS.first_name_prefixes.clear(), or per instance with Constants(first_name_prefixes=set()).
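Because the change is default-on, it may help to scan existing data before upgrading. The helper below is a hypothetical heuristic, not part of nameparser: it approximates the rule stated above for no-comma names only, and the `TITLES` and `SUFFIXES` sets are small stand-ins for the library's much larger lists.

```python
from collections.abc import Collection

# Default prefix set as listed in this PR.
DEFAULT_FIRST_NAME_PREFIXES = frozenset({"abdul", "abdel", "abdal", "abu", "abou", "umm"})
# Hypothetical, tiny stand-ins for nameparser's title/suffix lists.
TITLES = frozenset({"dr", "mr", "mrs", "ms"})
SUFFIXES = frozenset({"jr", "sr", "ii", "iii"})


def _lc(piece: str) -> str:
    return piece.lower().strip(".")


def likely_affected(full_name: str, prefixes: Collection[str] = DEFAULT_FIRST_NAME_PREFIXES) -> bool:
    """Heuristic: would this no-comma name parse differently with the join on?"""
    # Drop suffixes, then leading titles, so we inspect the first given-name token.
    pieces = [p for p in full_name.split() if _lc(p) not in SUFFIXES]
    while pieces and _lc(pieces[0]) in TITLES:
        pieces.pop(0)
    return len(pieces) >= 3 and _lc(pieces[0]) in prefixes


names = ["abdul salam ahmed salem", "abdul salam", "abdul salam jr",
         "Dr. Abu Bakr Hassan", "john smith"]
print([n for n in names if likely_affected(n)])
# ['abdul salam ahmed salem', 'Dr. Abu Bakr Hassan']
```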

Test plan

  • uv run pytest — 952 passed, 22 xfailed, no regressions
  • uv run mypy nameparser/ — clean
  • uv run ruff check nameparser/ tests/ — clean
  • New test file tests/test_first_name_prefixes.py covers: basic join, 3-token no-middle, 2-token guard, suffix-guard (regression), title interaction, abu+al dual-prefix interaction, suffix interaction, non-prefix unchanged, single-token unchanged, prefix alone, lastname-comma path, mid-name prefix as last-name prefix, opt-out via Constants(first_name_prefixes=set())
  • test_clear_removes_all_entries in tests/test_constants.py covers SetManager.clear()

Closes #150

🤖 Generated with Claude Code

derek73 added this to the v1.3.0 milestone on Jun 30, 2026
derek73 self-assigned this on Jun 30, 2026
derek73 added 3 commits June 30, 2026 01:10
The reserve_last guard previously counted suffix tokens as potential last-name
candidates, causing "abdul salam jr" to parse as first="abdul salam",
last="jr" instead of first="abdul", last="salam", suffix="jr".

Fix: count only non-suffix pieces from next_i onward; require ≥2 so
the join target and at least one non-suffix last-name piece both exist.
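The fix above can be compared side by side. `guard_before_fix` is a hypothetical reconstruction inferred from the commit message (every piece from `next_i` onward counted), not the removed code; `guard_after_fix` follows the described fix of counting only non-suffix pieces. The suffix set is a tiny illustrative stand-in.

```python
# Hypothetical, tiny suffix set for illustration only.
SUFFIXES = {"jr", "sr", "ii", "iii"}


def guard_before_fix(pieces: list[str], next_i: int = 1) -> bool:
    """Reconstructed buggy check: every piece from next_i onward counts."""
    return len(pieces) - next_i >= 2


def guard_after_fix(pieces: list[str], next_i: int = 1) -> bool:
    """Fixed check: only non-suffix pieces from next_i onward count."""
    non_suffix = [p for p in pieces[next_i:] if p.lower().strip(".") not in SUFFIXES]
    return len(non_suffix) >= 2


for name in ("abdul salam jr", "abdul salam salem jr", "abdul salam"):
    pieces = name.split()
    print(f"{name!r}: before={guard_before_fix(pieces)} after={guard_after_fix(pieces)}")
# 'abdul salam jr': before=True after=False
# 'abdul salam salem jr': before=True after=True
# 'abdul salam': before=False after=False
```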
…ons; correct lc() docstrings

- add_with_encoding and clear now only fire _on_change when the set
  actually changes, consistent with remove()'s existing changed guard
- Correct 'no periods' to 'leading/trailing-periods-stripped' in lc(),
  is_prefix, is_first_name_prefix, and add_with_encoding docstrings
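A sketch of the "notify only on actual change" pattern and the `lc()` normalization described above. `SetManagerSketch` is a hypothetical stand-in for nameparser's SetManager, not its real code; its `add` plays the role of add_with_encoding with the encoding handling omitted. Its `clear()` also illustrates the opt-out pattern `CONSTANTS.first_name_prefixes.clear()`.

```python
from typing import Callable, Iterable, Optional


def lc(value: str) -> str:
    """Lowercase and strip leading/trailing periods: 'Dr.' -> 'dr', 'Ph.D.' -> 'ph.d'."""
    return value.lower().strip(".")


class SetManagerSketch:
    """Hypothetical stand-in for SetManager with change-guarded callbacks."""

    def __init__(self, elements: Iterable[str] = (),
                 on_change: Optional[Callable[[], None]] = None) -> None:
        self.elements = {lc(e) for e in elements}
        self._on_change = on_change

    def _notify(self) -> None:
        if self._on_change is not None:
            self._on_change()

    def add(self, *values: str) -> "SetManagerSketch":
        before = len(self.elements)
        self.elements.update(lc(v) for v in values)
        if len(self.elements) != before:  # fire only if something new was added
            self._notify()
        return self

    def remove(self, *values: str) -> "SetManagerSketch":
        normalized = {lc(v) for v in values}
        changed = bool(self.elements & normalized)
        self.elements -= normalized
        if changed:
            self._notify()
        return self

    def clear(self) -> "SetManagerSketch":
        changed = bool(self.elements)
        self.elements.clear()
        if changed:  # clearing an already-empty set triggers no callback
            self._notify()
        return self

    def __contains__(self, value: str) -> bool:
        return lc(value) in self.elements


calls: list[str] = []
prefixes = SetManagerSketch(["abdul", "abu"], on_change=lambda: calls.append("changed"))
prefixes.add("Abdul.")  # normalizes to "abdul", already present: no callback
prefixes.add("umm")     # new element: callback fires
prefixes.clear()        # non-empty -> empty: callback fires
prefixes.clear()        # already empty: no callback
print(len(calls))  # 2
```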
derek73 merged commit c84b1cd into master on Jun 30, 2026
8 checks passed
derek73 deleted the feature/first-name-prefix-join branch on Jun 30, 2026 08:47

Development

Successfully merging this pull request may close these issues.

Adding ability to use prefixes with first and middle name

1 participant