feat: add first_name_prefixes for Arabic given-name prefix joining (#150) #186
Merged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The reserve_last guard previously counted suffix tokens as potential last-name candidates, causing "abdul salam jr" to parse as first="abdul salam", last="jr" instead of first="abdul", last="salam", suffix="jr". Fix: count only non-suffix pieces from next_i onward; require ≥2 so the join target and at least one non-suffix last-name piece both exist.
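The guard described in this commit message can be sketched roughly as follows. This is a simplified illustration, not nameparser's actual code: `should_join_prefix`, the `SUFFIXES` set, and the token-list representation are all assumptions made for the example.

```python
# Hypothetical stand-in for the parser's suffix set; the real library's
# suffix handling is more involved.
SUFFIXES = {"jr", "sr", "ii", "iii"}


def should_join_prefix(pieces, next_i, suffixes=SUFFIXES):
    """Decide whether the prefix just before next_i may absorb pieces[next_i].

    Only non-suffix pieces from next_i onward are counted. At least two are
    required: one to be the join target and one left over as the last name.
    """
    non_suffix = [
        p for p in pieces[next_i:]
        if p.lower().strip(".") not in suffixes
    ]
    return len(non_suffix) >= 2


# "abdul salam jr": only "salam" is a non-suffix piece after the prefix,
# so the join is refused and "salam" stays available as the last name.
print(should_join_prefix(["abdul", "salam", "jr"], next_i=1))

# "abdul salam ahmed salem": three non-suffix pieces follow the prefix,
# so "abdul" may join forward to "salam".
print(should_join_prefix(["abdul", "salam", "ahmed", "salem"], next_i=1))
```

Counting suffix tokens, as the earlier guard did, would make the first call return True, which corresponds to the misparse the commit fixes.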
…ons; correct lc() docstrings
- add_with_encoding and clear now only fire _on_change when the set actually changes, consistent with remove()'s existing changed guard
- Correct 'no periods' to 'leading/trailing-periods-stripped' in lc(), is_prefix, is_first_name_prefix, and add_with_encoding docstrings
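A minimal sketch of the change-only notification pattern this commit describes. `NotifyingSet` is a hypothetical class written for illustration and is not the library's SetManager; the `lc` helper here is likewise an assumed approximation of the normalization the corrected docstrings describe (lowercase, with leading and trailing periods stripped).

```python
def lc(value):
    """Lowercase and strip leading/trailing periods; interior periods stay."""
    return value.lower().strip(".") if value else ""


class NotifyingSet:
    """Toy set wrapper that fires a callback only when its contents change."""

    def __init__(self, items=(), on_change=None):
        self._items = {lc(item) for item in items}
        self._on_change = on_change

    def _notify(self):
        if self._on_change is not None:
            self._on_change()

    def add(self, item):
        key = lc(item)
        if key not in self._items:  # skip the callback for a no-op add
            self._items.add(key)
            self._notify()
        return self

    def remove(self, item):
        key = lc(item)
        if key in self._items:  # same "changed" guard as add and clear
            self._items.discard(key)
            self._notify()
        return self

    def clear(self):
        if self._items:  # clearing an empty set is not a change
            self._items.clear()
            self._notify()
        return self

    def __contains__(self, item):
        return lc(item) in self._items

    def __len__(self):
        return len(self._items)


calls = []
prefixes = NotifyingSet(["Abdul."], on_change=lambda: calls.append("changed"))
prefixes.add("abdul")  # already present after normalization: no callback
prefixes.add("abu")    # new entry: one callback
prefixes.clear()       # non-empty: one callback
prefixes.clear()       # already empty: no callback
print(len(calls))
```

Guarding every mutator the same way keeps any cache invalidation tied to `_on_change` from running when nothing changed.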
Summary
- Adds first_name_prefixes set to Constants (default: {'abdul', 'abdel', 'abdal', 'abu', 'abou', 'umm'}) so bound Arabic given-name prefixes join forward to the next word, e.g. "abdul salam ahmed salem" → first="abdul salam", middle="ahmed", last="salem"
- Adds SetManager.clear() method (enables the documented opt-out pattern and cleans up an existing workaround in customize.rst)

Behavior change
Default-on: any name with ≥3 tokens where the first given-name token is in first_name_prefixes will parse differently. Opt out with CONSTANTS.first_name_prefixes.clear().

Test plan
- uv run pytest: 952 passed, 22 xfailed, no regressions
- uv run mypy nameparser/: clean
- uv run ruff check nameparser/ tests/: clean
- tests/test_first_name_prefixes.py covers: basic join, 3-token no-middle, 2-token guard, suffix-guard (regression), title interaction, abu+al dual-prefix interaction, suffix interaction, non-prefix unchanged, single-token unchanged, prefix alone, lastname-comma path, mid-name prefix as last-name prefix, opt-out via Constants(first_name_prefixes=set())
- test_clear_removes_all_entries in tests/test_constants.py covers SetManager.clear()

Closes #150
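To make the default-on behavior and the opt-out concrete, here is a toy model of the join rule. It is a sketch under simplifying assumptions (whitespace tokenization, no titles, suffixes, commas, or last-name prefixes) and does not call nameparser; `split_name` and `DEFAULT_FIRST_NAME_PREFIXES` are hypothetical names introduced for the example.

```python
# Default prefixes as listed in the summary; the constant name is illustrative.
DEFAULT_FIRST_NAME_PREFIXES = frozenset(
    {"abdul", "abdel", "abdal", "abu", "abou", "umm"}
)


def split_name(full_name, first_name_prefixes=DEFAULT_FIRST_NAME_PREFIXES):
    """Split a name into first/middle/last, joining a leading given-name prefix.

    The join only happens when there are at least three tokens, so a
    two-token name such as "abdul salam" still gets a last name.
    """
    tokens = full_name.split()
    if len(tokens) >= 3 and tokens[0].lower() in first_name_prefixes:
        tokens = [f"{tokens[0]} {tokens[1]}"] + tokens[2:]
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])
    return {"first": first, "middle": middle, "last": last}


# Default-on: the prefix joins forward to the next word.
print(split_name("abdul salam ahmed salem"))

# Opt-out analogue: an empty prefix set restores the plain split.
print(split_name("abdul salam ahmed salem", first_name_prefixes=set()))
```

Passing an empty set plays the role that Constants(first_name_prefixes=set()) or CONSTANTS.first_name_prefixes.clear() plays in the PR description.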
🤖 Generated with Claude Code