Support 'ben' as a prefix particle for Hebrew/Arabic names #183

## Background

`ben` (Hebrew/Arabic "son of") functions as a last-name prefix particle in names like "Ahmad ben Husain", exactly like `van` or `von`. It was removed from `PREFIXES` in v0.2.5 because it conflicts with the common English given or middle name "Ben" (short for Benjamin). For example, "Alex Ben Johnson" would incorrectly treat "Ben" as a prefix and fold it into the last name.

## Proposed approach

A case-sensitive heuristic in `is_prefix()`: treat `ben` as a prefix only when it appears **already lowercase** in an otherwise mixed-case name. In "Ahmad ben Husain" the lowercase `ben` is a strong signal it's a particle; in "Alex Ben Johnson" the capitalized `Ben` signals a given name.

This is consistent with the existing precedent in `is_an_initial()`, which uses original casing to distinguish initials from other tokens.
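A minimal sketch of the proposed rule, assuming the name has already been split into whitespace-separated pieces. `PREFIXES`, `is_mixed_case()` and `is_prefix_with_ben()` are hypothetical stand-ins for illustration, not nameparser's actual internals. "Otherwise mixed-case" is read here as "the other pieces together contain both upper- and lowercase letters":

```python
# Hypothetical sketch of the proposed casing heuristic. PREFIXES and both
# functions are illustrative stand-ins, not nameparser's real internals.
PREFIXES = {"van", "von", "de", "la"}


def is_mixed_case(pieces: list[str]) -> bool:
    """True if the pieces together contain both upper- and lowercase letters."""
    text = "".join(pieces)
    return any(c.isupper() for c in text) and any(c.islower() for c in text)


def is_prefix_with_ben(i: int, pieces: list[str]) -> bool:
    """Prefix check for pieces[i], with the casing rule applied only to 'ben'."""
    piece = pieces[i]
    if piece.lower() in PREFIXES:
        # Ordinary prefixes stay case-insensitive, as today.
        return True
    if piece == "ben":
        # A literal lowercase 'ben' counts only if the rest of the name is mixed-case.
        rest = pieces[:i] + pieces[i + 1:]
        return is_mixed_case(rest)
    # 'Ben', 'BEN', and every other piece: not a prefix.
    return False


for name in ["Ahmad ben Husain", "Alex Ben Johnson"]:
    pieces = name.split()
    flagged = [p for i, p in enumerate(pieces) if is_prefix_with_ben(i, pieces)]
    print(name, "->", flagged)
```

Run as written, "Ahmad ben Husain" flags `ben` and "Alex Ben Johnson" flags nothing.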

## Risks

The case-sensitive heuristic is a **weak, easily-destroyed signal** and can fail in *both* directions:

- **False positives on lowercased input.** Datasets that arrive all-lowercase (e.g. "alex ben johnson") would have `ben` treated as a particle, eating the middle name. All-lowercase and all-uppercase input are common in real data.
- **False negatives on capitalized particles — including the motivating names.** Title-cased data breaks it: "David Ben Gurion" has a capitalized `Ben` that genuinely *is* the particle, so the heuristic would miss it. Any title-cased or ALL-CAPS dataset destroys the casing signal.
- **Contradicts the library's own stance on casing.** The parser lowercases for matching precisely because input casing is unreliable, and the whole `capitalize()` feature exists to *repair* bad casing. Basing a parse decision on casing reintroduces the assumption the rest of the library rejects.
- **Doesn't resolve the ambiguity, only relocates it.** `ben` (son-of) vs. "Ben" (Benjamin) is genuinely ambiguous; casing is a proxy that works for clean mixed-case input and silently fails otherwise.
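To make these failure modes concrete, the following sketch restates the hypothetical rule and runs each name through lowercase, uppercase and title-case variants. It also includes `naive_ben_is_prefix()`, an assumed variant that drops the "otherwise mixed-case" guard and treats any literal lowercase `ben` as a particle. Both functions are illustrative only and are not part of nameparser:

```python
# Illustrative restatement of the hypothetical heuristic, used only to show
# that its verdict depends on casing alone. Not nameparser code.


def is_mixed_case(pieces: list[str]) -> bool:
    text = "".join(pieces)
    return any(c.isupper() for c in text) and any(c.islower() for c in text)


def ben_is_prefix(name: str) -> bool:
    """Guarded rule: lowercase 'ben' in an otherwise mixed-case name."""
    pieces = name.split()
    for i, piece in enumerate(pieces):
        if piece.lower() == "ben":
            rest = pieces[:i] + pieces[i + 1:]
            return piece == "ben" and is_mixed_case(rest)
    return False


def naive_ben_is_prefix(name: str) -> bool:
    """Unguarded variant: any literal lowercase 'ben' piece counts."""
    return "ben" in name.split()


for original in ["Ahmad ben Husain", "David Ben Gurion", "Alex Ben Johnson"]:
    for variant in (original, original.lower(), original.upper(), original.title()):
        print(repr(variant).ljust(20),
              "guarded:", ben_is_prefix(variant),
              "naive:", naive_ben_is_prefix(variant))
```

With the guard, every variant except the original mixed-case "Ahmad ben Husain" comes back negative, so lowercased, uppercased and title-cased copies of a genuine particle name are all missed, as is "David Ben Gurion" in any casing. Without the guard, "alex ben johnson" is flagged, which is the lowercased-input false positive.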

Net: a **default-on** `ben` heuristic could be wrong more often than the status quo (where `ben` is just a normal name piece). The opt-in workaround below stays the safe recommendation; any default handling should be weighed against these failure modes, and would most defensibly ship **opt-in** rather than as a global default.

## Why it's non-trivial (implementation)

`is_prefix()` is called from five places in `parser.py`:

- **line 250** — initials computation
- **line 448** — `_split_last()` for `last_base`/`last_prefixes`
- **line 1054** — main prefix-join loop during parsing
- **line 1075** — chained prefix lookahead
- **line 1106** — `cap_word()` during capitalization

Making `is_prefix()` case-sensitive globally would break the capitalization path (line 1106). When all-caps or otherwise badly cased input is being normalized, a `VAN` or `Van` must still be recognized as a prefix so that it can be lowercased. A narrower fix (special-casing `ben` only at the parse-flow call sites, not in `cap_word`) would work, but it requires changing each of those call sites individually.
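One way to picture that narrower fix, combined with the opt-in default suggested under Risks. This is a hypothetical sketch: `PrefixRules`, `is_parse_prefix()` and the `ben_heuristic` flag are invented for illustration and are not part of nameparser's API, and `cap_word()` here is a simplified stand-in for the real method:

```python
# Hypothetical sketch: the 'ben' casing rule lives in a separate, opt-in check
# used only by parse-flow call sites. The capitalization path keeps the plain
# case-insensitive check. All names here are illustrative.
PREFIXES = {"van", "von", "de", "la"}


def is_mixed_case(pieces: list[str]) -> bool:
    text = "".join(pieces)
    return any(c.isupper() for c in text) and any(c.islower() for c in text)


class PrefixRules:
    def __init__(self, ben_heuristic: bool = False):
        # Opt-in switch; off by default, so existing parses are unchanged.
        self.ben_heuristic = ben_heuristic

    def is_prefix(self, piece: str) -> bool:
        """Case-insensitive check, as the capitalization path needs."""
        return piece.lower() in PREFIXES

    def is_parse_prefix(self, i: int, pieces: list[str]) -> bool:
        """Check for parse-flow call sites (e.g. the prefix-join loop)."""
        piece = pieces[i]
        if self.is_prefix(piece):
            return True
        if self.ben_heuristic and piece == "ben":
            return is_mixed_case(pieces[:i] + pieces[i + 1:])
        return False


def cap_word(rules: PrefixRules, word: str) -> str:
    # Capitalization never consults the casing rule: a known prefix in any
    # casing ("VAN", "Van") is lowercased; other words are capitalized.
    return word.lower() if rules.is_prefix(word) else word.capitalize()


rules = PrefixRules(ben_heuristic=True)
print(rules.is_parse_prefix(1, ["Ahmad", "ben", "Husain"]))
print(cap_word(rules, "VAN"), cap_word(rules, "BEN"))
```

Because `cap_word()` only ever calls the case-insensitive check, the opt-in rule cannot change how an all-caps `VAN` is normalized. The cost is that each parse-flow call site has to be switched to the new check one by one.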

## Workaround

Users with Hebrew/Arabic name datasets can add `ben` to the prefix set themselves:

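```python
from nameparser.config import CONSTANTS

# Adds 'ben' to the shared module-level prefix set, so it affects every
# HumanName parsed afterwards in this process.
CONSTANTS.prefixes.add('ben')
```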
