Skip to content

kde/line_density NaN inputs: backends disagree and the auto extent collapses to zeros #3628

Description

@brendancol

Describe the bug

kde() and line_density() have no policy for non-finite input coordinates or weights, and the four backends disagree about what happens. One NaN row in a point table (a common artifact of empty geometries or bad CSV parses) produces three different outcomes depending on backend and call style:

  1. Auto-extent path (no template): a single NaN coordinate makes x_arr.min()/max() NaN, so x_range/y_range are NaN, the coordinate vectors are NaN, and the output is a silent all-zero grid. Same for line_density endpoints. All backends.
  2. Template path, numpy / dask+numpy / dask+cupy: the NaN point is silently dropped (the pixel-range math produces an empty loop range, and the dask tile filter's NaN comparisons are False). Reasonable outcome, but undocumented and accidental.
  3. Template path, eager cupy with the gaussian kernel: the GPU kernel has no cutoff, so exp(-0.5 * nan) reaches every pixel and the whole grid comes back NaN. Compact kernels drop the point instead (nan <= 1.0 is False).

A NaN weight also poisons every pixel within the cutoff, which for gaussian on any backend means the whole grid.

Expected behavior

One documented policy, the same on every backend. xrspatial.interpolate already has one (interpolate/_validation.py): drop non-finite points up front and raise when a non-empty input has no valid points left. kde and line_density should do the same, for coordinates and weights (and all four segment endpoints for line_density).

Reproduction (this host, worktree at b9ba8a8, CUDA available)

import numpy as np
import xarray as xr
import cupy
import dask.array as dask_array
from xrspatial.kde import kde, line_density

t = xr.DataArray(np.zeros((16, 16)), dims=['y', 'x'],
                 coords={'y': np.linspace(-4, 4, 16), 'x': np.linspace(-4, 4, 16)})
x = np.array([0.0, 1.0, np.nan])
y = np.array([0.0, 1.0, 0.5])

for name, tt in [('numpy', t),
                 ('cupy', t.copy(data=cupy.asarray(t.values))),
                 ('dask', t.copy(data=dask_array.from_array(t.values, chunks=(8, 8)))),
                 ('dask_cupy', t.copy(data=dask_array.from_array(cupy.asarray(t.values), chunks=(8, 8))))]:
    r = kde(x, y, bandwidth=1.0, kernel='gaussian', template=tt).data
    r = r.compute() if hasattr(r, 'compute') else r
    r = r.get() if hasattr(r, 'get') else r
    print(name, 'sum=%.4f' % r.sum(), 'n_nan=%d' % np.isnan(r).sum())

# auto-extent path
r = line_density([0., np.nan], [0., 0.], [1., 1.], [1., 1.], bandwidth=0.5, width=16, height=16)
print('auto-extent: sum=', float(r.sum()), 'coords finite:', bool(np.isfinite(r.x.values).all()))

Output:

numpy     sum=7.0278 n_nan=0
cupy      sum=nan    n_nan=256
dask      sum=7.0278 n_nan=0
dask_cupy sum=7.0278 n_nan=0
auto-extent: sum= 0.0 coords finite: False

The valid segment contributes nothing in the auto-extent case; the output is zeros with NaN coordinates.

Additional context

Found by the 2026-07-03 accuracy sweep of the kde module. Bandwidth selection (_silverman_bandwidth) also sees the NaN today via np.std/np.percentile; filtering up front fixes that as a side effect.

Metadata

Metadata

Assignees

No one assigned

    Labels

    backend-coverageAdding missing dask/cupy/dask+cupy backend supportbugSomething isn't workinggpuCuPy / CUDA GPU supportinput-validationInput validation and error messagesseverity:highSweep finding: HIGHsweep-accuracyFound by /sweep-accuracy

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions