[Java][IPC] AbstractCompressionCodec.compress() writes prefix=0 for empty buffers, incompatible with C++/Python readers


### Describe the bug

When a buffer has `writerIndex == 0` (e.g., a string column where all values
are empty string ""), `AbstractCompressionCodec.compress()` writes an 8-byte
buffer with `uncompressed_length = 0` as a "shortcut for empty buffer":

https://github.com/apache/arrow-java/blob/main/vector/src/main/java/org/apache/arrow/vector/compression/AbstractCompressionCodec.java#L32-L39

```java
if (uncompressedBuffer.writerIndex() == 0L) {
    // shortcut for empty buffer
    compressedBuffer.setLong(0, 0);  // prefix = 0
    ...
}
```

This has been present since the initial implementation (ARROW-11899, 2021).

The Java `decompress()` handles this correctly (`if size==0 return empty`),
but **C++ and Python Arrow readers do not recognize `prefix=0`**. They attempt
to decompress 0 bytes of data, which fails:

- C++ (Arrow 1.0.0 ~ latest): `IOError: Lz4 compressed input contains less than one frame`
- PyArrow 21.0: same error

### Reproduction

Write an Arrow IPC stream with LZ4_FRAME (or ZSTD) compression where **one
string column has all values = ""** (empty string, not null). The string data
buffer has `writerIndex = 0`, triggering the empty buffer path.

```java
// Writer
ArrowStreamWriter writer = new ArrowStreamWriter(root, null, channel,
    IpcOption.DEFAULT, CommonsCompressionFactory.INSTANCE, CodecType.LZ4_FRAME);

// All rows: stringVector.setSafe(i, "".getBytes());
```

Reading with C++ or Python fails at the first RecordBatch.

### Root cause

The Arrow IPC compression format defines:
- `prefix > 0`: compressed data follows, decompress to `prefix` bytes
- `prefix = -1`: buffer stored uncompressed (sentinel)
- `prefix = 0`: **undefined** — not in spec, not handled by C++/Python

Java writes `prefix=0` for empty buffers, but only Java itself knows how to
read it back. C++/Python treat it as "0 bytes to decompress" → fail.
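
The three cases above can be sketched as a small classifier over the 8-byte prefix. This is an illustration only, not the Arrow C++ or Java reader code: `PrefixSketch`, `readPrefix`, `classify`, and `Kind` are hypothetical names, and the sketch assumes the prefix is a little-endian signed 64-bit integer at offset 0 of the buffer body.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; not the Arrow C++ or Java reader code.
final class PrefixSketch {
    enum Kind { COMPRESSED, UNCOMPRESSED, UNDEFINED }

    // Reads the 8-byte length prefix at the start of a buffer body.
    // Assumption: the prefix is a little-endian signed 64-bit integer.
    static long readPrefix(ByteBuffer body) {
        return body.duplicate().order(ByteOrder.LITTLE_ENDIAN).getLong(0);
    }

    // Maps a prefix value onto the cases listed in this issue.
    static Kind classify(long prefix) {
        if (prefix > 0) {
            return Kind.COMPRESSED;    // decompress the payload to `prefix` bytes
        }
        if (prefix == -1L) {
            return Kind.UNCOMPRESSED;  // sentinel: payload is stored as-is
        }
        return Kind.UNDEFINED;         // includes prefix == 0, the case Java writes
    }
}
```

Under these assumptions, the 8-byte buffer that Java writes for an empty input falls into `UNDEFINED`, which is the case a reader without a special path for `0` would pass on to the codec.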

### Suggested fix

Change the empty buffer path to write the `-1` sentinel (which all readers support):

```java
if (uncompressedBuffer.writerIndex() == 0L) {
    ArrowBuf compressedBuffer = allocator.buffer(SIZE_OF_UNCOMPRESSED_LENGTH);
    compressedBuffer.setLong(0, -1L);  // Use -1 instead of 0
    compressedBuffer.writerIndex(SIZE_OF_UNCOMPRESSED_LENGTH);
    uncompressedBuffer.close();
    return compressedBuffer;
}
```

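When a reader sees `prefix=-1`, it treats the bytes after the prefix as the
uncompressed buffer. No bytes follow the 8-byte prefix here, so the reader gets
a zero-length slice, which is correct for an originally empty buffer.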
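
A minimal sketch of why the sentinel works for an empty buffer, using plain `java.nio.ByteBuffer` rather than `ArrowBuf` so it runs without Arrow on the classpath. `EmptyBufferSentinelSketch`, `encodeEmpty`, and `readUncompressed` are hypothetical names, and the reader logic here is an assumption about how a `-1` prefix is handled, not a copy of any Arrow implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration using java.nio instead of ArrowBuf.
final class EmptyBufferSentinelSketch {
    static final int SIZE_OF_UNCOMPRESSED_LENGTH = 8;

    // Writer side: an empty buffer becomes just the 8-byte prefix, set to -1.
    static ByteBuffer encodeEmpty() {
        ByteBuffer out = ByteBuffer.allocate(SIZE_OF_UNCOMPRESSED_LENGTH)
            .order(ByteOrder.LITTLE_ENDIAN);
        out.putLong(0, -1L);
        return out;
    }

    // Reader side (assumed behavior): for prefix -1, return everything
    // after the prefix without invoking the codec.
    static ByteBuffer readUncompressed(ByteBuffer body) {
        ByteBuffer view = body.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        if (view.getLong(0) != -1L) {
            throw new IllegalArgumentException("buffer is not stored uncompressed");
        }
        view.position(SIZE_OF_UNCOMPRESSED_LENGTH);
        return view.slice();
    }
}
```

Calling `readUncompressed(encodeEmpty())` yields a slice with zero remaining bytes, so no LZ4 or ZSTD frame is ever parsed for the empty buffer.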

### Environment

- Affected: All Arrow Java versions with IPC compression (1.0.0+)
- Readers that fail: Arrow C++ (all versions), PyArrow (all versions)
- Codec: Both LZ4_FRAME and ZSTD

### Related

- #1116 — similar symptom (prefix=0) but different root cause (race condition
  in vector reuse, not the intentional empty buffer path)
- apache/arrow#15102 — C++ DecompressBuffer fix for prefix=-1 (does not
  handle prefix=0)