add pass_filenames_via_stdin for large changesets

pre-commit currently passes selected filenames to hooks via argv.
For large changesets (or --all-files), the filename list can exceed the
OS command-line length limit, so pre-commit partitions it and invokes the
hook multiple times.
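To make the chunking concrete, here is a minimal sketch of greedy argv
partitioning under a length budget. It is not pre-commit's own xargs
helper: `partition_argv` and `max_length` are hypothetical, and the cost
model (len(arg) + 1 per argument) is a simplification. Real limits also
count the environment and per-argument pointer overhead.

```python
from __future__ import annotations

from collections.abc import Sequence


def partition_argv(
        cmd: Sequence[str],
        filenames: Sequence[str],
        max_length: int,
) -> list[tuple[str, ...]]:
    """Greedily pack filenames after `cmd` so each argv fits max_length.

    Hypothetical helper for illustration; each argument costs len + 1.
    """
    def cost(arg: str) -> int:
        return len(arg) + 1  # +1 approximates the separating NUL in argv

    base = sum(cost(a) for a in cmd)
    if filenames and base + max(cost(f) for f in filenames) > max_length:
        raise ValueError('a single filename does not fit in max_length')

    chunks: list[tuple[str, ...]] = []
    current: list[str] = []
    used = base
    for filename in filenames:
        # start a new invocation when this filename would overflow the budget
        if current and used + cost(filename) > max_length:
            chunks.append((*cmd, *current))
            current, used = [], base
        current.append(filename)
        used += cost(filename)
    if current or not chunks:
        chunks.append((*cmd, *current))
    return chunks


print(partition_argv(['hook'], ['a.py', 'b.py', 'c.py'], max_length=15))
# prints: [('hook', 'a.py', 'b.py'), ('hook', 'c.py')]
```

Each returned tuple becomes a separate hook process, which is the
multiple-invocation behavior the stdin option is meant to avoid.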

This means there is currently no built-in way to hand the full filename
list to a hook in a single invocation without chunking and re-running. The
only practical workaround is to set pass_filenames: false and have the hook
run its own git operations to reconstruct the file set, which is expensive
and duplicates pre-commit's own file-selection logic.

This change adds a hook option:

    pass_filenames_via_stdin: true

When enabled, pre-commit sends filenames as NUL-delimited bytes on stdin and
runs the hook in a single invocation (no argv chunking).
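The implementation notes mention shared NUL encode/decode helpers, but
their names and signatures are not shown here. A minimal sketch, assuming
hypothetical `encode_filenames` / `decode_filenames` functions that
NUL-terminate every entry and go through `os.fsencode` / `os.fsdecode`
so names round-trip via the filesystem encoding:

```python
from __future__ import annotations

import os
from collections.abc import Iterable


def encode_filenames(filenames: Iterable[str]) -> bytes:
    """Encode filenames as NUL-terminated bytes (hypothetical helper)."""
    encoded = []
    for filename in filenames:
        raw = os.fsencode(filename)
        # NUL cannot occur in a path, so it is an unambiguous terminator
        if b'\0' in raw:
            raise ValueError(f'filename contains NUL: {filename!r}')
        encoded.append(raw + b'\0')
    return b''.join(encoded)


def decode_filenames(data: bytes) -> list[str]:
    """Inverse of encode_filenames; `if p` also tolerates no final NUL."""
    return [os.fsdecode(p) for p in data.split(b'\0') if p]
```

Terminating every entry (rather than only separating them) means a
`read -d ''` loop sees a delimiter after the last name too; the decoder's
empty-chunk filter accepts either form.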

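Why NUL-delimited stdin:
- NUL is the one byte that cannot appear in a path, so filenames containing
  spaces or newlines are framed unambiguously
- stdin is not subject to the argv length limit, so no partitioning is needed
- matches the established -0/-z conventions in unix tooling (xargs -0,
  find -print0, git ls-files -z)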
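A quick illustration of the first point, using made-up sample names: a
filename containing a newline is split in two by newline framing, while
NUL framing keeps it intact.

```python
names = ['plain.py', 'with space.py', 'with\nnewline.py']

newline_framed = '\n'.join(names).encode()
nul_framed = b''.join(n.encode() + b'\0' for n in names)

# newline framing cannot tell a separator from a newline inside a name
from_newlines = newline_framed.decode().split('\n')
# NUL framing is unambiguous because NUL never appears in a path
from_nuls = [p.decode() for p in nul_framed.split(b'\0') if p]

print(len(from_newlines), len(from_nuls))  # prints: 4 3
```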

Usage for hook authors:
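- shell (bash; `read -d ''` is not POSIX sh):

    while IFS= read -r -d '' filename || [ -n "$filename" ]; do
        printf 'checking %s\n' "$filename"  # replace with the real check
    done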

- python:

    import os
    import sys
    data = sys.stdin.buffer.read()
    filenames = [os.fsdecode(p) for p in data.split(b'\0') if p]
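On the caller's side, the whole list can be written to the child's stdin
and the hook run once. A minimal sketch, assuming the hook is an ordinary
subprocess: `run_hook_via_stdin` and `HOOK_SOURCE` are hypothetical and
are not pre-commit's `run_hook`. The stand-in hook reuses the python
decoding pattern above.

```python
from __future__ import annotations

import os
import subprocess
import sys
from collections.abc import Sequence

# stand-in hook: decodes NUL-delimited stdin and reports how many names it got
HOOK_SOURCE = r'''
import os, sys
data = sys.stdin.buffer.read()
filenames = [os.fsdecode(p) for p in data.split(b'\0') if p]
print(f'{len(filenames)} files')
'''


def run_hook_via_stdin(
        cmd: Sequence[str],
        filenames: Sequence[str],
) -> tuple[int, bytes]:
    """Run `cmd` once, passing filenames NUL-terminated on stdin."""
    stdin = b''.join(os.fsencode(f) + b'\0' for f in filenames)
    # input= is written via communicate(), so large payloads cannot deadlock
    proc = subprocess.run(
        cmd, input=stdin, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )
    return proc.returncode, proc.stdout


if __name__ == '__main__':
    names = [f'src/file_{i}.py' for i in range(50_000)]
    code, out = run_hook_via_stdin([sys.executable, '-c', HOOK_SOURCE], names)
    print(code, out.decode().strip())
```

However long the list is, there is exactly one child process; the only
bound is how fast the hook can read its stdin.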

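Behavior notes:
- pass_filenames_via_stdin defaults to false, so argv-based passing remains
  the default
- pass_filenames: false still disables filename passing entirely and takes
  precedence over pass_filenames_via_stdin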
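The precedence rule mirrors the `_run_single_hook` change in the diff,
where the stdin flag is forced off whenever pass_filenames is false. As a
standalone sketch (the `FilenameDelivery` type and `resolve_delivery`
function are hypothetical, not names from this commit):

```python
from __future__ import annotations

import os
from typing import NamedTuple


class FilenameDelivery(NamedTuple):
    argv: tuple[str, ...]  # filenames appended to the command line
    stdin: bytes | None    # NUL-terminated payload, or None for no payload


def resolve_delivery(
        filenames: tuple[str, ...],
        *,
        pass_filenames: bool = True,
        pass_filenames_via_stdin: bool = False,
) -> FilenameDelivery:
    if not pass_filenames:
        # pass_filenames: false wins: nothing on argv, nothing on stdin
        return FilenameDelivery(argv=(), stdin=None)
    if pass_filenames_via_stdin:
        payload = b''.join(os.fsencode(f) + b'\0' for f in filenames)
        return FilenameDelivery(argv=(), stdin=payload)
    # default: argv-based passing (which may still be chunked)
    return FilenameDelivery(argv=filenames, stdin=None)
```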

Implementation includes schema/runtime wiring, shared NUL encode/decode
helpers, and tests covering defaulting and runtime behavior.
Sharmila Jesupaul 2026-02-18 18:01:55 -08:00
parent 8416413a0e
commit 635912514d
18 changed files with 147 additions and 2 deletions

@@ -185,8 +185,10 @@ def _run_single_hook(
         # print hook and dots first in case the hook takes a while to run
         output.write(_start_msg(start=hook.name, end_len=6, cols=cols))
 
+        pass_filenames_via_stdin = hook.pass_filenames_via_stdin
         if not hook.pass_filenames:
             filenames = ()
+            pass_filenames_via_stdin = False
         time_before = time.monotonic()
         language = languages[hook.language]
         with language.in_env(hook.prefix, hook.language_version):
@@ -198,6 +200,7 @@ def _run_single_hook(
                 is_local=hook.src == 'local',
                 require_serial=hook.require_serial,
                 color=use_color,
+                pass_filenames_via_stdin=pass_filenames_via_stdin,
             )
         duration = round(time.monotonic() - time_before, 2) or 0
         diff_after = _get_diff()