pre-commit currently passes selected filenames to hooks via argv.
For large changesets (or --all-files), argv length limits are hit and
filenames are partitioned, causing multiple hook invocations.
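As a rough illustration of why a long file list turns into several hook runs, here is a minimal sketch of length-based chunking. It is not pre-commit's actual implementation; the `chunk_filenames` name and the `max_length` budget are assumptions made for the example.

```python
def chunk_filenames(cmd, filenames, max_length):
    """Split filenames so each command line stays within max_length characters.

    Hypothetical helper: each argument is counted as its length plus one
    separator, which only approximates how real argv limits are computed.
    """
    base_length = sum(len(arg) + 1 for arg in cmd)
    chunks = []
    current = []
    current_length = base_length
    for filename in filenames:
        arg_length = len(filename) + 1
        # Start a new chunk when adding this filename would exceed the budget.
        if current and current_length + arg_length > max_length:
            chunks.append(tuple(current))
            current = []
            current_length = base_length
        current.append(filename)
        current_length += arg_length
    if current:
        chunks.append(tuple(current))
    return chunks
```

Each returned chunk corresponds to one separate invocation of the hook, so a hook that needs the whole file set never sees it in a single run.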
This means there is currently no built-in way to pass filenames to an
underlying hook in one shot without chunking / re-running. The only practical
workaround is to set pass_filenames: false and run custom git operations in
hook code to reconstruct the file set, which is expensive and duplicates
pre-commit's own file-selection logic.
This change adds a hook option:
    pass_filenames_via_stdin: true
When enabled, pre-commit sends filenames as NUL-delimited bytes on stdin and
runs the hook in a single invocation (no argv chunking).
Why NUL-delimited stdin:
- safe for filenames containing spaces/newlines
- matches established -0 conventions in unix tooling
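The point about unusual filenames can be checked with a small round trip. This is a sketch only: `encode_filenames` and `decode_filenames` are hypothetical names and are not necessarily the shared helpers this change adds.

```python
import os


def encode_filenames(filenames):
    """Join filenames into one byte string, terminating each with NUL."""
    return b"".join(os.fsencode(name) + b"\0" for name in filenames)


def decode_filenames(data):
    """Split NUL-delimited bytes back into filenames, skipping empty pieces."""
    return [os.fsdecode(part) for part in data.split(b"\0") if part]


names = ["plain.txt", "with space.txt", "with\nnewline.txt"]
payload = encode_filenames(names)

# NUL cannot appear inside a filename, so the round trip is lossless.
assert decode_filenames(payload) == names
# A newline-delimited format would split the third name in two.
assert len("\n".join(names).split("\n")) == 4
```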
Usage for hook authors:
- shell:
    while IFS= read -r -d '' filename; do
      ...
    done
- python:
    data = sys.stdin.buffer.read()
    filenames = [os.fsdecode(p) for p in data.split(b'\0') if p]
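For a fuller picture, the Python snippet above could sit inside a hook entry point along these lines. This is a sketch under assumptions: the `.orig` check is a made-up example rule, and the `stream` parameter exists only so the function can be exercised without a real stdin.

```python
import io
import os
import sys


def read_filenames(stream):
    """Read all NUL-delimited filenames from a binary stream."""
    data = stream.read()
    return [os.fsdecode(p) for p in data.split(b"\0") if p]


def main(stream=None):
    # Default to the process's binary stdin, as a real hook would.
    stream = sys.stdin.buffer if stream is None else stream
    filenames = read_filenames(stream)
    # Hypothetical check: flag leftover merge artifacts.
    failed = [f for f in filenames if f.endswith(".orig")]
    for f in failed:
        print(f"unexpected merge leftover: {f!r}")
    return 1 if failed else 0
```

A script would typically end with `raise SystemExit(main())`. Because the whole file list arrives in one stream, the check runs once over the full set rather than once per chunk.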
Behavior notes:
- default remains argv-based passing
- pass_filenames: false still disables filename passing entirely
Implementation includes schema/runtime wiring, shared NUL encode/decode
helpers, and tests covering defaulting and runtime behavior.
Writing a test for this one is tricky because I only saw the issue when the
directory being removed is a docker volume: in that case the removal fails
with EPERM instead of EACCES.
This is easy to reproduce though. The existing test fails when the
directory being used for the files is a docker volume:
```
% docker run \
    -v "$(mktemp -d)":/tmp \
    -v "${PWD}":/src \
    -w /src \
    python:3 \
    bash -c 'pip install -e . && pip install -r requirements-dev.txt && python -m pytest tests/util_test.py'
```
If rev is wrapped in single or double quotes (e.g. because of a yamllint
quoted-strings rule), honour the existing quotation style when rewriting the
rev during an update.
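The quote-preserving rewrite can be sketched with a regular expression that captures the optional quote character and reuses it. The pattern and the `rewrite_rev` function are illustrative assumptions, not the project's actual code.

```python
import re

# Groups: prefix, optional quote, old rev, trailing text (e.g. a comment), EOL.
REV_LINE = re.compile(r"^(\s+rev:\s*)(['\"]?)([^\s#'\"]+)\2(.*?)(\r?\n)?$")


def rewrite_rev(line, new_rev):
    """Replace the rev on a `rev:` line, keeping any surrounding quotes."""
    match = REV_LINE.match(line)
    if match is None:
        return line  # not a rev line, leave untouched
    prefix, quote, _old_rev, rest, eol = match.groups()
    return f"{prefix}{quote}{new_rev}{quote}{rest}{eol or ''}"
```

For instance, `rewrite_rev("    rev: 'v1.0.0'\n", "v2.0.0")` returns `"    rev: 'v2.0.0'\n"`, while an unquoted rev stays unquoted.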
Previously, a `getcwd` syscall was made for every filename that was filtered.
The working directory is now cached once per run instead.
- When all files are identified by filename only: ~45% improvement
- When no files are identified by filename only: ~55% improvement
This makes little difference to overall execution time; the bigger win is
eliminating the `memoize_by_cwd` hack. Just removing the memoization would
have *increased* the runtime by 300-500%.
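The per-run caching described above can be pictured roughly as follows. This is a minimal sketch: `CwdCachingFilter` is a hypothetical class, and the existence check stands in for whatever filtering the real code performs.

```python
import os


class CwdCachingFilter:
    """Hypothetical filter that looks up the working directory once per run."""

    def __init__(self, filenames, cwd=None):
        # A single getcwd call here replaces one call per filename.
        self.cwd = os.getcwd() if cwd is None else cwd
        self.filenames = tuple(filenames)

    def existing(self):
        """Return the filenames that exist relative to the cached directory."""
        return [
            f for f in self.filenames
            if os.path.lexists(os.path.join(self.cwd, f))
        ]
```

Holding the directory on an object scoped to one run avoids a process-wide memoization keyed on the working directory.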