pre-commit/pre_commit/hook.py
Sharmila Jesupaul 635912514d add pass_filenames_via_stdin for large changesets
pre-commit currently passes selected filenames to hooks via argv.
For large changesets (or --all-files), argv length limits are hit and
filenames are partitioned, causing multiple hook invocations.

This means there is currently no built-in way to pass filenames to an
underlying hook in one shot without chunking / re-running. The only practical
workaround is to set pass_filenames: false and run custom git operations in
hook code to reconstruct the file set, which is expensive and duplicates
pre-commit's own file-selection logic.

This change adds a hook option:

    pass_filenames_via_stdin: true

When enabled, pre-commit sends filenames as NUL-delimited bytes on stdin and
runs the hook in a single invocation (no argv chunking).

Why NUL-delimited stdin:
- safe for filenames containing spaces/newlines
- matches established -0 conventions in unix tooling

Usage for hook authors:
- shell:

    while IFS= read -r -d '' filename; do
        ...
    done

- python:

    data = sys.stdin.buffer.read()
    filenames = [os.fsdecode(p) for p in data.split(b'\0') if p]

Behavior notes:
- default remains argv-based passing
- pass_filenames: false still disables filename passing entirely

Implementation includes schema/runtime wiring, shared NUL encode/decode
helpers, and tests covering defaulting and runtime behavior.
2026-02-18 18:06:34 -08:00

61 lines
1.5 KiB
Python

from __future__ import annotations
import logging
from collections.abc import Sequence
from typing import Any
from typing import NamedTuple
from pre_commit.prefix import Prefix
logger = logging.getLogger('pre_commit')
class Hook(NamedTuple):
src: str
prefix: Prefix
id: str
name: str
entry: str
language: str
alias: str
files: str
exclude: str
types: Sequence[str]
types_or: Sequence[str]
exclude_types: Sequence[str]
additional_dependencies: Sequence[str]
args: Sequence[str]
always_run: bool
fail_fast: bool
pass_filenames: bool
pass_filenames_via_stdin: bool
description: str
language_version: str
log_file: str
minimum_pre_commit_version: str
require_serial: bool
stages: Sequence[str]
verbose: bool
@property
def install_key(self) -> tuple[Prefix, str, str, tuple[str, ...]]:
return (
self.prefix,
self.language,
self.language_version,
tuple(self.additional_dependencies),
)
@classmethod
def create(cls, src: str, prefix: Prefix, dct: dict[str, Any]) -> Hook:
# TODO: have cfgv do this (?)
extra_keys = set(dct) - _KEYS
if extra_keys:
logger.warning(
f'Unexpected key(s) present on {src} => {dct["id"]}: '
f'{", ".join(sorted(extra_keys))}',
)
return cls(src=src, prefix=prefix, **{k: dct[k] for k in _KEYS})
_KEYS = frozenset(set(Hook._fields) - {'src', 'prefix'})