ast-grep (sg) patterns¶
tilth_search covers names and text. For shapes with metavariables โ "any
call to JSON.parse(JSON.stringify(โฆ))", "any for loop with time.Sleep in
its body" โ drop to sg (ast-grep) via Bash. This is the only sanctioned
shell escape from cheez-search. The same escape covers structural codemods
via sg --rewrite (see "Structural codemods" below); tilth_edit remains
the default for one-off block edits.
When sg is the right pick¶
- The pattern needs metavars (
$X,$$$BODY) or specific node kinds. - You're surveying a structural shape across a directory (anti-pattern sweeps, refactor previews, lint-style scans).
- Tree-sitter symbol search would over-match because the name isn't fixed.
- You want to apply the same structural change across many files in one pass (codemod) โ see the dedicated section below.
If the question is "where is handleAuth defined" or "what calls
validateToken", stay in tilth_search. sg is for shape, not name.
Pattern syntax¶
# AST template: $X is a metavar that matches any single node.
sg --lang typescript -p 'JSON.parse(JSON.stringify($X))' --json src/
# $$$BODY matches a sequence of statements.
sg --lang rust -p 'impl std::fmt::Display for $TYPE { $$$BODY }' --json src/
# Bound the scan; never splice unvalidated user input as the path.
SCOPE=$(realpath "$SCOPE_INPUT")
sg --lang python -p 're.match($PATTERN, $INPUT)' --json "$SCOPE"
Hard rules for sg invocations¶
- Always pass
--lang/-l. Pattern parsing is language-specific; the same pattern string can parse differently per language. Don't rely on extension inference for anything that goes into a script. - Always pass
--jsonwhen piping into a tool, an LLM, or a follow-up shell step. Never parse the pretty TTY output. - Never pass
--interactive. It requires a TTY and human input; it will hang or fail in agent contexts. - Never pass
-U(rewrite-update) without a dry run first. See "Structural codemods" โ search-only invocation, then re-run with-Uafter the diff is staged or confirmed. - Validate any path that flows from user input before splicing it into
the command line. Reject
;,&,|, backtick,$(,>,<, newline. Resolve to an absolute path withrealpathand confirm it sits under the repo root. - Parse
--jsondefensively. Key by field name (text,file,range,metaVariables); tolerate missing keys; do not hard-code positionaljqpaths. The shape has shifted between minor releases. - Filter test/build/vendor directories with
--globsor by post-filtering JSON.
Pitfalls¶
- CST, not AST. ast-grep matches the Concrete Syntax Tree, so trivia and punctuation can affect what nodes exist. The pattern itself must be valid syntax in the target language โ "didn't match" is often "didn't parse on the pattern side".
- Metavar binding. Repeating the same metavar name within a single
pattern (
$VAR ... $VAR) requires both occurrences to match the same node text. Use distinct names ($A,$B) when you want independent matches; reuse the name only when you want the binding. - Lenient by default. ast-grep matches loosely on CST nodes. Pass
--strict(orstrictness:in YAML rules) when you need exact node-kind equality and the loose match is producing surprises. - No semantics.
$X.unwrap()is text-shape only โ the receiver's type is unknown to ast-grep. Anything that needs type info, control flow, or data flow has to escalate (see "When to escalate further").
Performance: narrow first, parse second¶
ast-grep parses every file in scope, so cost scales with file count, not
match count. A wide **/* over a monorepo can be ~10ร slower than a
constrained scope. Two-stage workflow:
- List candidates with
tilth_search(definitions, imports, content hits) ortilth_files(glob/path predicates) to get the file set. - Run
sgwith--globsor explicit paths bounded to that set.
This pattern beats one giant wildcard scan and keeps the JSON output small enough to inspect.
Common pattern shapes¶
| Goal | Pattern |
|---|---|
| Calls to a method on any receiver | $RECV.someMethod($$$ARGS) |
| Anti-pattern: deep clone via JSON | JSON.parse(JSON.stringify($X)) |
| Empty catch blocks | try { $$$BODY } catch ($E) { } |
| Sleep inside a loop | for ($$$INIT) { $$$BEFORE time.Sleep($D) $$$AFTER } |
| Trait impls for a specific trait | impl Display for $TYPE { $$$BODY } |
| React hook destructure | const [$STATE, $SETTER] = useState($$$INIT) |
| Async function declaration | async function $NAME($$$PARAMS) { $$$BODY } |
| Class with any body | class $NAME { $$$BODY } |
Structural codemods (sg --rewrite)¶
sg --rewrite '<template>' plus -U (update files in place) is the
sanctioned codemod path. It complements tilth_edit rather than replacing
it: tilth_edit excels at "replace this specific block in this specific file"
with hash-anchor concurrency safety; sg --rewrite excels at "rewrite every
instance of this shape across the repo" with structural-match safety.
Use it when:
- The change repeats across N locations and the surrounding text varies.
- The pattern can be expressed structurally (metavars carry the variable parts; the rewrite template fills them back in).
- You don't know all the locations a priori and want discovery + change in one pass.
Dry-run-first protocol (non-negotiable):
- Run the pattern without
-U. Inspect--jsonmatches; count them. - Confirm the working tree is clean or stashed. Mass rewrites that bury uncommitted work are unrecoverable without a stash.
- Re-run with
-Uto apply. - Diff the result (
git diff). Commit or revert as a unit; do not layer additional changes on top until the codemod is reviewed. - If matches > expected, bail and revert โ your pattern was too loose.
Tighten with
--strict,--globs, or distinct metavar names, then repeat from step 1.
sg --rewrite does not have hash-anchor safety. Structural matching catches
"didn't accidentally rewrite a string literal that looks like code"; it does
not catch "file changed under me mid-run". Treat it as a single transactional
change between two clean git states.
When to escalate further¶
sg does not implement type information, control-flow analysis, data-flow
analysis, taint analysis, or constant propagation. If the rule needs any of
those, escalate to a real linter or a security tool:
- ESLint (
@typescript-eslint) โ type-aware JS/TS rules, plugin ecosystem, editor integration. - Clippy โ Rust lints with full type inference and borrow-checker
awareness.
sgcan do shape-level Rust patterns but not lifetime semantics. - Ruff โ Python lint/format with thousands of vetted rules covering what isort/flake8/pyupgrade/black do.
- gosec โ Go security rule pack.
- Semgrep โ when you need taint analysis, security rule packs, or
composition primitives like
pattern-not-insideandpattern-eitherthat ast-grep does not provide.
If your sg patterns are growing chained metavars and nested negations,
or you're hand-rolling logic that overlaps a maintained rule set, stop and
adopt the linter. sg is for one-off structural sweeps and
repo-specific codemods, not for maintained rule infrastructure.