Sophisticated reducers like C-Reduce do know that parens come in pairs. C-Reduce has many transformation passes: some are syntax-agnostic (delete a few characters or tokens), while others use Clang to try to parse the input as C++ and transform the AST.
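To make the "syntax-agnostic" kind of pass concrete, here is a rough Python sketch of greedy chunk deletion over a token list. This is not C-Reduce's actual code; `reduce_tokens` and `is_interesting` are names made up for illustration, and `is_interesting` stands in for whatever script decides that the bug still reproduces.

```python
from typing import Callable, List


def reduce_tokens(
    tokens: List[str],
    is_interesting: Callable[[List[str]], bool],
) -> List[str]:
    """Greedily delete chunks of tokens while the test case stays interesting.

    Hypothetical helper: it knows nothing about the language being reduced.
    """
    assert is_interesting(tokens), "the starting input must be interesting"
    chunk = max(len(tokens) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(tokens):
            # Candidate with tokens[i:i + chunk] removed.
            candidate = tokens[:i] + tokens[i + chunk:]
            if is_interesting(candidate):
                tokens = candidate  # keep the deletion, retry at the same index
            else:
                i += chunk  # this chunk seems to be needed, move on
        chunk //= 2  # try smaller chunks on the next sweep
    return tokens
```

With a toy predicate such as `lambda t: "crash" in t`, the tokens of `int x = 1 ; int y = 2 ; crash ( y ) ;` reduce to just `["crash"]`. A real predicate would also require the candidate to compile, which is exactly where blind deletion wastes most of its attempts.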
Perses isn't language-agnostic; it just knows the syntax of a lot of languages, because there are ANTLR grammars for most commonly used languages.
Really, there's no such thing as a language-agnostic test-case reducer. Shrink Ray is much closer than most, but all this means is that it's got some heuristics that work well for a wide variety of common languages (e.g. the bracket balancing thing). It's also got a bunch of language-specific passes.
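For a feel of what a bracket-balancing heuristic can look like, here is a small Python sketch. It is an assumption-laden illustration, not how any particular reducer implements it; `matched_spans` and `bracket_candidates` are hypothetical names. The idea is that every candidate keeps brackets balanced, so fewer attempts die on a syntax error.

```python
from typing import Iterator, List, Tuple

PAIRS = {"(": ")", "[": "]", "{": "}"}


def matched_spans(text: str) -> List[Tuple[int, int]]:
    """Return (open_index, close_index) for every properly matched pair."""
    stack: List[int] = []
    spans: List[Tuple[int, int]] = []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append(i)
        elif ch in PAIRS.values():
            # Only pair a closer with the most recent opener of the same kind;
            # unmatched closers are simply ignored.
            if stack and PAIRS[text[stack[-1]]] == ch:
                spans.append((stack.pop(), i))
    return spans


def bracket_candidates(text: str) -> Iterator[str]:
    """Yield smaller inputs that treat each bracket pair as a unit."""
    for start, end in matched_spans(text):
        yield text[:start] + text[end + 1:]  # delete the whole bracketed region
        yield text[:start + 1] + text[end:]  # keep the brackets, empty the contents
        yield text[:start] + text[start + 1:end] + text[end + 1:]  # unwrap the contents
```

This works tolerably on anything C-like, Lisp-like, or JSON-like, but it has no idea about string literals or comments containing brackets, which is the sort of thing that makes "language-agnostic" a matter of degree.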
This is sort of inherent to the problem: to get good results and good performance, a test-case reducer has to have a strong idea of what sort of transformations are likely to work, which in turn means it has to have a strong idea of what sort of languages it's likely to be run on.
I am just shooting in the dark here, so excuse me if my comment is too ignorant: have you considered rolling your own reducer and using Tree-sitter grammars for it?
Perses is a reducer that works on an AST and claims to only try syntactically valid candidate inputs. It also claims to be language agnostic, and I don't know how these two things go together. But it does work nicely for Java for me. https://github.com/uw-pluverse/perses / https://faculty.cc.gatech.edu/~qzhang414/papers/icse18_cheng...