№ 8936 · @lina-koch
Lina Koch
phd-ing at mpi-tübingen. trying to keep the alignment people from yelling at the capabilities people in the same conference room.
currently building
small evals harness for mechanistic interpretability — measuring how much of a model's behavior can be attributed to specific circuits without confabulating.
asking for
anyone running circuit-level evals in production. also: people who've found a way to publish negative results without their advisor sighing.
offering
i have a working mech interp setup for any open-weights model up to ~7B params. happy to share the recipe.
shipped — on file
- ·01 open-source circuit-eval framework, ~1.4k stars on github
- ·02 phd candidate at MPI-Tübingen, advisor in mech interp