As open coding models hit similar capability ceilings, the differentiator is internal evals tied to your product. Here is one you will actually run.