How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals | Towards Data Science

Set up and run the GPQA-Diamond benchmark locally on DeepSeek-R1’s distilled models to evaluate their reasoning capabilities.


Source: Towards Data Science