After you’ve chosen the general idea for your task, the next step is to write out a more detailed specification. This might include:

  • The basic idea of the task, what it’s designed to test, etc.
  • How to set up the task environment
    • What resources are locally available
    • What external resources the agent has access to
    • Prompt explaining the task to the agent
  • Outline of the steps required to complete the task
    • The type of abilities involved
    • Particular bottlenecks or hard steps
    • Time and cost estimates for a human to do the task
  • Scoring
    • Manual or automatic scoring?
      • If manual, how long would it take?
    • What is the scoring rubric?
  • Oversight or manual simulation requirements
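
One way to keep these items together while drafting is a small structured record. The sketch below is purely illustrative: the names TaskSpec, Scoring, and missing_fields, and the choice of which fields count as required, are assumptions made for this example and not a format that METR prescribes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scoring:
    """Hypothetical container for the 'Scoring' items in the outline."""
    automatic: bool                         # True if scoring is automatic
    rubric: str                             # description of the scoring rubric
    manual_minutes: Optional[float] = None  # time to score by hand, if manual


@dataclass
class TaskSpec:
    """Hypothetical container mirroring the outline above."""
    idea: str                               # basic idea and what it tests
    prompt: str                             # prompt explaining the task to the agent
    steps: List[str]                        # outline of steps to complete the task
    abilities: List[str]                    # types of abilities involved
    scoring: Scoring
    local_resources: List[str] = field(default_factory=list)
    external_resources: List[str] = field(default_factory=list)
    bottlenecks: List[str] = field(default_factory=list)
    human_hours: Optional[float] = None     # time estimate for a human
    human_cost_usd: Optional[float] = None  # cost estimate for a human
    oversight: str = ""                     # oversight or manual simulation needs

    def missing_fields(self) -> List[str]:
        """Return the names of items this sketch treats as required but empty."""
        missing = [
            name
            for name in ("idea", "prompt", "steps", "abilities")
            if not getattr(self, name)
        ]
        if not self.scoring.rubric:
            missing.append("scoring.rubric")
        # A manually scored task should say how long scoring takes.
        if not self.scoring.automatic and self.scoring.manual_minutes is None:
            missing.append("scoring.manual_minutes")
        return missing
```

Calling missing_fields() on a draft gives a quick list of gaps to fill before asking for feedback; the set of required fields here is a judgment call and can be adjusted.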


Anthropic’s Responsible Scaling Policy includes specifications for several evaluation tasks, starting on page 16. Note that we think an ideal specification would include somewhat more detail than these examples.

Getting feedback on specifications

When you have a specification that you’d like to move forward with, you can submit it to METR using the task proposal form. We’ll respond in a few days with feedback on your submission and on whether we think it’s ready to implement. We recommend doing this at least once for each task you develop, because it gives you an opportunity to address any design issues before you spend a lot of time on the implementation.