Setup

This page walks you through setting up all the software you need to develop tasks on your computer. First, make sure you have Node.js (with npm), Docker, and Git installed.
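You can confirm each one is installed by checking its version from a terminal:

node --version
npm --version
docker --version
git --version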

To get started, head over to our task template on GitHub. This repository contains everything you need to develop a basic task. Click the green “Use this template” button to create a private repository for your own task, then clone it to your computer.
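From the command line, cloning looks something like this (the URL below is a placeholder; use the one GitHub shows for your new repository):

git clone https://github.com/your-username/your-task-repo.git
cd your-task-repo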

All the files you’ll be modifying are in the my_task folder. Everything else is there to help you run and test the task.

my_task/my_task.py implements a simple example task where the agent has to add two numbers. (Technically, it implements a “task family”, which is a group of tasks that share code. Right now there’s only one task in the task family.)
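To give you a feel for the shape of a task family, here is a minimal sketch in the style of the Task Standard’s TaskFamily interface. It is illustrative only; the my_task.py in your template is the authoritative version, so check it for the exact method signatures and task data.

class TaskFamily:
    # Pin this to whatever Task Standard version your template uses.
    standard_version = "0.3.0"

    @staticmethod
    def get_tasks() -> dict[str, dict]:
        # Each key is a task name; the value is that task's data.
        return {"addition": {"a": 1234567, "b": 2345678}}

    @staticmethod
    def get_instructions(t: dict) -> str:
        # The instructions shown to the agent.
        return f"Add {t['a']} and {t['b']} and write the result to /home/agent/submission.txt."

    @staticmethod
    def score(t: dict, submission: str) -> float | None:
        # 1.0 for a correct answer, 0.0 otherwise.
        return 1.0 if submission.strip() == str(t["a"] + t["b"]) else 0.0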

Running a task

Your task repository includes a “workbench” tool for running tasks inside a Docker container. Open up a terminal window, navigate to the repository, and run these commands (the last two arguments to npm run task are the task family directory and the task name):

cd drivers
npm install
cd ../workbench
npm install
npm run task -- ../my_task addition

You should see something like this:

 === Task container set up ===
To access the container as the root user:

  docker exec -it -u root -w /root metr-task-standard-container-my_task-addition-1561014711 bash

To access it as the agent user:

  docker exec -it -u agent -w /home/agent metr-task-standard-container-my_task-addition-1561014711 bash

Complete the task by writing a submission to /home/agent/submission.txt in the container. Then, the task will be automatically scored.

Waiting for /home/agent/submission.txt to exist...

Now open a new terminal and run that second docker exec command. This will give you a shell into the Docker container where the task is running. When an AI agent attempts the task, it will have a similar text-based shell where it can edit files and run commands. Instructions and other files for the agent to work with are stored in /home/agent. The code implementing the task is stored in /root, which the agent can’t access (otherwise it could cheat!).
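For example, once you have a shell as the agent user, you can look around the agent’s home directory. (The instructions file name below follows the Task Standard’s usual convention; confirm the exact path in your own container.)

ls /home/agent
cat /home/agent/instructions.txt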

When the agent completes a task, it submits a final answer for scoring. You can simulate this on the workbench by writing to the file /home/agent/submission.txt.

echo "3580245" > /home/agent/submission.txt

Go back to the first terminal to see the final score for the task. If you gave the correct answer, it should be 1.0!

Testing a task

The example task family also comes with automated tests in my_task/my_task_test.py. You can run these using the workbench.

cd workbench
npm run test -- ../my_task addition my_task_test.py
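To get a sense of what such a test might check, here is an illustrative sketch that exercises scoring directly. It assumes the task data shape from the sketch earlier on this page; the template’s real my_task_test.py may look different (for example, it may use the workbench’s pytest fixtures).

# Illustrative sketch only, not the template's actual test file.
from my_task import TaskFamily

def test_correct_submission_scores_one():
    t = TaskFamily.get_tasks()["addition"]
    answer = str(t["a"] + t["b"])  # assumes the task data holds the two addends
    assert TaskFamily.score(t, answer) == 1.0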

Next steps

If you haven’t already, now is a good time to look over all the methods in my_task.py. Try making simple changes, such as adding a multiplication task, and running them with the workbench.
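For instance, a multiplication task could be added by extending get_tasks and score. This is again a sketch that assumes the task data shape used earlier on this page, not the template’s exact code, and get_instructions would need a matching update:

@staticmethod
def get_tasks() -> dict[str, dict]:
    return {
        "addition": {"operation": "add", "a": 1234567, "b": 2345678},
        "multiplication": {"operation": "multiply", "a": 123, "b": 456},
    }

@staticmethod
def score(t: dict, submission: str) -> float | None:
    expected = t["a"] + t["b"] if t["operation"] == "add" else t["a"] * t["b"]
    return 1.0 if submission.strip() == str(expected) else 0.0

You could then run the new task with npm run task -- ../my_task multiplication.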

On the next page, we’ll discuss how you can design a task to be useful for AI safety evaluations.