Auxiliary Virtual Machines
Some tasks are not easily containerizable, for example:
- The task of breaking out of a Docker container
- The task of installing and configuring a Kubernetes cluster
- Tasks that require setting up networks of machines using Docker Compose
- Machine learning tasks that give the agent access to GPUs in a safely sandboxed manner
To support these tasks, the Task Standard includes a feature called “auxiliary virtual machines” or “aux VMs”. These VMs start alongside the main task container and run on a cloud computing service. Your task code, which runs in the main Docker container, can run commands on the aux VM via SSH. It can also grant SSH access to the agent, either as an administrator or with restrictions.
To use an aux VM, your task family needs three things: required environment variables, internet permissions, and a VM spec. Here’s an example of a task family that uses an aux VM with a GPU:
```python
class TaskFamily:
    standard_version = "0.2.3"

    # ...

    # Tell the workbench to set admin credentials for the VM as environment variables
    # Note: these environment variables will not be available to the agent
    required_environment_variables = [
        "VM_SSH_USERNAME",
        "VM_SSH_PRIVATE_KEY",
        "VM_IP_ADDRESS",
    ]

    # Aux VMs are connected over the internet, so the task needs "full_internet" permission
    @staticmethod
    def get_permissions(t: Task) -> list[str]:
        return ["full_internet"]

    # Specify what kind of VM the task needs and how to set it up
    # Different tasks in a family can have different VM specs
    @staticmethod
    def get_aux_vm_spec(t: Task) -> VMSpec:
        return VMSpec(
            cpu_count_range=(4, 4),
            cpu_architecture="x64",
            ram_gib_range=(16, 16),
            gpu_spec=GPUSpec(count_range=(1, 1), model="a10"),
            base_image_type="ubuntu-20.04-cuda",
            build_steps=[
                {
                    "type": "file",
                    "source": "assets/starter_code.py",
                    "destination": "/home/ubuntu/starter_code.py",
                },
                {
                    "type": "shell",
                    "commands": ["python3 -m pip install torch"],
                },
            ],
        )
```
AWS setup
The task workbench uses AWS to run aux VMs. If your task uses aux VMs, you’ll need to follow these steps before you can run it on the workbench.
- Sign up for an AWS account if you don’t already have one. Then make sure you’ve installed the AWS CLI and Packer with the Amazon plugin.
- Create an IAM user in the AWS console. Choose “attach policies directly” and create a new policy. Use the JSON configuration from here.
- Create an access key for the new IAM user.
- Run `aws configure` in your terminal and paste in the access key and secret access key you just created.
- Make sure that the VM spec you request matches an available EC2 instance type and base image.
- Now, if you run a task that requires an aux VM, the workbench will automatically start one for you. Make sure to monitor your AWS console and terminate any instances you no longer need.
Build steps
By default, every aux VM starts from a regular Debian or Ubuntu Linux image. You can customize the setup process by adding build steps to the aux VM spec. These steps are run using Packer. There are two types of build steps, `file` and `shell`, which correspond to Packer’s File Provisioner and Shell Provisioner respectively.
After the build steps finish, the resulting image is cached so it can be re-used for future runs. Images are automatically re-built if there are changes to the build steps (or files referenced by a build step) since the last run.
Interacting with aux VMs
Task code can interact with the aux VM, for example from the `TaskFamily.start()` method, using SSH.
The preferred way for task code to interact with an aux VM is SSH using the Paramiko library. You can use the helper function `common.aux_vm_access.ssh_client()` to get an `SSHClient` instance connected as the admin user.
Running commands
The `SSHClient.exec_command()` method allows you to run commands on the aux VM. The method returns a 3-tuple `(stdin, stdout, stderr)` of binary file-like objects representing the input and output streams of the command.
Note that `exec_command()` returns immediately after the command has started. To wait for the command to finish, call `stdout.channel.recv_exit_status()`, which will block until the command exits.
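As a sketch, a small helper that runs a command and waits for it (assuming `client` is a connected Paramiko `SSHClient`, e.g. from `common.aux_vm_access.ssh_client()`; the name `run_command` is our own):

```python
def run_command(client, command, timeout=600):
    """Run `command` on the aux VM and wait for it to finish.

    `client` is assumed to be a connected paramiko.SSHClient, e.g. from
    common.aux_vm_access.ssh_client(). Returns (exit_status, stdout, stderr).
    """
    _, stdout, stderr = client.exec_command(command, timeout=timeout)
    # recv_exit_status() blocks until the remote command exits
    exit_status = stdout.channel.recv_exit_status()
    return exit_status, stdout.read().decode(), stderr.read().decode()
```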
Uploading and downloading files
If you need to upload files to or download files from the aux VM after the task has started, Paramiko supports SFTP. You can get an SFTP client from an existing SSH client using `SSHClient.open_sftp()`.
Executing multiple SFTP operations is slow, so if you want to download or upload many files at once, it will be more efficient to compress the files into a tar or ZIP archive before transferring them.
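For example, a sketch using Python’s standard `tarfile` module to build an in-memory archive that you could then upload with a single `sftp.putfo()` call:

```python
import io
import tarfile
from pathlib import Path

def pack_directory(local_dir):
    """Compress `local_dir` into an in-memory .tar.gz archive.

    One SFTP transfer of a single archive is much faster than many small
    transfers. Upload the returned bytes with
    sftp.putfo(io.BytesIO(data), "/home/ubuntu/files.tar.gz"), then extract
    on the VM with `tar -xzf files.tar.gz`.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(local_dir, arcname=Path(local_dir).name)
    return buf.getvalue()
```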
Scoring output on the aux VM
If the agent creates files on the aux VM that you need to use in your scoring function, you can do one of the following in `TaskFamily.score()`:
- Copy the assets from the aux VM to the task environment using SFTP, or
- Run a scoring script on the aux VM using `exec_command()` and capture its output from the `stdout`/`stderr` streams.
We recommend copying assets used for scoring from the aux VM to the task environment, as it will likely be easier to write and debug your scoring code if it runs inside the task environment. However, if scoring must run on a GPU or needs to inspect the state of the aux VM, it may be better to run a scoring script on the aux VM instead.
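For the second approach, a sketch (assuming `client` is a connected Paramiko `SSHClient`, and `/home/ubuntu/grade.py` is a hypothetical scoring script, e.g. uploaded by a `file` build step, that prints a single float):

```python
def score_on_vm(client):
    """Run a scoring script on the aux VM and parse its output.

    `client` is assumed to be a connected paramiko.SSHClient, and
    /home/ubuntu/grade.py is a hypothetical script that prints a single
    float score. Returns 0.0 if the script fails.
    """
    _, stdout, stderr = client.exec_command("python3 /home/ubuntu/grade.py")
    if stdout.channel.recv_exit_status() != 0:
        print("Scoring script failed:", stderr.read().decode())
        return 0.0
    return float(stdout.read().decode().strip())
```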
Letting agents access the aux VM
By default, the agent has no way to access the aux VM. You can use the helper function `common.aux_vm_access.setup_agent_ssh()` to generate an SSH command that the agent can use for this purpose. The command will be saved to `/home/agent/ssh_command` in the main Docker container. Mention this file in the instructions so that the agent knows where to find it.
Tips, pitfalls, and known issues
Installing packages
- All package management commands must be run using `sudo`.
- `apt-get` may pause for user confirmation when installing packages, which will cause the build process to hang. To avoid this:
  - Ensure that the first command in any `shell` build step that uses a package management tool is `export DEBIAN_FRONTEND=noninteractive`.
  - Add the option `-y` to every package management command you run (e.g. `apt-get install -y pkg_name`) to automatically answer “yes” to any question the command asks.
- Run `sudo apt-get update -y` before installing packages to ensure that you have up-to-date package lists.
- On startup, the AWS `cloud-init` process will refresh package sources. If your build steps start before `cloud-init` finishes, then package installation may fail because `cloud-init` has not finished configuring the available package sources. You can prevent this from happening by running `cloud-init status --wait`, which will wait until `cloud-init` has finished.
- During start-up, other processes may try to install or update packages, which can cause package management commands to fail if another process has locked the package management database. To avoid this:
  - Add the option `-o DPkg::Lock::Timeout=600` to every package management command you run (e.g. `apt-get install -o DPkg::Lock::Timeout=600 pkg_name`) so that the command waits up to 10 minutes for other processes to release the package management lock, rather than failing immediately.
  - Ensure that `unattended-upgrades` is not running, as it may install packages at unpredictable times. To do so, run `sudo systemctl stop unattended-upgrades` and then `sudo apt-get purge -y -o DPkg::Lock::Timeout=600 unattended-upgrades` to remove it.
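Putting these tips together, a `shell` build step might look like the following sketch (`build-essential` is an illustrative package, not something the Task Standard requires):

```python
# A `shell` build step that applies the package-management tips above
apt_step = {
    "type": "shell",
    "commands": [
        # Make apt-get non-interactive for the rest of this build step
        "export DEBIAN_FRONTEND=noninteractive",
        # Wait for cloud-init so package sources are fully configured
        "cloud-init status --wait",
        # Stop and remove unattended-upgrades so it can't take the dpkg lock
        "sudo systemctl stop unattended-upgrades",
        "sudo apt-get purge -y -o DPkg::Lock::Timeout=600 unattended-upgrades",
        # Update package lists, auto-confirm, and wait for the lock if needed
        "sudo apt-get update -y -o DPkg::Lock::Timeout=600",
        "sudo apt-get install -y -o DPkg::Lock::Timeout=600 build-essential",
    ],
}
```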
Python
Aux VMs may have more than one version of Python installed, and `python` may point to an older version. Changing the default `python` may break system tools that rely on the older version. Instead:
- Identify the command that will run the latest version of Python, if that is different from the default `python` command. On `ubuntu-20.04-cuda` this command is `python3.9`.
- When installing packages using `pip`, make sure to invoke it as a module (e.g. `python3.9 -m pip`).
- If applicable, tell the agent which Python command to use in the instructions.
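For example, a `shell` build step installing Python packages on `ubuntu-20.04-cuda`, where the newest interpreter is `python3.9` (`numpy` is just an illustration):

```python
# Invoke pip as a module so packages install for python3.9 specifically,
# rather than for whichever interpreter `python` happens to point at
pip_step = {
    "type": "shell",
    "commands": [
        "python3.9 -m pip install --upgrade pip",
        "python3.9 -m pip install numpy",
    ],
}
```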
Permissions
The build steps, and any actions you initiate on the aux VM via SSH, will run as an admin user. The name of the admin user will depend on the base image you use, and will be stored in the `VM_SSH_USERNAME` environment variable.
- If you run a command that needs root privileges, use `sudo`.
- Unlike the main Docker container, the aux VM will not have an `agent` user by default. If you’d like to create one, add `common.aux_vm_access.create_agent_user_step()` to your aux VM’s build steps.
- When uploading files to the aux VM, you can only upload to folders where the admin user has write permissions, and those files will be owned by the admin user. If you’d like to put files in the agent’s home directory, upload them to the admin user’s home directory, then use `sudo mv` and `sudo chown`.
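A sketch of that last pattern (assuming the VM has an `agent` user and `client` is a connected Paramiko `SSHClient` for the admin user; the name `upload_for_agent` is our own):

```python
import os

def upload_for_agent(client, local_path, filename):
    """Upload a local file into /home/agent on the aux VM.

    `client` is assumed to be a connected paramiko.SSHClient for the admin
    user, and the VM is assumed to have an `agent` user (see
    common.aux_vm_access.create_agent_user_step()). SFTP can only write where
    the admin user can, so stage the file in the admin home directory, then
    move it and transfer ownership with sudo.
    """
    admin_home = f"/home/{os.environ['VM_SSH_USERNAME']}"
    staging_path = f"{admin_home}/{filename}"
    sftp = client.open_sftp()
    sftp.put(local_path, staging_path)
    _, stdout, _ = client.exec_command(
        f"sudo mv {staging_path} /home/agent/{filename}"
        f" && sudo chown agent:agent /home/agent/{filename}"
    )
    return stdout.channel.recv_exit_status()  # 0 on success
```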