Persistent Datasets just got even better. Not only can you use the same dataset across many jobs in parallel at no additional charge, you can now attach multiple datasets to a single job for free. If that wasn't enough, you can now dynamically change the datasets attached to any notebook job as your needs evolve throughout the model development process. Additionally, more options have been added for job base environments, allowing you to save time and storage quota by using specific versions of popular frameworks.
Datasets
Adding Multiple Datasets
When creating a new notebook or training job, in the Data section of the job form, click the Add Dataset button. To add a public dataset, select Public Dataset from the Dataset Type field and then select the dataset from the Dataset field. To add a private dataset, select My Dataset from the Dataset Type field and then select the dataset from the Dataset field. Continue clicking the Add Dataset button to add additional datasets until all required datasets have been selected.
If you add a single dataset, the dataset will be mounted to /opt/ml/input inside the job workers. If you add more than one dataset, each dataset will be mounted into its own directory inside the /opt/ml/input directory. The directory name will be the name of the dataset, all lower case with spaces converted to underscores. For example, the PASCAL VOC dataset will be mounted to /opt/ml/input/pascal_voc if it is one of multiple datasets selected.
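If your code needs to locate the data at runtime, the sketch below illustrates the convention. It is a minimal example only; the dataset_mount_dir helper is hypothetical and not part of any platform API, and the listing at the end simply inspects the mount directory inside a running worker:

```python
import os

# Hypothetical helper that mirrors the naming convention described above:
# lower-case the dataset name and convert spaces to underscores. With a
# single dataset attached, everything is mounted directly at /opt/ml/input.
def dataset_mount_dir(dataset_name: str, multiple: bool = True) -> str:
    if not multiple:
        return "/opt/ml/input"
    return os.path.join("/opt/ml/input", dataset_name.lower().replace(" ", "_"))

print(dataset_mount_dir("PASCAL VOC"))  # -> /opt/ml/input/pascal_voc

# Inside a running job worker, list the mount directory to see which
# dataset directories were attached.
if os.path.isdir("/opt/ml/input"):
    print(sorted(os.listdir("/opt/ml/input")))
```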
As before, there is no additional charge to attach a dataset to a job. You only incur storage charges once for each private dataset, regardless of how many jobs it is used in. Public datasets are always free to use.
Changing Datasets For Existing Jobs
To edit the datasets for an existing notebook job, the job must first be in the stopped status. Once the job is stopped, select it from the list and click the Edit button. Expand the Data section of the edit form to see the currently enabled datasets. Edit this list in the same manner you did when creating the job and click Confirm. The job may enter the updating status while it prepares your datasets. Once the job status returns to stopped, you can restart the job and the new datasets will be attached.
New Environment Options
Job environments determine the software that is preinstalled in the operating environment for the job workers. Now, in addition to our pre-built deep learning environments, you can select environments with specific versions of popular deep learning frameworks. Selecting the correct environment will save you time setting up your environment and minimize the amount of space required for each worker. The space required by the base environment is free and does not count towards your storage quota, but modifications to the environment will. If your model needs a specific version of PyTorch or TensorFlow, be sure to select the version-specific environment, since downgrading PyTorch or TensorFlow can consume a large amount of space (10GB+).
The Base Environment option in the Environment section of the job form has now been expanded into three separate fields:
- Python Version: The Python version of the conda environment that forms the base of the environment. All base environments contain a wide variety of popular data science, machine learning, and GPU-acceleration libraries. Only Python 3.6 and 3.7 environments are currently available.
- Framework: The primary deep learning framework to be used. If you do not have specific version requirements for your model, select Deep Learning. Otherwise, select the major framework you intend to use to see the available versions.
  - Deep Learning: All supported frameworks (TensorFlow, PyTorch, MXNet) are installed at their latest versions compatible with the selected Python version.
  - PyTorch: Select this option if your model code requires a specific version of PyTorch.
  - TensorFlow: Select this option if your model code requires a specific version of TensorFlow.
  - MXNet: Select this option if your model code requires a specific version of MXNet.
- Framework Version: The version of the major framework selected.
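Once a job starts, it can be useful to confirm which framework versions the selected base environment actually provides. The snippet below is a minimal sketch, assuming the frameworks import under their usual package names (torch, tensorflow, mxnet); run it from a notebook cell or training script:

```python
import importlib

# Report the version of each major framework available in the worker's
# base environment; frameworks that are not present are reported as missing.
for package in ("torch", "tensorflow", "mxnet"):
    try:
        module = importlib.import_module(package)
        print(f"{package}: {module.__version__}")
    except ImportError:
        print(f"{package}: not installed in this environment")
```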
These new options are available when selecting an environment for both notebook and training jobs, and you can set these new environments as the default environment for each job type on the Account Settings page.