Configuration
Plugin maintains the configuration in the conf/base/vertexai.yaml
file. Sample configuration can be generated using kedro vertexai init
:
# Configuration used to run the pipeline
project_id: my-gcp-mlops-project
region: europe-west1
run_config:
# Name of the image to run as the pipeline steps
image: eu.gcr.io/my-gcp-mlops-project/example_model:${commit_id}
# Pull policy to be used for the steps. Use Always if you push the images
# on the same tag, or Never if you use only local images
image_pull_policy: IfNotPresent
# Location of Vertex AI GCS root
root: bucket_name/gcs_suffix
# Name of the kubeflow experiment to be created
experiment_name: MyExperiment
# Name of the scheduled run, templated with the schedule parameters
scheduled_run_name: MyExperimentRun
# Optional pipeline description
#description: "Very Important Pipeline"
# How long to keep underlying Argo workflow (together with pods and data
# volume after pipeline finishes) [in seconds]. Default: 1 week
ttl: 604800
# What Kedro pipeline should be run as the last step regardless of the
# pipeline status. Used to send notifications or raise the alerts
# on_exit_pipeline: notify_via_slack
# Optional section allowing adjustment of the resources
# reservations and limits for the nodes
resources:
# For nodes that require more RAM you can increase the "memory"
data_import_step:
memory: 2Gi
# Training nodes can utilize more than one CPU if the algoritm
# supports it
model_training:
cpu: 8
memory: 1Gi
# GPU-capable nodes can request 1 GPU slot
tensorflow_step:
nvidia.com/gpu: 1
# Default settings for the nodes
__default__:
cpu: 200m
memory: 64Mi
# Optional section allowing to generate config files at runtime,
# useful e.g. when you need to obtain credentials dynamically and store them in credentials.yaml
# but the credentials need to be refreshed per-node
# (which in case of Vertex AI would be a separate container / machine)
# Example:
# dynamic_config_providers:
# - cls: kedro_vertexai.auth.gcp.MLFlowGoogleOAuthCredentialsProvider
# params:
# client_id: iam-client-id
dynamic_config_providers: []
Dynamic configuration support
kedro-vertexai
contains hook that enables TemplatedConfigLoader.
It allows passing environment variables to configuration files. It reads all environment variables following KEDRO_CONFIG_<NAME>
pattern, which you
can later inject in configuration file using ${name}
syntax.
This feature is especially useful for keeping the executions of the pipelines isolated and traceable by dynamically setting output paths for intermediate data in the Data Catalog, e.g.
# ...
train_x:
type: pandas.CSVDataSet
filepath: gs://<bucket>/kedro-vertexai/${run_id}/05_model_input/train_x.csv
train_y:
type: pandas.CSVDataSet
filepath: gs://<bucket>/kedro-vertexai/${run_id}/05_model_input/train_y.csv
# ...
In this case, the ${run_id}
placeholder will be substituted by the unique run identifier from Vertex AI Pipelines.
There are two special variables KEDRO_CONFIG_COMMIT_ID
, KEDRO_CONFIG_BRANCH_NAME
with support specifying default when variable is not set,
e.g. ${commit_id|dirty}
Disabling dynamic configuration hook
In current Kedro versions (<=0.18
) only single configuration hook can be attached, which means if your project had a custom one, this plug-in will most likely overwrite it. You can disable this plugin’s configuration hook by setting environment variable KEDRO_VERTEXAI_DISABLE_CONFIG_HOOK
to true
, e.g.:
export KEDRO_VERTEXAI_DISABLE_CONFIG_HOOK=true
Once set, the plugin will provide a clear warning with a reminder:
KEDRO_VERTEXAI_DISABLE_CONFIG_HOOK environment variable is set and EnvTemplatedConfigLoader will not be used which means formatted config values like ${run_id} will not be substituted at runtime
To make plugin-compatible custom config loader you can extend the class kedro_vertexai.context_helper.EnvTemplatedConfigLoader
and register your own hook.
Dynamic config providers
When running the job in VertexAI it’s possible to generate new configuration files at runtime if that’s required, one example could be generating Kedro credentials on a Vertex AI’s node level (the opposite would be supplying the credentials when starting the job).
Example:
run_config:
# ...
dynamic_config_providers:
- cls: kedro_vertexai.auth.gcp.MLFlowGoogleOAuthCredentialsProvider
params:
client_id: iam-client-id
The cls
fields should contain a fully qualified reference to a class implementing abstract kedro_vertexai.dynamic_config.DynamicConfigProvider
. All params
will be passed as kwargs
to the class’s constructor.
Two required methods are:
@property
def target_config_file(self) -> str:
return "name-of-the-config-file.yml"
def generate_config(self) -> dict:
return {
"layout": {
"of-the-target": {
"config-file": "value"
}
}
}
First one - target_config_file
should return the name of the configuration file to be generated (e.g. credentials.yml
) and the generate_config
should return a dictionary, which will be then serialized into the target file as YAML. If the target file already exists during the invocation, it will be merged (see method kedro_vertexai.dynamic_config.DynamicConfigProvider.merge_with_existing
) with the existing one and then saved again.
Note that the generate_config
has access to an initialized plugin config via self.config
property, so any values from the vertexai.yml
configuration is accessible.