Structure of hub_validations class objects
The high-level validate_*() family of functions all return a
<hub_validations> S3 class object.
Structure of <hub_validations>
A <hub_validations> object is effectively a list and
represents the collected output of the series of checks performed by a
higher level validate_*() function.
Each named element of the list contains the result of an individual
check and inherits from subclass <hub_check>. The
name of each element is the name of the check.
Let’s examine an example output of a model output file validation
using validate_submission():
library(hubValidations)
hub_path <- system.file("testhubs/simple", package = "hubValidations")
v <- validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
str(v, max.level = 1)
#> Classes 'hub_validations', 'list' hidden list of 20
#> $ valid_config :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_exists :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_name :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_location :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ round_id_valid :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_format :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ metadata_exists :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_read :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ valid_round_id_col:List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ unique_round_id :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ match_round_id :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ colnames :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ col_types :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ valid_vals :List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ rows_unique :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ req_vals :List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_valid :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_non_desc:List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_sum1 :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_info" "hub_check" "rlang_message" "message" ...
#> $ submission_time :List of 6
#> ..- attr(*, "class")= chr [1:5] "check_failure" "hub_check" "rlang_error" "error" ...
The class of the object returned in each element depends on the status of the check:
- If a check succeeds, a <message/check_success> condition class object is returned.
- If a check is skipped, a <message/check_info> condition class object is returned.
- Checks vary with respect to whether they return an <error/check_failure> or an <error/check_error> condition class object if the check fails.
  - <error/check_failure> class objects indicate a check that failed but does not affect downstream checks, so validation was able to proceed.
  - <error/check_error> class objects indicate early termination of the validation process because a check that downstream checks depend on failed.
Ultimately, both will cause overall validation to fail. The
<error/check_error> class exists to alert you to the
fact that there may be more errors not yet reported due to
early termination of the check process.
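Because the status of each check is encoded in its class, the statuses can be read off programmatically. The following is a minimal sketch in base R only; `mock_check()`, `v_mock`, `check_status()` and `any_failed()` are hypothetical names introduced here for illustration, and `v_mock` is a hand-built stand-in that merely mimics the class structure shown above, not real validation output.

```r
# Hand-built stand-in mimicking the class structure of a <hub_validations>
# object (illustrative only, not real output from validate_submission()).
mock_check <- function(status, msg) {
  base <- if (status %in% c("check_success", "check_info")) {
    c("rlang_message", "message")
  } else {
    c("rlang_error", "error")
  }
  structure(
    list(message = msg, where = "example.csv", call = "mock", use_cli_format = TRUE),
    class = c(status, "hub_check", base, "condition")
  )
}

v_mock <- structure(
  list(
    file_exists = mock_check("check_success", "File exists."),
    value_col_sum1 = mock_check("check_info", "Check skipped."),
    submission_time = mock_check("check_failure", "Outside submission window.")
  ),
  class = c("hub_validations", "list")
)

# The first class of each element identifies the status of that check.
check_status <- function(x) {
  vapply(unclass(x), function(chk) class(chk)[1], character(1))
}

# TRUE if any element carries one of the failure-type classes.
any_failed <- function(x) {
  any(vapply(
    unclass(x), inherits, logical(1),
    what = c("check_failure", "check_error", "check_exec_error")
  ))
}

check_status(v_mock)
any_failed(v_mock)
```

The same two helpers should work unchanged on a real `v`, assuming its elements carry the classes shown in the `str()` output above.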
print method
<hub_validations> objects have their own print method
which displays the result, the check name and message of each check:
- ✔ indicates a check was successful (a <message/check_success> condition class object was returned).
- ✖ indicates a check failed but, because it does not affect downstream checks, validation was able to proceed (an <error/check_failure> condition class object was returned).
- A separate symbol indicates that a check that downstream checks depend on failed, causing early return of the validation process (an <error/check_error> condition class object was returned).
- A further symbol indicates that an execution error occurred and the check was not able to complete (an <error/check_exec_error> condition class object was returned). This will cause early return if the expected check failure output was an <error/check_error>.
- ℹ indicates a check was skipped (a <message/check_info> condition class object was returned).
#> ── simple ────
#> ✔ [valid_config]: All hub config files are valid.
#> ── 2022-10-08-team1-goodmodel.csv ────
#> ✔ [file_exists]: File exists at path
#> model-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv.
#> ✔ [file_name]: File name "2022-10-08-team1-goodmodel.csv" is valid.
#> ✔ [file_location]: File directory name matches `model_id` metadata in file
#> name.
#> ✔ [round_id_valid]: `round_id` is valid.
#> ✔ [file_format]: File is accepted hub format.
#> ✔ [metadata_exists]: Metadata file exists at path
#> model-metadata/team1-goodmodel.yaml.
#> ✔ [file_read]: File could be read successfully.
#> ✔ [valid_round_id_col]: `round_id_col` name is valid.
#> ✔ [unique_round_id]: `round_id` column "origin_date" contains a single, unique
#> round ID value.
#> ✔ [match_round_id]: All `round_id_col` "origin_date" values match submission
#> `round_id` from file name.
#> ✔ [colnames]: Column names are consistent with expected round task IDs and std
#> column names.
#> ✔ [col_types]: Column data types match hub schema.
#> ✔ [valid_vals]: `tbl` contains valid values/value combinations.
#> ✔ [rows_unique]: All combinations of task ID
#> column/`output_type`/`output_type_id` values are unique.
#> ✔ [req_vals]: Required task ID/output type/output type ID combinations all
#> present.
#> ✔ [value_col_valid]: Values in column `value` all valid with respect to
#> modeling task config.
#> ✔ [value_col_non_desc]: Values in `value` column are non-decreasing as
#> output_type_ids increase for all unique task ID value/output type
#> combinations of quantile or cdf output types.
#> ℹ [value_col_sum1]: No pmf output types to check for sum of 1. Check skipped.
#> ✖ [submission_time]: Submission time must be within accepted submission window
#> for round. Current time "2024-09-05 08:00:53 UTC" is outside window
#> 2022-10-02 EDT--2022-10-09 23:59:59 EDT.
Note that the submission window check is always performed and reported last.
Structure of a <hub_check>
Let’s look more closely at the structure of the first few elements of
the <hub_validations> object returned by validate_submission():
v <- validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
str(head(v))
#> List of 6
#> $ valid_config :List of 4
#> ..$ message : chr "All hub config files are valid. \n "
#> ..$ where : chr "simple"
#> ..$ call : chr "check_config_hub_valid"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_exists :List of 4
#> ..$ message : chr "File exists at path \033[34mmodel-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv\033[39m. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_exists"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_name :List of 4
#> ..$ message : chr "File name \033[34m\"2022-10-08-team1-goodmodel.csv\"\033[39m is valid. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_name"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_location :List of 4
#> ..$ message : chr "File directory name matches `model_id`\n metadata in file name. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_location"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ round_id_valid:List of 4
#> ..$ message : chr "`round_id` is valid. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_valid_round_id"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_format :List of 4
#> ..$ message : chr "File is accepted hub format. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_format"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
Each <hub_check> object contains the following elements:
- message: the result message containing details about the check.
- where: where the check was performed, usually the model output file name.
- call: the function used to perform the check.
- use_cli_format: whether the message is formatted using cli format, almost always TRUE.
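Since a <hub_check> is a list underneath, its elements can be read with ordinary `$` extraction. The sketch below uses a hand-built stand-in, `chk` (a hypothetical name), whose fields are copied from the `file_format` entry in the output above; it is not produced by the package itself.

```r
# Hand-built stand-in for a single <hub_check> element (illustrative only).
chk <- structure(
  list(
    message = "File is accepted hub format. \n ",
    where = "team1-goodmodel/2022-10-08-team1-goodmodel.csv",
    call = "check_file_format",
    use_cli_format = TRUE
  ),
  class = c("check_success", "hub_check", "rlang_message", "message", "condition")
)

names(chk)                  # the four standard element names
chk$where                   # where the check was performed
chk$call                    # function that performed the check
trimws(chk$message)         # message with trailing whitespace removed
inherits(chk, "hub_check")  # class test
```

With a real validation result, the equivalent access would be along the lines of `v$file_format$where`.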
Extra information
Some <hub_check> objects contain extra information
about the failing check to help identify the affected rows.
For example, the <hub_check> object returned for
the valid_vals check, which checks that all columns in a
model output file (excluding the value column) contain
valid combinations of task ID / output type / output type ID values,
contains an additional element called error_tbl, with
details of the invalid value combinations in the rows affected.
To access error_tbl from the output of validate_submission()
stored in an object v, you would use v$valid_vals$error_tbl.
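Because each check is a named list element, nested `$` extraction reaches any extra element. The following is a minimal sketch with a hand-built stand-in: `v_mock`, the class assigned to the check, and the columns of its `error_tbl` are all assumptions made for illustration, and a real `error_tbl` may look different.

```r
# Hand-built stand-in for a failed valid_vals check (illustrative only;
# the class and the error_tbl columns are assumptions, not package output).
v_mock <- list(
  valid_vals = structure(
    list(
      message = "Invalid value combinations found.",
      where = "team1-goodmodel/2022-10-08-team1-goodmodel.csv",
      call = "mock",
      use_cli_format = TRUE,
      error_tbl = data.frame(
        location = "ZZ",
        output_type = "quantile",
        stringsAsFactors = FALSE
      )
    ),
    class = c("check_failure", "hub_check", "rlang_error", "error", "condition")
  )
)

# Extract the extra element; `$` returns NULL if it is absent.
err <- v_mock$valid_vals$error_tbl
if (is.null(err)) {
  message("No error_tbl attached to this check.")
} else {
  print(err)
}
```

Guarding against `NULL` is a cautious choice here, since not every check carries an `error_tbl` element.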