Structure of hub_validations class objects
The high-level validate_*() family of functions all return a <hub_validations> S3 class object.
Structure of <hub_validations> object
A hub_validations object is effectively a list and represents the collected output of the series of checks performed by a higher-level validate_*() function.
Each named element of the list contains the result of an individual check and inherits from class <hub_check>. The name of each element is the name of the check.
Let’s examine an example output of a model output file validation using validate_submission().
hub_path <- system.file("testhubs/simple", package = "hubValidations")
v <- validate_submission(hub_path,
file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
str(v, max.level = 1)
#> Classes 'hub_validations', 'list' hidden list of 20
#> $ valid_config :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_exists :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_name :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_location :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ round_id_valid :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_format :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ metadata_exists :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_read :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ valid_round_id_col:List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ unique_round_id :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ match_round_id :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ colnames :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ col_types :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ valid_vals :List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ rows_unique :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ req_vals :List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_valid :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_non_desc:List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_sum1 :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_info" "hub_check" "rlang_message" "message" ...
#> $ submission_time :List of 6
#> ..- attr(*, "class")= chr [1:5] "check_failure" "hub_check" "rlang_error" "error" ...
The condition class returned in each element depends on the status of the check:
- If a check succeeds, a <message/check_success> condition class object is returned.
- If a check is skipped, a <message/check_info> condition class object is returned.
- If a check fails, it returns either an <error/check_failure> or an <error/check_error> condition class object, depending on the check:
  - <error/check_failure> class objects indicate a check that failed but does not affect downstream checks, so validation was able to proceed.
  - <error/check_error> class objects indicate early termination of the validation process because a check that downstream checks depend on failed.
Ultimately, both will cause overall validation to fail. The <error/check_error> class exists to alert you to the fact that there may be more errors not yet reported due to early termination of the check process.
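As a rough illustration of how these classes can be used, the sketch below tallies check statuses by class. To stay runnable without a hub, it builds a small stand-in list by hand: mock_check() and the three example checks are hypothetical, and the trailing "condition" class is an assumption about the truncated fifth entry in the str() output above. With a real object, the same vapply() calls could be applied directly to v.

```r
# Hypothetical helper: builds a list that mimics the shape of a <hub_check>.
mock_check <- function(status, base = c("message", "error")) {
  base <- match.arg(base)
  structure(
    list(
      message = "example message",
      where = "example-file.csv",
      call = "example_check_function",
      use_cli_format = TRUE
    ),
    # "condition" as the final class is an assumption.
    class = c(status, "hub_check", paste0("rlang_", base), base, "condition")
  )
}

# Hypothetical stand-in for a <hub_validations> object with three checks.
v_mock <- structure(
  list(
    file_exists     = mock_check("check_success", "message"),
    value_col_sum1  = mock_check("check_info", "message"),
    submission_time = mock_check("check_failure", "error")
  ),
  class = c("hub_validations", "list")
)

# The first class of each element gives the status of that check.
status <- vapply(v_mock, function(x) class(x)[1], character(1))
table(status)

# Names of checks that failed, with or without early termination.
failed <- vapply(
  v_mock,
  function(x) inherits(x, c("check_failure", "check_error")),
  logical(1)
)
names(failed)[failed]

# Did any check signal early termination of the validation process?
early_exit <- any(vapply(v_mock, inherits, logical(1), what = "check_error"))
early_exit
```

Testing with inherits() rather than comparing class(x)[1] to a string keeps the test valid even if further subclasses are placed in front of the status class.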
hub_validations print method
hub_validations objects have their own print method which displays the result, the check name and message of each check:
- ✔ indicates a check was successful (a <message/check_success> condition class object was returned).
- ✖ indicates a check failed but, because it does not affect downstream checks, validation was able to proceed (a <error/check_failure> condition class object was returned).
- ⓧ indicates a check that downstream checks depend on failed, causing early return of the validation process (a <error/check_error> condition class object was returned).
- ☒ indicates an execution error occurred and the check was not able to complete (a <error/check_exec_error> condition class object was returned). This will cause early return if the expected check failure output was a <error/check_error>.
- ℹ indicates a check was skipped (a <message/check_info> condition class object was returned).
v
#>
#> ── simple ────
#>
#> ✔ [valid_config]: All hub config files are valid.
#>
#>
#> ── 2022-10-08-team1-goodmodel.csv ────
#>
#>
#>
#> ✔ [file_exists]: File exists at path
#> model-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv.
#> ✔ [file_name]: File name "2022-10-08-team1-goodmodel.csv" is valid.
#> ✔ [file_location]: File directory name matches `model_id` metadata in file
#> name.
#> ✔ [round_id_valid]: `round_id` is valid.
#> ✔ [file_format]: File is accepted hub format.
#> ✔ [metadata_exists]: Metadata file exists at path
#> model-metadata/team1-goodmodel.yaml.
#> ✔ [file_read]: File could be read successfully.
#> ✔ [valid_round_id_col]: `round_id_col` name is valid.
#> ✔ [unique_round_id]: `round_id` column "origin_date" contains a single, unique
#> round ID value.
#> ✔ [match_round_id]: All `round_id_col` "origin_date" values match submission
#> `round_id` from file name.
#> ✔ [colnames]: Column names are consistent with expected round task IDs and std
#> column names.
#> ✔ [col_types]: Column data types match hub schema.
#> ✔ [valid_vals]: `tbl` contains valid values/value combinations.
#> ✔ [rows_unique]: All combinations of task ID
#> column/`output_type`/`output_type_id` values are unique.
#> ✔ [req_vals]: Required task ID/output type/output type ID combinations all
#> present.
#> ✔ [value_col_valid]: Values in column `value` all valid with respect to
#> modeling task config.
#> ✔ [value_col_non_desc]: Values in `value` column are non-decreasing as
#> output_type_ids increase for all unique task ID value/output type
#> combinations of quantile or cdf output types.
#> ℹ [value_col_sum1]: No pmf output types to check for sum of 1. Check skipped.
#> ✖ [submission_time]: Submission time must be within accepted submission window
#> for round. Current time "2024-09-05 08:00:53 UTC" is outside window
#> 2022-10-02 EDT--2022-10-09 23:59:59 EDT.
Note that the submission window check is always performed and reported last.
Structure of a <hub_check> object
Let’s look more closely at the structure of the first few elements of the hub_validations object returned by validate_submission():
v <- validate_submission(hub_path,
file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
str(utils::head(v))
#> List of 6
#> $ valid_config :List of 4
#> ..$ message : chr "All hub config files are valid. \n "
#> ..$ where : chr "simple"
#> ..$ call : chr "check_config_hub_valid"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_exists :List of 4
#> ..$ message : chr "File exists at path \033[34mmodel-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv\033[39m. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_exists"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_name :List of 4
#> ..$ message : chr "File name \033[34m\"2022-10-08-team1-goodmodel.csv\"\033[39m is valid. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_name"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_location :List of 4
#> ..$ message : chr "File directory name matches `model_id`\n metadata in file name. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_location"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ round_id_valid:List of 4
#> ..$ message : chr "`round_id` is valid. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_valid_round_id"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_format :List of 4
#> ..$ message : chr "File is accepted hub format. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_format"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
Each <hub_check> object contains the following elements:
- message: the result message containing details about the check.
- where: where the check was performed, usually the model output file name.
- call: the function used to perform the check.
- use_cli_format: whether the message is formatted using cli format, almost always TRUE.
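These elements can be read with ordinary list accessors. The following is a minimal sketch using a hand-built stand-in for a single check, with field values copied from the file_exists entry in the str() output above. The trailing "condition" class is an assumption, and the regular expression used to strip ANSI colour codes is an illustrative choice, not part of hubValidations.

```r
# Stand-in for v$file_exists, with values copied from the str() output above.
chk <- structure(
  list(
    message = "File exists at path \033[34mmodel-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv\033[39m. \n ",
    where = "team1-goodmodel/2022-10-08-team1-goodmodel.csv",
    call = "check_file_exists",
    use_cli_format = TRUE
  ),
  # "condition" as the final class is an assumption.
  class = c("check_success", "hub_check", "rlang_message", "message", "condition")
)

chk$call   # name of the function that performed the check
chk$where  # file the check was run against

# Messages can carry ANSI colour codes and trailing whitespace;
# remove both to get plain text, for example for writing to a log file.
plain <- trimws(gsub("\033\\[[0-9;]*m", "", chk$message))
plain
```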
Extra information
Some <hub_check> objects contain extra information about the failing check to help identify affected rows in submissions.
For example, the <hub_check> object returned for the valid_vals check contains an additional element called error_tbl. This check verifies that all columns in a model output file (excluding the value column) contain valid combinations of task ID / output type / output type ID values, and error_tbl holds details of the invalid value combinations in the rows affected.
To access error_tbl from the output of validate_submission() stored in an object v, you would use:
v$valid_vals$error_tbl
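Because error_tbl is only present on some checks, and may be empty, it can help to test for it before use. The sketch below does this on a hand-built stand-in list. The check contents, including the columns of the example error_tbl, are made up for illustration and may not match the real table layout.

```r
# Hypothetical stand-in for two checks; only valid_vals carries an error_tbl.
v_mock <- list(
  col_types = list(message = "example message"),
  valid_vals = list(
    message = "example message",
    # Made-up columns; the real error_tbl layout may differ.
    error_tbl = data.frame(location = "US", output_type = "quantile")
  )
)

# Collect error_tbl from every check that has one.
error_tbls <- Filter(Negate(is.null), lapply(v_mock, function(x) x$error_tbl))
names(error_tbls)

# Guarded access to the table of a single check.
tbl <- v_mock$valid_vals$error_tbl
if (!is.null(tbl)) {
  print(tbl)
}
```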