Performing runtime type-checking in Python
or how to fail fast in Python
Imagine the following two scenarios.
Scenario 1: You deployed a machine learning RESTful service that provides disease diagnoses. Unfortunately, you might have certain conditions where you know your model should not be invoked because of certain regulatory constraints (e.g., some countries might not allow automated patient diagnosis unless some pre-requisites are met). Rather than allowing your code to execute, you want it to fail fast with an informative message like “Can’t diagnose the patient because of reason X”. You want these validation errors to be clear, consolidated, and not hidden in try-except clauses within your codebase.
Scenario 2: You’re developing packages for Machine Learning or Scientific Computing. You want your packages to have wide adoption, and your audience includes both ML professionals and users who are just starting in the field. Unfortunately, not everyone in your audience knows how to use your package correctly, and you often find that users set parameters that lead to incorrect model behavior (e.g., specifying weights as an array of zeros when training an XGBoost model or specifying non-differentiable custom loss functions).
In both of these scenarios, you would like your application or package to have the following benefits:
- You want your system to fail fast and in a way that guides the user towards implementing the correct functionality. In other words, you want some form of type-checking performed at runtime.
- You want your system to be easy to maintain, modify, or extend. Specifically, you want systems that are self-documenting, that are generally DRY, and that have the benefits of type-checking.
- You want the solution to be reusable across scenarios. In other words, you would rather learn one tool that does a good job across a variety of scenarios (e.g., building general use packages and also deploying services) rather than deal with learning multiple abstraction patterns and packages for specific use-cases.
We will review the following approaches to accomplishing the above, along with their benefits and tradeoffs:
- Using built-in types such as dataclass with a static type checker
- JSON Schema Validation
- pydantic
Python built-in type system
One common approach to letting a user know that certain behavior is not supported is to use types in your class and function declarations. For example, below I declare two Enums that encode the procedure type and the outcome of the model diagnosis. I also declare two dataclasses that specify the input and output classes that will be received by a downstream function. Note that if you run the below with a static type checker like mypy, no issues will be highlighted. In other words, as long as the user correctly declares the UserAssessment dataclass, the function apply_assessment will work correctly.
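A minimal sketch of what such declarations might look like. The field names (procedure, age, occupation, doctor_approved) and the enum members cancer and flu are taken from the outputs shown later in this article; the Outcome members, the AssessmentResult class, and the body of apply_assessment are hypothetical and only there to make the example complete.

```python
from dataclasses import dataclass
from enum import Enum


class Procedure(str, Enum):
    cancer = "cancer"
    flu = "flu"


class Outcome(str, Enum):
    # Hypothetical outcome labels, chosen for illustration only.
    approved = "approved"
    denied = "denied"


@dataclass
class UserAssessment:
    procedure: Procedure
    age: int
    occupation: str
    doctor_approved: bool


@dataclass
class AssessmentResult:  # hypothetical output class
    outcome: Outcome
    reason: str


def apply_assessment(assessment: UserAssessment) -> AssessmentResult:
    # Hypothetical rule: refuse to diagnose without a doctor's approval.
    if not assessment.doctor_approved:
        return AssessmentResult(Outcome.denied, "doctor approval is required")
    return AssessmentResult(Outcome.approved, "all prerequisites met")


result = apply_assessment(UserAssessment(Procedure.flu, 42, "teacher", True))
```

A static type checker can verify that callers pass a UserAssessment and that the function returns an AssessmentResult, but nothing here is checked when the program actually runs.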
There are several benefits to structuring the work in this way:
- Declaring types provides self-documenting code. This can be extremely beneficial if you're dealing with large codebases.
- You can run a static type checker to find bugs and broken dependencies in your code.
- This workflow works great with testing tools such as pytest and hypothesis. For example, if I wanted to test the behavior of the function apply_assessment, all I would have to do is generate examples from types with Hypothesis:
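As a rough sketch of that idea, Hypothesis can build instances of a dataclass directly from its type annotations with st.builds. The apply_assessment function below is a simplified, hypothetical stand-in that returns a bool, and the property being tested is purely illustrative.

```python
from dataclasses import dataclass
from enum import Enum

from hypothesis import given, strategies as st


class Procedure(Enum):
    cancer = "cancer"
    flu = "flu"


@dataclass
class UserAssessment:
    procedure: Procedure
    age: int
    occupation: str
    doctor_approved: bool


def apply_assessment(assessment: UserAssessment) -> bool:
    # Hypothetical stand-in for the real function under test.
    return assessment.doctor_approved and 0 < assessment.age < 100


# st.builds infers a strategy for each required field from its annotation.
@given(st.builds(UserAssessment))
def test_apply_assessment_returns_bool(assessment: UserAssessment) -> None:
    assert isinstance(apply_assessment(assessment), bool)


# Calling the decorated function runs the property over many generated inputs.
test_apply_assessment_returns_bool()
```

Note that the generated ages are arbitrary integers, including negative ones, because the int annotation carries no range information.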
Nonetheless, there are also several large drawbacks:
- Python types do not provide runtime type-checking. If I run the code below, it will not raise an error even though I specified age to be an int and procedure to be an Enum.
# this fails to fail
UserAssessment('junk', 'junk', 'junk', 'junk')
Output: UserAssessment(procedure='junk', age='junk', occupation='junk', doctor_approved='junk')
Although a static type-checker might protect you from mistakes you make in your own code, nothing fails at runtime, so you will end up with a lot of boilerplate isinstance and try-except code for any client-facing functions that could receive bad inputs.
- This approach does not generalize well across package development and service development use-cases. Ideally, if you’re developing and deploying services, you would want to use the OpenAPI specification standards for documenting APIs. Unfortunately, there are no tools that I’m aware of that can convert custom Python classes and types into JSON Schema.
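To make the boilerplate concrete, here is a sketch of the hand-written checks a plain dataclass would need in order to reject the junk input above. Putting them in __post_init__ is one possible arrangement, not the only one, and the error messages are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Procedure(Enum):
    cancer = "cancer"
    flu = "flu"


@dataclass
class UserAssessment:
    procedure: Procedure
    age: int
    occupation: str
    doctor_approved: bool

    def __post_init__(self) -> None:
        # Every field needs its own manual check.
        if not isinstance(self.procedure, Procedure):
            raise TypeError("procedure must be a Procedure")
        # bool is a subclass of int, so it has to be excluded explicitly.
        if isinstance(self.age, bool) or not isinstance(self.age, int):
            raise TypeError("age must be an int")
        if not isinstance(self.occupation, str):
            raise TypeError("occupation must be a str")
        if not isinstance(self.doctor_approved, bool):
            raise TypeError("doctor_approved must be a bool")


try:
    UserAssessment("junk", "junk", "junk", "junk")
except TypeError as exc:
    message = str(exc)  # the first failing check: procedure
```

The checks duplicate information already present in the annotations, and they stop at the first problem instead of reporting all of them.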
JSON Schema Validation
If you’re developing services, you probably know what JSON Schema does. Specifically, it has several advantages:
- It allows you to clearly define your API contract by specifying exactly what inputs are accepted and in what format.
- Integrates with OpenAPI.
- Universally accepted standard when it comes to developing contracts for RESTful services.
- Language agnostic.
Unfortunately, it also has these issues:
- Language agnostic: if your particular language does not have a package that resolves the interoperability between classes you define in your language of choice and the output of the JSON Schema, you might be stuck in a position where you have to duplicate validation both when declaring your case classes and when writing the JSON Schema.
- It does not extend to package development.
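The duplication issue can be sketched as follows, assuming the third-party jsonschema package is installed. The schema dictionary and the parse_assessment helper are hypothetical; the point is that the field names and types have to be written twice, once in the schema and once in the class.

```python
from dataclasses import dataclass

from jsonschema import ValidationError, validate

# Contract for incoming JSON payloads (hypothetical, written by hand).
USER_ASSESSMENT_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "procedure": {"enum": ["cancer", "flu"]},
        "age": {"type": "integer", "exclusiveMinimum": 0, "exclusiveMaximum": 100},
        "occupation": {"type": "string"},
        "doctor_approved": {"type": "boolean"},
    },
    "required": ["procedure", "age", "occupation", "doctor_approved"],
}


# The same fields, declared a second time for use inside the Python code.
@dataclass
class UserAssessment:
    procedure: str
    age: int
    occupation: str
    doctor_approved: bool


def parse_assessment(payload: dict) -> UserAssessment:
    # Raises jsonschema.ValidationError if the payload breaks the contract.
    validate(instance=payload, schema=USER_ASSESSMENT_SCHEMA)
    return UserAssessment(**payload)


try:
    parse_assessment(
        {"procedure": "cancer", "age": 200, "occupation": "junk", "doctor_approved": True}
    )
except ValidationError as exc:
    error_message = exc.message  # human-readable description of the failure
```

If the schema and the class drift apart, neither the validator nor a static type checker will notice.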
Pydantic
The Python language has seen considerable improvements to its type system. The need for stronger typing and tools that perform static type checking became more important as projects grew in size. Pydantic takes this one step further and enforces type checking at runtime. Here is how you could take the code above and make it compatible with Pydantic.
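A minimal sketch of such a model, assuming pydantic is installed. The field names and the bounds on age (greater than 0, less than 100) are inferred from the validation error and JSON Schema shown later in this article; expressing the bounds with Field is my assumption, and a constrained type would be an equally valid choice.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class Procedure(str, Enum):
    cancer = "cancer"
    flu = "flu"


class UserAssessment(BaseModel):
    procedure: Procedure
    # Bounds are validated every time an instance is created.
    age: int = Field(..., gt=0, lt=100)
    occupation: str
    doctor_approved: bool


# Valid input: the string "flu" is converted to the Procedure enum member.
ok = UserAssessment(procedure="flu", age=42, occupation="teacher", doctor_approved=True)

# Invalid input is rejected at construction time.
try:
    UserAssessment(procedure="cancer", age=200, occupation="junk", doctor_approved=True)
    rejected = False
except ValidationError:
    rejected = True
```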
This implementation ensures that all the type-checking for a class is done at the class declaration. Notice also that Pydantic comes with a considerably richer set of types, such as Constrained Types (see the Pydantic documentation for more examples). What are the advantages of using this approach?
- It provides fast runtime type checking out of the box. If I run this code, it will let me know that the age is out of an acceptable range. In other words, you can immediately provide this feedback to the user to let them know that their inputs are going to lead to a downstream failure.
UserAssessment(procedure='cancer', age=200, occupation='junk', doctor_approved=True)
Output:
1 validation error for UserAssessment
age
  ensure this value is less than 100 (type=value_error.number.not_lt; limit_value=100)
- It works well with mypy, although there are still some cases where you might have to rely on an occasional # type: ignore comment.
- It consolidates data validation within the class declaration. This is important because it allows you to separate the logic performed by your application from the logic required to validate the data. You will generally have less need to use try-except or isinstance.
- It works well with testing tools such as hypothesis, if you’re using the dataclass API.
- It comes with very rich documentation (https://pydantic-docs.helpmanual.io/), which indicates that the creator of the package cares a great deal about users having a good experience when interacting with pydantic.
- It provides the conversion of Python classes to JSON Schema. All you have to do is invoke the schema_json method on your class, and you get the JSON Schema specification for free. Here is how it looks for our UserAssessment class:
print(UserAssessment.schema_json(indent=2))
{
  "title": "UserAssessment",
  "type": "object",
  "properties": {
    "procedure": {
      "title": "Procedure",
      "enum": [
        "cancer",
        "flu"
      ]
    },
    "age": {
      "title": "Age",
      "exclusiveMinimum": 0,
      "exclusiveMaximum": 100,
      "type": "integer"
    },
    "occupation": {
      "title": "Occupation",
      "type": "string"
    },
    "doctor_approved": {
      "title": "Doctor Approved",
      "type": "boolean"
    }
  },
  "required": [
    "procedure",
    "age",
    "occupation",
    "doctor_approved"
  ]
}
This conversion means that you can use Pydantic for both API documentation and package development. If you want to see an example of how useful this is, please check out FastAPI.
- It encourages code reusability. Because Pydantic models can be used to generate automatic documentation and property-based tests, and can even be mapped onto ORM objects, the library encourages you to be very minimalistic when writing your code and allows you to focus much more on doing work rather than writing a set of custom utilities to handle additional validation logic.
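Tying this back to Scenario 1, the consolidated validation can be turned into a single fail-fast message at the service boundary. The sketch below assumes pydantic is installed; the diagnose function and the wording of its messages are hypothetical, and the exact text of each validation message depends on the installed pydantic version.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class Procedure(str, Enum):
    cancer = "cancer"
    flu = "flu"


class UserAssessment(BaseModel):
    procedure: Procedure
    age: int = Field(..., gt=0, lt=100)
    occupation: str
    doctor_approved: bool


def diagnose(payload: dict) -> str:
    """Hypothetical entry point: validate first, run the model only if valid."""
    try:
        assessment = UserAssessment(**payload)
    except ValidationError as exc:
        # errors() returns one dict per problem, with its location and message.
        problems = "; ".join(
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        )
        return f"Can't diagnose the patient: {problems}"
    return f"Running the {assessment.procedure.value} model"


bad = diagnose(
    {"procedure": "cancer", "age": 200, "occupation": "junk", "doctor_approved": True}
)
good = diagnose(
    {"procedure": "flu", "age": 42, "occupation": "teacher", "doctor_approved": True}
)
```

All validation failures are collected in one place, so the rest of the code can assume it only ever sees a valid UserAssessment.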
Conclusion
Although the Python language has seen major changes with the introduction of the typing module, the types provided by the language are not enforced at runtime. If you’re building APIs or you find yourself developing packages where runtime validation is important, I highly encourage you to try Pydantic.