Performing runtime type-checking in Python


Imagine the following two scenarios.

Scenario 1: You deployed a machine learning RESTful service that provides disease diagnoses. Unfortunately, you might have certain conditions where you know your model should not be invoked because of certain regulatory constraints (e.g., some countries might not allow automated patient diagnosis unless some pre-requisites are met). Rather than allowing your code to execute, you want it to fail fast with an informative message like “Can’t diagnose the patient because of reason X”. You want these validation errors to be clear, consolidated, and not hidden in try-except clauses within your codebase.

Scenario 2: You’re developing packages for Machine Learning or Scientific Computing. You want your packages to have wide adoption, and your audience includes both ML professionals and users who are just starting in the field. Unfortunately, not everyone in your audience knows how to use your package correctly, and you often find that users set parameters that lead to incorrect model behavior (e.g., specifying weights as an array of zeros when training an XGBoost model, or specifying non-differentiable custom loss functions).

In both of these scenarios, you would like your application or package to have the following benefits:

  • You want your system to fail fast and in a way that guides the user towards implementing the correct functionality. In other words, you want some form of type-checking performed at runtime.
  • You want your system to be easy to maintain, modify, or extend. Specifically, you want a system that is self-documenting, generally DRY, and has the benefits of type-checking.
  • You want the solution to be reusable across scenarios. In other words, you would rather learn one tool that does a good job across a variety of scenarios (e.g., building general use packages and also deploying services) rather than deal with learning multiple abstraction patterns and packages for specific use-cases.

We will review the following approaches and their benefits and tradeoffs for how you could accomplish the above:

  • Using built-in types such as dataclass with a static type checker
  • JSON Schema Validation
  • pydantic

Python built-in type system

One common approach in letting a user know that certain behavior is not supported is to use types in your classes and function declarations. For example, below I declare two Enums that encode the procedure type and the outcome of the model diagnosis. I also declare two dataclass instances that specify the input and output classes that will be received by a downstream function. Note that if you run the below with a static type checker like mypy, no issues will be highlighted. In other words, as long as the user correctly declares the UserAssessment dataclass, the function apply_assessment will work correctly.
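The declarations described above might look something like the following sketch. The field names come from the examples later in the article; the Outcome values, the DiagnosisResult shape, and the placeholder logic inside apply_assessment are my assumptions:

```python
from dataclasses import dataclass
from enum import Enum


class Procedure(str, Enum):
    cancer = "cancer"
    flu = "flu"


class Outcome(str, Enum):
    positive = "positive"
    negative = "negative"


@dataclass
class UserAssessment:
    procedure: Procedure
    age: int
    occupation: str
    doctor_approved: bool


@dataclass
class DiagnosisResult:
    outcome: Outcome
    confidence: float


def apply_assessment(assessment: UserAssessment) -> DiagnosisResult:
    # placeholder for the actual model logic
    return DiagnosisResult(outcome=Outcome.negative, confidence=0.5)
```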

There are several benefits to structuring the work in this way:

  • Declaring types provides self-documenting code. This can be extremely beneficial if you're dealing with large codebases.
  • You can run a static type checker to find bugs and broken dependencies in your code.
  • This workflow works great with testing tools such as pytest and Hypothesis. For example, if I wanted to test the behavior of the function apply_assessment, all I have to do is generate examples from the types with Hypothesis.
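A property-based test along those lines might look like this (a sketch: it assumes the hypothesis package is installed and redeclares a minimal version of the dataclass so the snippet is self-contained):

```python
from dataclasses import dataclass
from enum import Enum

from hypothesis import given, strategies as st


class Procedure(str, Enum):
    cancer = "cancer"
    flu = "flu"


@dataclass
class UserAssessment:
    procedure: Procedure
    age: int
    occupation: str
    doctor_approved: bool


# from_type builds arbitrary valid UserAssessment instances from the type hints,
# resolving the Enum and primitive fields automatically
@given(st.from_type(UserAssessment))
def test_assessment_fields_match_their_types(assessment: UserAssessment) -> None:
    assert isinstance(assessment.procedure, Procedure)
    assert isinstance(assessment.age, int)
```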

Nonetheless, there are also several large drawbacks:

  • Python types do not provide runtime type-checking. If I run the code below, it succeeds even though I specified age to be an int and procedure to be an Enum.
# this fails to fail
UserAssessment('junk', 'junk', 'junk', 'junk')

Output:
UserAssessment(procedure='junk', age='junk', occupation='junk', doctor_approved='junk')

Although a static type-checker might protect you from mistakes in your own code, nothing fails at runtime, so you end up writing a lot of boilerplate isinstance and try-except patterns for any client-facing functions that could receive bad inputs.
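A sketch of what that boilerplate ends up looking like (validate_assessment is a hypothetical helper, not part of any library):

```python
from dataclasses import dataclass


@dataclass
class UserAssessment:
    procedure: str
    age: int
    occupation: str
    doctor_approved: bool


def validate_assessment(assessment: UserAssessment) -> None:
    # the kind of guard code you end up repeating for every
    # client-facing entry point when types are not enforced at runtime
    if not isinstance(assessment.age, int):
        raise TypeError(f"age must be an int, got {type(assessment.age).__name__}")
    if not isinstance(assessment.doctor_approved, bool):
        raise TypeError("doctor_approved must be a bool")
    if not isinstance(assessment.procedure, str):
        raise TypeError("procedure must be a str")
```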

  • This approach does not generalize well across package development and service development use-cases. Ideally, if you’re developing and deploying services, you would want to follow the OpenAPI specification for documenting APIs. Unfortunately, there are no built-in tools that I’m aware of that can convert custom Python classes and types into JSON Schema.

JSON Schema Validation

If you’re developing services, you probably know what JSON Schema does. Specifically, it has several advantages:

  • It allows you to clearly define your API contract by specifying exactly what inputs are accepted and in what format.
  • Integrates with OpenAPI.
  • Universally accepted standard when it comes to developing contracts for RESTful services.
  • Language agnostic.

Unfortunately, it also has these issues:

  • Language agnosticism cuts both ways: if your language does not have a package that bridges the classes you declare and the JSON Schema you publish, you end up duplicating validation logic, once in your class declarations and again in the JSON Schema.
  • It does not extend to package development.
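To make the duplication concrete, here is the kind of schema you maintain by hand, in parallel with whatever class your service uses internally (a sketch; the fields mirror the UserAssessment example used throughout this article):

```python
import json

# a hand-written contract for the assessment endpoint, kept in sync
# manually with the corresponding Python class
user_assessment_schema = {
    "title": "UserAssessment",
    "type": "object",
    "properties": {
        "procedure": {"enum": ["cancer", "flu"]},
        "age": {"type": "integer", "exclusiveMinimum": 0, "exclusiveMaximum": 100},
        "occupation": {"type": "string"},
        "doctor_approved": {"type": "boolean"},
    },
    "required": ["procedure", "age", "occupation", "doctor_approved"],
}

print(json.dumps(user_assessment_schema, indent=2))
```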

Pydantic

The Python language has seen considerable improvements to its type system. The need for stronger typing and tools that perform static type checking became more important as projects grew in size. Pydantic takes this one step further and enforces type checking at runtime. Here is how you could take the code above and make it compatible with Pydantic.

This implementation ensures that all the type-checking for a class is done at the class declaration. Notice also that Pydantic comes with a considerably richer set of types, such as constrained types (see the pydantic documentation for more examples). What are the advantages of using this approach?

  • It provides fast runtime type checking out of the box. If I run the code below, it will let me know that the age is outside the acceptable range. In other words, you can immediately surface this feedback to the user and let them know that their inputs would lead to a downstream failure.
UserAssessment(procedure='cancer', age=200, occupation='junk', doctor_approved=True)

Output:
1 validation error for UserAssessment
age
  ensure this value is less than 100 (type=value_error.number.not_lt; limit_value=100)
  • It works well with mypy, although there are still some cases where you might have to rely on an occasional # type: ignore.
  • It consolidates data validation within the class declaration. This is important because it allows you to separate the logic performed by your application from the logic required to validate the data. You will generally have less need to use try-except or isinstance.
  • It works well with testing tools such as Hypothesis if you’re using the dataclass API.
  • It comes with very rich documentation (https://pydantic-docs.helpmanual.io/), which shows how much the creator of the package cares about users having a good experience when interacting with pydantic.
  • It provides conversion of Python classes to JSON Schema. All you have to do is invoke the schema_json method on your class, and you get the JSON Schema specification for free. Here is how it looks for our UserAssessment class:
print(UserAssessment.schema_json(indent=2))
{
  "title": "UserAssessment",
  "type": "object",
  "properties": {
    "procedure": {
      "title": "Procedure",
      "enum": [
        "cancer",
        "flu"
      ]
    },
    "age": {
      "title": "Age",
      "exclusiveMinimum": 0,
      "exclusiveMaximum": 100,
      "type": "integer"
    },
    "occupation": {
      "title": "Occupation",
      "type": "string"
    },
    "doctor_approved": {
      "title": "Doctor Approved",
      "type": "boolean"
    }
  },
  "required": [
    "procedure",
    "age",
    "occupation",
    "doctor_approved"
  ]
}

This conversion means that you can use Pydantic for both API documentation and package development. If you want to see an example of how useful this is, please check out FastAPI.

  • It encourages code reusability. Because Pydantic model classes can be used to generate automatic documentation, property-based tests, and even mappings onto ORM objects, the library encourages you to be very minimalistic when writing your code and lets you focus much more on doing the work rather than writing a set of custom utilities to handle additional validation logic.

Conclusion

Although the Python language has had major changes introduced by the typing module, the types provided by the language are not enforced at runtime. If you’re building APIs or you find yourself developing packages where runtime validation is important, I highly encourage you to try Pydantic.

Stas Sajin
Data Scientist