Any good service developer has spent a lot more time than they originally planned to spend on validating their input and output data. They probably started with validation-by-gauntlet, i.e. if nothing breaks then the data was valid, and if something breaks, tough. Then on to basic validations – this field should be an integer, this field should look like an email address, that sort of thing – and from there, they piled on custom and semi-custom validations until there was a cozy little robin’s nest of validation code.
Oh yeah, and that nest of code has to emit good error messages. And it needs to be maintained forever.
There is a better way. The JSON Schema format allows one to describe, using JSON structures, the shape of the objects one's app deals with. Support is included for required fields, data type validation, collections (arrays) of data, and references to other JSON Schemas that may or may not originate from the same document. It's a very simple format, but quite powerful for describing and enforcing constraints on data formats.
ExJsonSchema is an Elixir library which handles the loading of JSON Schemas and the verification of data against them. In particular, it offers the `ExJsonSchema.Validator.validate` function, which takes a JSON Schema and a piece of data as input, and returns a list of validation errors and where they occurred. With just a little bit of effort, we can wrap this functionality so that it slots neatly into an app.
## Make a Schema
The first step is to create a JSON Schema document. We called ours `schema.json` and put it in the root directory of our app. The full JSON Schema specification is outside the scope of this post, so instead, here's an example schema for a theoretical events API. This API takes `event_collection` objects, which contain an array of `event` objects.
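The original schema isn't reproduced here, so below is a sketch consistent with the description – an `event_collection` definition referencing an `event` definition, with a `timestamp` that may be an integer or a date-time string. Property names beyond those three are assumptions:

```json
{
  "definitions": {
    "event": {
      "type": "object",
      "required": ["timestamp"],
      "properties": {
        "timestamp": {
          "oneOf": [
            {"type": "integer"},
            {"type": "string", "format": "date-time"}
          ]
        }
      }
    },
    "event_collection": {
      "type": "object",
      "required": ["events"],
      "properties": {
        "events": {
          "type": "array",
          "items": {"$ref": "#/definitions/event"}
        }
      }
    }
  }
}
```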
A couple of things about this:

- We've declared a nested data structure `event_collection`, which references another data type `event` from the same document using `$ref` and a JSON Pointer string.
- `timestamp` fields can be in either integer (like a Unix timestamp) or string (like an ISO 8601 date-time) format. If provided in string format, the JSON Schema will validate it as a date-time.
- Rather than creating separate JSON Schema files for each of the data types, we place our data type schemata together under the `definitions` key of a single document.
## Test the Schema (Find Problems)
Now that we have a JSON Schema to work from, we need to load it into ExJsonSchema. This involves reading the file from disk, JSON-decoding it, and passing it to `ExJsonSchema.Schema.resolve`. The `resolve` call is potentially expensive, as it may reach out to external network resources to do its job. We'll want to make sure we only call it once.
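That load-decode-resolve pipeline might look like this (a sketch; Poison is assumed as the JSON decoder, but any decoder that produces maps with string keys works):

```elixir
# Read, decode, and resolve the schema. `resolve` may fetch remote
# $refs over the network, so do this once, not per request.
schema =
  "schema.json"
  |> File.read!()
  |> Poison.decode!()
  |> ExJsonSchema.Schema.resolve()
```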
We're ready to start validating some data. Since our data type schemata are under the `definitions` key, we need to point to the schema we want when we call `validate`. We can do it like this:
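A hedged sketch of that call – passing the resolved root schema, the sub-schema we want, and the data (the exact arity varies across ExJsonSchema versions; newer releases expose this as `validate_fragment`):

```elixir
# Validate one piece of data against the `event` definition.
ExJsonSchema.Validator.validate(
  schema,
  schema.schema["definitions"]["event"],
  %{"timestamp" => false}
)
# On failure this yields validation errors as {message, path} tuples,
# e.g. for the bad timestamp above.
```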
That `schema.schema["definitions"]["your-data-type-here"]` thing is going to get old fast.
Anyway, it works on single data structures; let’s check out nested data.
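Nested validation is the same call against the collection's definition; here's a sketch with two deliberately broken events (data values are illustrative):

```elixir
# Validate a collection containing two invalid events, at indices 0 and 1.
ExJsonSchema.Validator.validate(
  schema,
  schema.schema["definitions"]["event_collection"],
  %{"events" => [%{"timestamp" => false}, %{}]}
)
```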
Excellent: it's not only giving us good error messages, it's giving us a path to where the errors occurred, at indices 0 and 1 of the events array. We can use that to build some handsome validation error messages.
Alas, there are a couple of gotchas.
The first is that the schema validation only works on maps with string keys. This isn't too surprising; allocating atoms at runtime is frowned upon because they don't get garbage-collected, and JSON only supports string keys anyhow.
The second is more of a bug than a gotcha. ExJsonSchema falls over when validating seriously malformed nested objects:
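The original failure transcript isn't reproduced here, but the shape of the problem looks roughly like this (a sketch; the exception was observed with the library version used at the time):

```elixir
# Put a non-map where a nested object belongs...
ExJsonSchema.Validator.validate(
  schema,
  schema.schema["definitions"]["event_collection"],
  %{"events" => ["definitely not an event object"]}
)
# ...and instead of a validation error, an exception is raised:
# ** (BadMapError) expected a map, got: "definitely not an event object"
```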
Though it’s inconvenient, it does provide an indication of malformed data, so at least it can be made useful.
At this point we've identified several things we'll want to abstract away:

- `ExJsonSchema.Schema.resolve` should only be called once;
- Typing out `schema.schema["definitions"]` is brutish and should be avoided;
- Input data should be transformed so that all map keys become strings;
- Putting non-objects where nested objects should go causes a `BadMapError` to be raised;
- The output from `ExJsonSchema.Validator.validate` is not suitable for direct inclusion in a JSON document, because it contains tuples. The validation errors need to be massaged before sending them to the end user.
## Wrap ExJsonSchema (Create Solutions)
The first criterion above, that `resolve` should only be called once, implies that we need persistent state. One idiomatic approach to this in Elixir is implementing a GenServer, a generic interface which models any client/server interaction. (This does not imply communicating across a network; it's just an abstraction around managing access to state.)
We start by adding our GenServer module to the application supervision tree. This takes place in `lib/myapp.ex`, like so:
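A sketch of the supervision-tree entry, in the pre-Phoenix-1.3 `Supervisor.Spec` style common at the time (the module name `MyApp.JsonSchemaValidator` is an assumption):

```elixir
# In lib/myapp.ex, inside MyApp.start/2:
children = [
  supervisor(MyApp.Endpoint, []),
  # Start the schema validator under the app's supervisor.
  worker(MyApp.JsonSchemaValidator, [])
]

Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
```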
We then must create our GenServer module, implementing `init` and `handle_call` to fulfill just enough of the GenServer interface. In Erlang parlance, a `call` is a synchronous request, and a `cast` is async. We're only interested in synchronous function calls – after all, what good is a validator that doesn't give you answers? – so we won't implement `handle_cast`.
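A sketch of that module (the module name and the `{:validate, name, data}` message shape are assumptions; `get_validation_errors/3` is defined further down):

```elixir
defmodule MyApp.JsonSchemaValidator do
  use GenServer

  # Resolve the schema exactly once, at server startup; the resolved
  # schema becomes the server's state.
  def init(_opts) do
    schema =
      "schema.json"
      |> File.read!()
      |> Poison.decode!()
      |> ExJsonSchema.Schema.resolve()

    {:ok, schema}
  end

  # Synchronous validation request against the stored schema.
  def handle_call({:validate, schema_name, data}, _from, schema) do
    {:reply, get_validation_errors(schema, schema_name, data), schema}
  end
end
```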
Those two functions are enough to allow interaction using the GenServer module directly, but the syntax is awkward enough that it's worth pouring some sugar on. Let's add a public wrapper function that makes the GenServer call for us.
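Something along these lines (a sketch; the function names are assumptions):

```elixir
# Name the server :json_schema by default so callers don't track a pid.
def start_link(name \\ :json_schema) do
  GenServer.start_link(__MODULE__, [], name: name)
end

# Public API: validate `data` against the named definition.
def validate(schema_name, data, server \\ :json_schema) do
  GenServer.call(server, {:validate, schema_name, data})
end
```

With this in place, callers can write `MyApp.JsonSchemaValidator.validate("event", data)` instead of spelling out the `GenServer.call` themselves.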
Pay attention to that default value of `:json_schema`; it will come up again later.
Next, let's implement `get_validation_errors/3`, which is invoked from `handle_call`. We can take care of smoothing over `schema.schema["definitions"]` and catching exceptions here.
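A sketch of that function (helper names are assumptions; `stringify_keys/1` is defined below):

```elixir
# Look up the named definition, stringify the input's keys, and rescue
# the BadMapError raised on seriously malformed nested input.
defp get_validation_errors(schema, schema_name, data) do
  fragment = schema.schema["definitions"][schema_name]

  try do
    ExJsonSchema.Validator.validate(schema, fragment, stringify_keys(data))
  rescue
    BadMapError -> [{"Malformed nested data", "#"}]
  end
end
```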
So where are we now? We can keep state, we can validate objects against individual schemata, and we can catch exceptions thrown by ExJsonSchema. We still need to convert map keys to strings, and transform error messages into a JSON-compatible data structure.
(Yeah, I know that `@doc` is ignored for private functions. It's still the best way to document them.)
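The key-stringification helper the aside refers to might look like this (a sketch, walking maps and lists recursively):

```elixir
@doc """
Recursively converts all map keys (atoms or strings) to strings.
"""
defp stringify_keys(map) when is_map(map) do
  for {key, value} <- map, into: %{}, do: {to_string(key), stringify_keys(value)}
end

defp stringify_keys(list) when is_list(list), do: Enum.map(list, &stringify_keys/1)
defp stringify_keys(other), do: other
```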
And to get the validation errors into JSON-compatible format, one easy way is to just collect the error messages themselves:
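For example (a sketch; the function name is an assumption):

```elixir
# Drop the path element of each {message, path} tuple, leaving a plain
# list of strings that any JSON encoder can handle.
defp errors_to_json(errors) do
  Enum.map(errors, fn {message, _path} -> message end)
end
```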
That’s just about all there is to build. Just about.
## Make It Production-ready
One more thing remains before this code is ready for prime time: we need to make sure that it works when packaged as a release. In particular, we need to ensure that `schema.json` is included in the release. The easiest way to do this is to move it to a directory which is already included in the release, such as `priv` in Phoenix applications. Let's use `priv` for the example.
To reference a file within the app's directory, use the `Application.app_dir/1` function with the name of your app:
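For instance (a sketch; `:myapp` stands in for your OTP app name):

```elixir
# Resolve priv/schema.json relative to wherever the app is installed,
# which works both in dev and inside a release.
schema_path = Application.app_dir(:myapp) <> "/priv/schema.json"
```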
You may of course put “/priv/schema.json” into a config parameter if you like.
Now we have a robust JSON Schema validation system, ready for usage in our app and tests.
Let’s use it!
## Validating Input Data
We can use our JSON Schema to ensure that data given as input to our app (e.g., JSON data from a POST request) is well-formed. Here’s a tiny example from a Phoenix controller:
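A sketch of what that might look like (controller, action, and status handling are assumptions; it assumes `validate` returns an empty list, of JSON-ready messages, when the data is well-formed):

```elixir
def create(conn, params) do
  case MyApp.JsonSchemaValidator.validate("event_collection", params) do
    [] ->
      # Input is well-formed; downstream code can trust its shape.
      conn |> put_status(:created) |> json(%{status: "ok"})

    errors ->
      conn |> put_status(:unprocessable_entity) |> json(%{errors: errors})
  end
end
```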
Using the JSON Schema, we reduce the burden of writing validation code ourselves and scattering it around the codebase. If we’ve written the JSON Schema correctly and the input data passes validation, we don’t need validation in our downstream functions.
## Validating Output Data
Validating output data all the time may be impractical; for instance, validating every single response served by an API may add too much response latency. But we can validate against the JSON Schema in our tests quite easily. Using a Phoenix controller test as an example:
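Something like this (a sketch; the route and schema name are assumptions):

```elixir
test "GET /events returns schema-valid JSON", %{conn: conn} do
  body =
    conn
    |> get("/events")
    |> json_response(200)

  # An empty error list means the response matches the schema.
  assert [] == MyApp.JsonSchemaValidator.validate("event_collection", body)
end
```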
We were faced with a problem: validating data and generating good validation error messages is undoubtedly a best practice, but it’s a time sink and can become a real PITA when nested objects are involved.
We solved this problem using the best tools available on the open market: the JSON Schema internet draft standard, and an open-source Elixir library to use JSON Schemas. We improved the usability of this JSON Schema library so that we could use it liberally in our code. And finally, we used this service we created to improve the correctness of our app on both the input and output sides.
This post uses Elixir, but the technique of using JSON Schema to validate input and output data is widely applicable, and pretty convenient to boot. I’ll be using this trick again and again.
Code for this post is available on GitHub.