The Expression Problem

Generalised

What is The Expression Problem?

  • The Expression Problem was first described by Philip Wadler in 1998
  • Summary
    1. Define a data type by cases
    2. Add new cases to the data type and/or new functions over the data type
    3. Without changing existing code

What is The Expression Problem?

“Whether a language can solve the Expression Problem is a salient indicator of its capacity for expression.”

– Philip Wadler

What do we mean by this definition?

Define a data type by cases

enum DataType {
  IsInt(Int),
  IsString(String),
  JustIs
}

What do we mean by this definition?

Write function(s) over that data type

int getInt(DataType d) {
  switch(d) {
    case IsInt(i):
      return 1 + i;
    case IsString(s):
      return 2 + s.length();
    case JustIs:
      return 3;
  }
}

What do we mean by this definition?

Now add a case to the data type

enum DataType {
  IsInt(Int),
  IsString(String),
  JustIs,
  // new case
  IsBool(Bool)
}

What do we mean by this definition?

Where to from here?

println(getInt(IsBool(true)));

Int getInt(DataType d) {
  ...
}
First line defences
  • Throw an exception at runtime if IsBool is passed to getInt
  • Fail to compile the getInt function until it is changed to handle Bool
  • Write a new DataTypeWithBool, in addition to the previous data type

println(getInt(IsBool(true)));
Throw an exception at runtime …
“We, the programmers, do solemnly swear, to never call the getInt function with an IsBool value”

Oh-puhlease

Fail to compile the getInt function until it is changed to handle IsBool
  • In the absence of any other approach, this is the most robust solution
  • However:
    • We must modify the existing code (getInt)
    • It can be time-consuming for the non-trivial case
Write a new DataTypeWithBool, in addition to the previous data type

enum DataTypeWithBool {
  Existing(DataType),
  IsBool(Bool)
}
  • This approach seems to work
  • However, implementing functions over the data type layers quickly becomes clumsy, with varying nesting depths
Alternative efforts are also made

Apologies

but we don’t need to change that code very often anyway
Alternative efforts are also made

Just outright, blatant irresponsibility

let’s use a programming language without types

We have encountered a problem

The Expression Problem

What Does a Solution Look Like?

  • A solution to The Expression Problem defines:
    • a minimal interface for which a data type can express its components
    • the ability to write all functions over that data type in terms of that interface
  • Importantly, we are finding an optimal trade-off for abstracting:
    • the minimum definition of an interface. Small, but not too small.
    • the maximum amount of derived functions from that interface.

What Does a Solution Look Like?

Given our previous example:


enum DataType {
  IsInt(Int),
  IsString(String),
  JustIs
}

We want an interface over which we can write these example functions (and more):

Bool isJustIs(DataType)
// returns the IsInt value
Maybe<Int> getInt(DataType)
// + 1 to the IsInt value if it is an int
DataType addOne(DataType)
// reverse the string if it is a string
DataType reverseString(DataType)

What Does a Solution Look Like?

  • Our solution will not express itself in terms of the DataType, but rather, each component of the data type
  • We also want to keep our interface as minimal as possible, so that our future data types are able to implement it
  • What would such an interface look like?

What Does a Solution Look Like?

  • Several solutions to The Expression Problem have been proposed
    • Using golang interface to express a minimal interface
    • Ruby open classes
    • Using Clojure defprotocol to express a minimal interface
    • Tagless final / Object algebras 1
    • Polymorphic variants 2

What Does a Solution Look Like?

I have also seen the solution, “turn all the types off, open the data type, and hold its structure in your head…

I'm looking at you clojure programmers

but why are there so many bugs?”

Generalising The Expression Problem

  • In our previous example, a DataType is either:
    • an Int OR
    • a String OR
    • a unital or single value

enum DataType {
  IsInt(Int),
  IsString(String),
  JustIs
}

Generalising The Expression Problem

  • We observe The Expression Problem in terms of adding more cases to the data type e.g. a Bool
    • We would like our functions to continue as normal, without recompilation
    • Maybe<Int> getInt(DataType)
    • DataType addOne(DataType)
    • DataType reverseString(DataType)

enum DataType {
  IsInt(Int),
  IsString(String),
  JustIs,
  IsBool(Bool)
}

Generalising The Expression Problem

Or perhaps a more realistic data type made up of cases


enum Json {
  IsNumber(Number),
  IsBool(Bool),
  IsString(String),
  IsArray([Json]),
  IsObject([String, Object]),
  IsNull
}

Generalising The Expression Problem

  • But what if we instead have a data type with:
    • an Int AND
    • a String

class Person {
  Int,
  String
}

Generalising The Expression Problem

  • We can express a similar problem, not in terms of cases, but in terms of fields
  • We want to add a field to the Person data type, without changing our functions

class Person {
  Int,
  String,
  Bool
}

Generalising The Expression Problem

We might have a similar set of functions over our data type

Int getInt(Person)
Person addOne(Person)
Person reverseString(Person)

Generalising The Expression Problem

We are still looking for that interface, that abstraction


interface HasInt<T> {
  Int getInt(T t)
}

interface HasString<T> {
  String getString(T t)
}

Generalising The Expression Problem

No. We need more than HasString to write this function.


<T> T reverseString(T t)

Generalising The Expression Problem

  • We can take this generalisation even further
  • What is the minimal interface that allows us to add 1 to both Int fields of our data type?

class Person2 {
  Int,
  String,
  Int
}

Generalising The Expression Problem

  • Further again
  • What is the minimal interface that allows us to add 1 to the Int fields of the Person case for our data type?

enum DataType2 {
  IsInt(Int),
  IsString(String),
  IsPerson(Person2)
}

Generalising The Expression Problem

Stepping back
  • We have some relationships between data types
    • DataType is either Int OR something-else
    • Person is Int AND something-else
    • DataType2 is zero-or-many Int
    • Person2 is one-or-many Int
  • Some of these relationships are strictly weaker than others. For example:
    • the relationship between DataType2 and Int is strictly weaker than that between DataType and Int
    • the relationship between Person2 and Int is strictly weaker than that between Person and Int
  • Our relationships are not independent of each other, but share subtype/supertype commonality

Generalising The Expression Problem

Data Type Relationships

Data Type Relationships

Generalising The Expression Problem

Data Type Relationships
  • We would like an interface to express each of these types of relationship
  • Ideally:
    1. meeting the requirements to express that relationship is minimised
    2. concrete operations over that interface is maximised
  • This way, we’d be solving The Expression Problem, in general

Generalising The Expression Problem

What are the names for these data type relationships?

Data Type Relationships Question

Generalising The Expression Problem

Optics

Data Type Relationships Optics

Generalising The Expression Problem

  • Optics solve The Expression Problem by:
    • Defining a minimal interface to express the relationships between data
    • Maximise the available derived operations that come about from that interface
  • Further, optics solves The Expression Problem in general.

Generalising The Expression Problem

Optic Relationship
Lens A has exactly one B and some other things
Prism A has exactly one B or some other things
Traversal A has one or many B and/or some other things

  1. Oliveira, Bruno C. D. S., and William R. Cook. “Extensibility for the masses: Practical extensibility with object algebras.” European Conference on Object-Oriented Programming. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.↩︎

  2. Garrigue, Jacques. “Programming with polymorphic variants.” ML workshop. Vol. 13. No. 7. 1998.↩︎