Recently, I have been brushing up my knowledge of Formal Languages
and stumbled again into the many different faces of finite automata.
Most materials present DFAs, NFAs and company as distinct beasts, each
with their own properties and theorems. Nevertheless, I couldn’t get out
of my mind that all definitions seemed *too similar*. After
hitting my head into a couple walls, I finally noticed a link! Each kind
of finite automaton has a dynamics that runs in a certain context, or
equivalently, that is able to apply certain effects while it
executes.

The coolest part is that the way you write these contexts is exactly the same as you would do in a Haskell program: Monads. Even more, you can model things such as nondeterminism or partiality using the exact same monads as in real life code.

Depending on how used you are to these things, this last statement can be either obvious or shocking. Well, I can’t say it for you. But for me, it was a rather interesting finding.

```
{-# LANGUAGE RecordWildCards #-}
import Data.Foldable (foldlM)
import Control.Monad.Identity (Identity(..))
import Data.Bifunctor (second)
import Data.Complex (Complex, magnitude)
import Data.Map qualified as Map
```

## One Definition to Rule Them All

Since this post is about abstraction, let’s start with a general definition. We will closely follow the standard definition for a finite automaton with one twist: the transition function is parameterized for a certain context.

```
-- | Finite automaton with state `s`, alphabet `a` and a monadic context `m`.
-- The type parameters `s` and `a` are assumed to represent finite sets.
data Automaton m s a = Automaton
initial :: s -- ^ Initial State
{ transition :: s -> a -> m s -- ^ Change state with a context.
, accepting :: s -> Bool -- ^ Accepting subset as a predicate.
, }
```

The type above is a simple model of computation written as a
controllable dynamical system. Given a string of characters in the
alphabet, it starts in the `initial`

state and, for each
character, follows the appropriate `transition`

into the next
state. The context `m`

is a monad that represents what
effects our automaton is able to perform. This way we can run the
automaton by consuming the input characters until we arrive in the final
state (in context).

```
run :: Monad m => Automaton m s a -> [a] -> m s
Automaton{..} = foldlM transition initial run
```

The function `foldlM`

is just Haskell’s way of repeatedly
iterating a transition from the initial state while executing the
monadic effects. Notice that in the above we are using
`RecordWildCards`

to put the `Automaton`

’s fields
in scope.

By switching the Monad `m`

, we recover different families
of automata. For example, **Deterministic Finite Automata**
use the `Identity`

Monad because their transition is an
ordinary function, while for **Nondeterministic Finite
Automata**, we can use the Haskell idiom of modeling power sets
via lists. This way, the transition has type
`s -> a -> [s]`

and represents a relation between
states.

Automata Theory is all about languages. Thus, given an automaton, one
is generally interested in using it to match or recognize input strings.
To do that, we will need our remaining field `accepting`

. If
after running the automaton, it ends in an accepting state, we say that
it recognizes the input.

```
recognize :: (Finite s, Monad m, Context m) => Automaton m s a -> [a] -> Bool
@Automaton{..} = possible accepting . run aut recognize aut
```

I think it is pretty cool, given all it does, how short this function
turns out to be! Another point of interest is that with our view of
subsets as predicates, what `recognize`

does is to convert an
automaton into the language (subset of all strings) it recognizes.

In order to complete the above we must define the
`possible`

function. By inspecting its type,

`possible :: (s -> Bool) -> (m s -> Bool)`

We see that it lifts a predicate about a state into a predicate about a monadic state. In the case of all our examples, it will be equivalent to being possible that the final state is an accepting one. Hence its name.

Since the `possible`

function depends on the chosen Monad,
we model it using a typeclass.

```
class Context m where
possible :: Finite s => (s -> Bool) -> (m s -> Bool)
```

We also limit the function’s scope to finite state spaces. I could say that this is to conform to the definition, but, to be fair, it is for technical reasons in some of the examples. We shall only use one property of finite sets: they are always orderable and comparable for equality.

`class (Eq a, Ord a) => Finite a`

## Many Flavours of Automata

Now that we are armed with the code for recognizing words, it is time
to explore different automata. Let’s begin with our old friends
`DFA`

and `NFA`

and then proceed to their more
exotic cousins.

### Deterministic Finite Automata

These are the simplest automata of them all because, after running them, we have a single unambiguous state.

```
-- | Deterministic Finite Automaton.
type DFA = Automaton Identity
```

To test whether it is accepting, all we have to do is checking it.

```
instance Context Identity where
pred (Identity s) = pred s possible
```

### Nondeterministic Finite Automata

To model nondeterminism, we use automaton that arrive at a whole list of states. This way, the transition represents a relation instead of a simple function.

```
-- | Non-deterministic Finite Automaton.
type NFA = Automaton []
```

We consider them to accept a word if any of the final states is accepting.

```
instance Context [] where
pred = any pred possible
```

Although we aren’t using parallel programming in here, you can think of these automata as following many sequences of states in parallel and, in the end, it recognizes the word if any sequence arrived at an accepting state.

### Partial Transitions

An advantage of parameterizing the return context is that we are able to plug other Monads to get their effects on the automaton transition. For example, in a DFA any state is required to have a valid transition for any character in the alphabet. This can lead to some peculiar modeling where an automaton must have a lot of characters transitioning to some error state. Wouldn’t it be better to allow the state machine to fail whenever it encounters a character that is invalid to the current state?

Well, if you are in any way used to Haskell, you should know what Monad models a computation that maybe happens.

```
-- | Partial Deterministic Finite Automaton.
type PDFA = Automaton Maybe
```

For a word to be accepted, the run should succeed and end with an accepting state.

```
instance Context Maybe where
pred Nothing = False
possible pred (Just s) = pred s possible
```

As an example of an automaton that is easier to write with
partiality, consider the Christmas-themed state machine that recognizes
the language `ho(ho)*`

. Notice how we only need to describe
useful arrows.

### It’s all a matter of chance

Let’s get a bit less classical with Probabilistic Finite Automata. These are close cousins to Markov Decision processes who act similarly to non-deterministic automata, while also quantifying the system’s chance of taking a certain transition.

Since Haskell does not have a built-in probabilistic programming, we have to first define a Monad that represents probability distributions. In order to not lose ourselves into a tangent, we will write the simplest probability monad possible without much worry about performance or flexibility. But know that there are production grade probability programming libraries in Haskell, such as monad-bayes.

We represent a probability distribution as a list of outcomes paired to their respective probabilities.

```
data Dist p a = Dist [(a, p)]
deriving (Functor, Show)
Dist x) = x listProb (
```

With an eye on some further applications, we are also parameterizing it on the probability’s type. Nevertheless, for now we are interested on Real probabilities.

```
-- Probabilities over R
-- The coefficients are assumed to be in [0, 1] and to sum to 1.
type Prob = Dist Double
```

It has Applicative and Monad instances representing, respectively, the interaction of independent and dependent random variables.

```
instance Num p => Applicative (Dist p) where
pure x = Dist [(x, 1)]
Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]
instance Num p => Monad (Dist p) where
Dist xs >>= g = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- listProb (g x)]
```

Finally, since we are assuming everything is finite, we can model events as predicates. The probability of an event happening is the sum of the probabilities for all outcomes for which the event holds.

```
prob :: (a -> Bool) -> Prob a -> Double
pred = sum . fmap snd . filter (pred . fst) . listProb prob
```

Great! Despite being a bit crude, the above should be enough for our stochastic ambitions in this post. Also remember that we are glossing over a lot of details in here. So it is completely reasonable to not understand everything above.

Our Probabilistic Finite Automaton is then an automaton whose contexts are probability distributions. That is, each transition is non-deterministic and uncertain: you don’t know for which state it transitions but know the chance for each one.

```
-- | Partial Deterministic Finite Automaton.
type PFA = Automaton Prob
```

Given a probability distribution on the final states, we consider that it is possible to be accepting if there is a non-zero probability to accept the input string.

```
instance Context Prob where
pred m = prob pred m > 0 possible
```

### Going Quantum

I remember reading some time ago a post by Dan Piponi about the similarities between probabilistic and quantum programming. Hence, after writing the previous section, I started thinking about if it was possible to also represent Quantum Finite Automata with minimal changes in the above formalism.

The idea on Dan Piponi’s post is to write a Quantum Monad as a Distribution Monad where the probabilities (or amplitudes) are complex numbers.

```
-- Quantum Amplitudes over C
-- The coefficients are assumed to lie in a unity circle, i.e., C |c_i|^2 = 1.
type Quantum = Dist (Complex Double)
```

Among the main characteristics of a Quantum system is wavefunction collapse. When we observe the system, it collapses into a classical one with the probability for each eigenstate given by the square of the amplitude’s magnitude.

```
observe :: Ord s => Quantum s -> Prob s
= Dist . realize . collect
observe where
= Map.toList . Map.fromListWith (+) . listProb
collect = (magnitude z) ^ 2
norm2 z = fmap (second norm2) realize
```

Alright, after this quantum speed run, I imagine that you already know what a quantum automata should look like.

```
-- Quantum Finite Automaton
type QFA = Automaton Quantum
```

Well, to be fair, there are some different descriptions of QFAs in the literature and I don’t quite know to which one the above corresponds. If there is anybody in the audience more versed in Quantum Computing than I am, please shout about anything interesting that may be happening here.

To wrap up this section, let’s define how a QFA recognizes a language. Similarly to its classical cousin, we say that it possibly accepts a distribution over the final states if there is a non-zero probability of acceptance after we observe the system.

```
instance Context Quantum where
pred m = prob pred (observe m) > 0 possible
```

## Parting Thoughts

Well, this post was more exploratory than usual, but I hope that you still had some fun. As I was learning many parts while I wrote it, there are still many questions to ask and answer! For example, what about other monads? What does an automaton with a Continuation monad represents? As everything that is related to continuations, I bet it should be interesting.

Also, while all examples in this post fall into what people tend to
consider “finite automata”, I have an itch that it should be possible to
model pushdown
automata using a `State [w]`

monad to represent the
stack. However, I was unable to think of a way to write it such that the
transitions depend only on the top of the stack instead of the whole
list. It would be rather cool to go up the Chomsky hierarchy this
way.

Finally, our `possible`

function is rather similar to
`any`

from the Prelude. In fact, they are the same for three
of our examples: `Identity`

, `[]`

, and
`Maybe`

. Perhaps it’s possible to write a Foldable instance
for `Prob`

and `Quantum`

such that
`any`

does what we want. I decided to not pursue this because
of the technical reasons with the `Ord`

instance and because
`Foldable`

is famously not the most lawful among the
typeclasses. Thus, I wasn’t expecting any great insight to come from
it.

It’s now time to goodbye. If you have any idea or insight about the above, or if you just like automata and want to discuss about this wonderful corner of mathematics, feel free to send me a mail!

## Acknowledgments

A lot of ideas on this post came after discussions with Alexandre Pierre.