Finite Automata as Quantum Tensor Networks
14 May 2025
You may have noticed from my previous posts that I am really into finite automata (FA). They exist in an intersection between languages and controllable systems that is simply awesome. So, as you may have expected from the title, this is yet another post about them. Our plan today is to take a look at quantum systems and circuits that measure whether a finite automaton accepts a fixed-size input string.
Quantum states are notoriously hard to represent in a classical computer. For example, a general quantum computing system with N qubits requires 2^N complex coefficients, which is prohibitively large for even moderately sized N. Thankfully, many of the systems one may encounter in the real world are describable with much less information. As we will see, for N characters and an automaton with A symbols and S states, the relevant quantum system requires only O(N A S^2) coefficients to represent. That’s an exponential improvement compared to the general case.
In the course of this post, we also explore different forms of representing FAs. Namely, as vectors, tensor networks and quantum circuits. Also, besides the obvious prerequisites on tensors and automata, a bit of familiarity with Bra-ket notation and String Diagrams will be useful for understanding this post. Nevertheless, I’ll try to introduce any new concept as needed. Please drop a message if you find anything to be missing or confusing!
Vector Spaces for Automata
Start by fixing a (nondeterministic) finite automaton with
- Finite state set \mathcal{S} with S elements;
- Finite alphabet set \mathcal{A} with A elements;
- Initial state s_0 \in \mathcal{S};
- Accepting states \mathcal{F}\in \mathcal{P}\mathcal{S};
- Transition function t \colon \mathcal{A}\times \mathcal{S}\to \mathcal{P}\mathcal{S}.
Also, for practicality, let’s name a function \mathtt{match}\colon \mathcal{A}^\star \to \{0, 1\} that checks whether the automaton recognizes a string. We write brackets \llbracket \text{--} \rrbracket \colon \mathtt{Bool} \to \mathbb{C} for using truth values as the complex numbers 0 and 1.
How can we represent this system, defined over finite sets, using only linear algebra? The translation to vectors and matrices makes use of the free vector space, which you can think of as the space of complex-weighted superpositions of a set’s elements.
For a finite set X, its free vector space, denoted as \mathbb{C}^X, is the space of complex-valued functions X \to \mathbb{C}.1
Since this post’s theme is quantum mechanics, let’s use some of the field’s notation. We denote vectors inside funny triangles called kets, such as \ket{\psi} \in \mathbb{C}^X. Also, elements of the dual have the triangles inverted in a bra: \bra{\phi} \colon \mathbb{C}^X \to \mathbb{C}. Joining them into a bra-ket \braket{\phi|\psi} amounts to function application and computes a complex number. Also, yes, the pun is intended. Those physicists…
For each element x \in X, there is an indicator function \ket{x} and, as expected, these form an orthonormal basis of \mathbb{C}^X.2 That is, for any vector \ket{\psi}, there are coefficients c_x \in \mathbb{C} such that \ket{\psi} = \sum_{x \in X} c_x \ket{x}.
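To make this concrete, here is a minimal numpy sketch (the toy set X = \{a, b, c\} and the vector \ket{\psi} are made up for illustration): basis kets are one-hot indicator vectors, and joining a bra with a ket is just an inner product.

```python
import numpy as np

# Basis kets of C^X for a toy set X = {a, b, c}: one-hot indicator vectors.
X = ["a", "b", "c"]
ket = {x: np.eye(len(X), dtype=complex)[i] for i, x in enumerate(X)}

# A general vector is a complex-weighted superposition of basis kets.
psi = 2.0 * ket["a"] + 1j * ket["c"]

# A bra is a conjugated row vector; the bra-ket is an inner product.
braket = np.vdot(ket["c"], psi)   # <c|psi> = 1j
```

Note that `np.vdot` conjugates its first argument, which is exactly the bra convention.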
The Brute-Force Approach
For the automaton, the states and alphabet turn into vector spaces \mathbb{C}^\mathcal{S} and \mathbb{C}^\mathcal{A}. Let’s make our problem statement precise in this language. For N \in \mathbb{N}, the tensor product \bigotimes_{i=1}^N \mathbb{C}^\mathcal{A} is a vector space with the N-length strings as a basis. Does it have a subspace spanned only by recognized strings? If so, its projection would be a linear functional \bra{\mathtt{match}_N} \colon \bigotimes_{i=1}^N \mathbb{C}^\mathcal{A}\to \mathbb{C} which is non-zero only on the span of matched strings.
In theory, it is simple to construct such a functional. Just define \bra{\mathtt{match}_N} as the matched set’s indicator: \bra{\mathtt{match}_N} \coloneqq \sum_{\sigma \in \mathcal{A}^N} \mathtt{match}(\sigma) \bra{\sigma}.
Although this is enough for an existence proof, actually constructing \bra{\mathtt{match}_N} requires the problem to be already solved. Furthermore, this is computationally expensive, requiring A^N complex coefficients. Using Tensor Networks, this amounts to saying that we only know a black box for this tensor.
We can do much better than this though. To be fair, this is among this post’s main points and deserves to be enshrined as a theorem.
Given a (nondeterministic) finite automaton with A symbols and S states, there is a tensor network requiring only O(NAS^2) coefficients that determines whether the FA accepts any string of length N.
To find such a compact formulation, we need to use more of the automaton’s structure.
A Simple Example: Divisibility by 3
Alright, this is all too abstract, so let’s apply this formulation to a simple automaton and see what happens. Our object of study is a machine recognizing whether a binary string represents an integer divisible by three. The alphabet is binary \mathcal{A}= \{0, 1\} while the states are the possible remainders \mathcal{S}= \{s_0, s_1, s_2\}. It is represented in the diagram below.
Let’s construct \bra{\mathtt{match}_4} from its truth table. For each bitstring, we check whether it is accepted and use that as a component of the tensor. \begin{array}{rcl|rcl|rcl|rcl} 0000 &\to& 1 & 0100 &\to& 0 & 1000 &\to& 0 & 1100 &\to& 1 \\ 0001 &\to& 0 & 0101 &\to& 0 & 1001 &\to& 1 & 1101 &\to& 0 \\ 0010 &\to& 0 & 0110 &\to& 1 & 1010 &\to& 0 & 1110 &\to& 0 \\ 0011 &\to& 1 & 0111 &\to& 0 & 1011 &\to& 0 & 1111 &\to& 1 \end{array}
Finally, construct the matcher with the table above as coefficients. Omitting zeros, it is \bra{\mathtt{match}_4} = \bra{0000} + \bra{0011} + \bra{0110} + \bra{1001} + \bra{1100} + \bra{1111}.
I suppose you can see how badly this process scales with the string length.
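To make the cost concrete, here is a brute-force numpy sketch that materializes all A^N = 2^4 = 16 coefficients of \bra{\mathtt{match}_4} straight from the divisibility test:

```python
import numpy as np

# Brute-force <match_4|: one complex coefficient per length-4 bitstring.
N = 4
match4 = np.zeros((2,) * N, dtype=complex)
for bits in np.ndindex(*match4.shape):
    value = int("".join(map(str, bits)), 2)   # the integer the bitstring encodes
    match4[bits] = 1.0 if value % 3 == 0 else 0.0

# The nonzero entries are exactly the accepted strings.
accepted = ["".join(map(str, b)) for b in np.ndindex(*match4.shape) if match4[b] != 0]
```

For N characters, the loop runs A^N times, which is exactly the exponential blowup we want to avoid.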
The Vector Dynamics Approach
Since we’ve already discussed the dynamics of FAs in a previous post, I won’t spend much time on them. What’s important is that these machines follow their transition t to check whether a string takes them from s_0 to a state in \mathcal{F}. Let’s put this into vector form.
The initial state is simple: it becomes the vector \ket{s_0} \in \mathbb{C}^\mathcal{S}. For vectorial versions of the accepting states and transition function, let’s employ the useful trick of looking at subsets as binary functions: \mathcal{P}X \cong \{0, 1\}^X \subseteq \mathbb{C}^X. This way, subsets become superpositions where each element is either present or not, i.e., vectors with only binary components. In particular, the accepting set turns into \ket{\mathcal{F}} \coloneqq \sum_{f \in \mathcal{F}} \ket{f} \in \mathbb{C}^\mathcal{S}.
So we encounter an interesting characteristic of the vectorial view: it unifies elements and subsets—or analogously, determinism and nondeterminism. Moreover, checking if the automaton accepts a state s \in \mathcal{S} becomes an inner product, because accepted states have a nonzero projection into \ket{\mathcal{F}}: \braket{\mathcal{F}| s} = \llbracket s \in \mathcal{F} \rrbracket.
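As a small numpy illustration (using the divisibility-by-3 machine, whose only accepting state is the remainder-0 one), subsets become 0/1 vectors and membership becomes an inner product:

```python
import numpy as np

# States of the divisibility-by-3 machine as one-hot vectors in C^S, S = 3.
ket = lambda s: np.eye(3, dtype=complex)[s]

# The accepting set {s0} as a binary superposition |F>.
ket_F = ket(0)

# Membership as an inner product: <F|s> is nonzero iff s is accepting.
is_accepting = lambda s: np.vdot(ket_F, ket(s)) != 0
```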
How cool is that? Keep this at the back of your head while we turn our attention to the transition. The previous isomorphism and some currying on its type yields \begin{array}{rcl} && \mathcal{A}\times \mathcal{S}\to \mathcal{P}\mathcal{S}\\ &\cong& \mathcal{A}\times \mathcal{S}\to (\mathcal{S}\to \{0, 1\})\\ &\cong& \mathcal{A}\to (\mathcal{S}\to (\mathcal{S}\to \{0, 1\})) \\ &\cong& \mathcal{A}\to (\mathcal{S}\times \mathcal{S}\to \{0, 1\}) \\ &\subseteq& \mathcal{A}\to (\mathcal{S}\times \mathcal{S}\to \mathbb{C}) \\ &\cong& \mathcal{A}\to \mathbb{C}^{\mathcal{S}\times \mathcal{S}} \\ &\cong& \mathcal{A}\to (\mathbb{C}^{\mathcal{S}} \overset{\text{Linear}}{\to} \mathbb{C}^\mathcal{S}) \end{array}
Although this seems like a lot of steps, they are mostly bookkeeping. The important part is that the transition t becomes a family of matrices indexed by \mathcal{A}. This way, we define the linear operators T^{\alpha} \coloneqq \sum_{s, s' \in \mathcal{S}} \llbracket s' \in t(\alpha, s) \rrbracket\ket{s'}\bra{s},\, \text{ for } \alpha \in \mathcal{A}.
For a state s, this represents the transition s \xrightarrow{\alpha} s' as \ket{s'} = T^\alpha \ket{s}. Even when s is nondeterministic (a subset instead of an element), linearity guarantees that the transition matrices act as they should. Thus, for a finite string3 \sigma = \alpha \cdots \omega \in \mathcal{A}^\star, we construct a matrix T^\sigma \coloneqq T^\omega \cdots T^\alpha which takes a state through the string’s dynamics. To check whether the automaton accepts \sigma, we start at \ket{s_0}, apply each T^\alpha and check the projection onto \bra{\mathcal{F}}: \mathtt{match}(\alpha \cdots \omega) = \llbracket \braket{\mathcal{F}| T^\omega \cdots T^\alpha | s_0} \neq 0 \rrbracket.
As a final treat, notice that the only property of nondeterminism we actually use is that \mathcal{P}\mathcal{S} is representable as complex-valued functions. Thus, the discussion above still works for other kinds of automata with this property, such as deterministic, probabilistic, or quantum ones.
Back to Our Example
Recall our state machine recognizing whether a binary string represents an integer divisible by three. It had alphabet \mathcal{A}= \{0, 1\} and states \mathcal{S}= \{s_0, s_1, s_2\}, which we write simply as 0, 1, 2. In vector form, it starts at \ket{0} and accepts any string also finishing at \ket{0}. We can extract the transition matrices by looking at the FA’s edges,
T^0 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} ,\quad T^1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Now, this is all the data we need to run the system. As an example, the string 0110 represents 6 and is accepted because
\begin{array}{rcl} && \Braket{0 | \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} | 0} \\ &=& \braket{0 | 0} \\ &=& 1 \end{array}
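The whole check fits in a few lines of numpy. This sketch hard-codes the two matrices above; the helper name `match` is my own.

```python
import numpy as np

# Transition matrices read off the automaton's edges.
T = {
    "0": np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=complex),
    "1": np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=complex),
}

def match(sigma: str) -> bool:
    """Apply each T^alpha to |s0> and project the result onto <F| = <0|."""
    state = np.array([1, 0, 0], dtype=complex)    # |s0>
    for alpha in sigma:                           # characters left to right
        state = T[alpha] @ state
    return bool(state[0] != 0)
```

For instance, `match("0110")` is `True` (6 is divisible by 3) while `match("0111")` is `False` (7 is not).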
The Tensor Network Approach
Our vector discussion was all well and good, but it still lacks something. Since the string in \mathcal{A}^\star had to be fixed beforehand, the dynamics are not linear in it. We built an open system parameterized by s_0 and \mathcal{F}, when what we really want is for those to stay fixed while the input string varies.
You can think of a tensor network as a graph of lazily evaluated tensors.4 They are in one-to-one correspondence with Einstein summation notation since instead of writing a tensor as its components, they represent it as the result of contracting other tensors. Previously, we represented the automaton as an inner product \braket{\mathcal{F}| T^\omega \cdots T^\alpha | s_0}. By looking at all its elements as tensors, we find a tensor network.
This is a “closed system” inasmuch as there are no free wires to connect an external input. The string \sigma is fixed and this network represents a single complex number. How can we make this system linear on \sigma? The answer is to look again at the transition function’s type but smartly uncurry it as a 3-tensor,
\begin{array}{rcl} && \mathcal{A}\times \mathcal{S}\to \mathcal{P}\mathcal{S}\\ &\cong& \mathcal{A}\times \mathcal{S}\times \mathcal{S}\to \{0, 1\} \\ &\subseteq& \mathcal{A}\times \mathcal{S}\times \mathcal{S}\to \mathbb{C}\\ &\cong& \mathbb{C}^\mathcal{A}\otimes \mathbb{C}^\mathcal{S}\otimes \mathbb{C}^\mathcal{S} \end{array}
Let’s call this tensor T. Notice that contracting it with pure characters gives rise to our previous matrices: T^\alpha = T \ket{\alpha}, and we get a linear dependence on the input string. Substitute this into our previous tensor network.
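In numpy, this contraction is a one-line `einsum`: stacking the two matrices along a new character axis gives the 3-tensor T, and contracting it with a pure character recovers T^\alpha.

```python
import numpy as np

# Stack the transition matrices along a character axis to get the
# 3-tensor T with shape (A, S, S) = (2, 3, 3).
T0 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=complex)
T1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=complex)
T = np.stack([T0, T1])

# Contracting T with a pure character recovers the matrix T^alpha.
ket0 = np.array([1, 0], dtype=complex)            # |0> in C^A
T_alpha = np.einsum("aij,a->ij", T, ket0)         # equals T^0
```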
By isolating the part of the network that does not depend on the tensor product of characters \bigotimes \ket{\alpha_i}, we arrive at an expression for the matching tensor.
This kind of tensor network is known as a Matrix Product State (MPS) among physicists and as a Tensor Train among numerical analysts. Although I’m quite a fan of trains, let’s go with MPS for this post. They are a very compact representation of an N-tensor. In our application, there are N copies of T (each requiring A S^2 coefficients) plus the S coefficients for each of \bra{\mathcal{F}} and \ket{s_0}. Therefore, we went down from the exponential A^N to only NAS^2 + 2S degrees of freedom. By precontracting T with the initial and final states, we can reduce it even further to NAS^2 - 2AS(S-1).
MPS also have very fast contraction algorithms, requiring only NAS^3 operations for calculating an inner product.
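A sketch of such a contraction for our running example: each step contracts one character into T and applies an S × S matrix to the running state, so the full A^N tensor never materializes. The function name `mps_match` is my own.

```python
import numpy as np

# Evaluate the MPS on a product of characters with one left-to-right sweep.
T0 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=complex)
T1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=complex)
T = np.stack([T0, T1])                            # the 3-tensor, shape (A, S, S)

def mps_match(sigma: str) -> complex:
    state = np.array([1, 0, 0], dtype=complex)    # |s0>
    for alpha in sigma:
        char = np.zeros(2, dtype=complex)
        char[int(alpha)] = 1.0                    # |alpha> in C^A
        state = np.einsum("aij,a->ij", T, char) @ state
    return state[0]                               # projection onto <F| = <0|
```

Each site costs about AS^2 operations, in line with the counts above.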
Before closing this section, notice that the MPS view “loses” the automaton’s dynamic nature. There is no more order of contractions, just linked tensors that statically represent the whole process.
Turning into a Quantum Circuit
Tensor networks are great for quantum simulations, but what if we want to represent our automaton on an actual quantum computer? We will need to represent it as a quantum circuit (QC). For this post’s purposes, a QC is a tensor network where all tensors are unitary. You can then employ the usual methods to turn those gates (unitaries) into Pauli matrices or whatever. Since it is a single-sided circuit, we will use \ket{0} as its input. Fortunately for us, MPS have a property called gauge freedom which allows them to be rewritten using only isometries. With this, it is possible to rewrite any MPS as a quantum circuit!
Our approach follows Erik J. Gustafson et al.5 and William Huggins et al.6 and uses the QR decomposition to obtain unitaries. The only hypothesis we need is that S = A^k, making \mathbb{C}^\mathcal{S}= \bigotimes_{i=1}^k \mathbb{C}^\mathcal{A}. In general, we work with qubits (so A = 2) and all gates act between them, so the state spaces should match this. If your FA does not, just pad the state vectors and matrices with zeros. To unclutter the notation, we will sometimes annotate each wire with how many tensor factors it represents.
Throughout this section, we put the following lemma to good use. It is a standard linear algebra construction.
For an isometry V \colon \mathbb{C}^m \to \mathbb{C}^n with n = l m, there is a unitary matrix U \colon \mathbb{C}^l \otimes \mathbb{C}^m \to \mathbb{C}^n such that U(\ket{\psi} \otimes \ket{0}) = V \ket{\psi}.
Or in tensor networks,
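In numpy, the lemma amounts to completing the isometry’s columns to an orthonormal basis. A sketch, with the ancilla ordered last so that column j·l of U carries V’s j-th column (the helper name `dilate` matches the pseudocode later in this section):

```python
import numpy as np

def dilate(V: np.ndarray) -> np.ndarray:
    """Complete an isometry V (n x m with n = l*m) to an n x n unitary U
    satisfying U(|psi> tensor |0>) = V|psi>, with the ancilla ordered last."""
    n, m = V.shape
    l = n // m
    Q = np.linalg.qr(V, mode="complete")[0]   # last n-m columns span V's complement
    U = np.zeros((n, n), dtype=complex)
    U[:, ::l] = V                             # columns j*l correspond to |j> tensor |0>
    rest = np.ones(n, dtype=bool)
    rest[::l] = False
    U[:, rest] = Q[:, m:]                     # fill with the orthogonal complement
    return U
```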
Now, to decompose some states! First of all, a quantum circuit always prepares a normalized state, so our target must be normalized too. A global complex constant does not change its properties as a quantum state, so let’s work with \ket{\psi} = \frac{1}{\left\lVert\mathtt{match}_N\right\rVert_2} \ket{\mathtt{match}_N}.
The next step is to use the aforementioned gauge invariance to rewrite \ket{\psi} as an equivalent MPS where all tensors but the last are isometries. We achieve this orthogonalization process via a series of QR decompositions.7
For this next part, let’s view the first tensor as a 3-tensor with incoming rank 0.
We will repeat the same process below for every node but the last one. Start by taking a tensor on the MPS and reshaping it into an A^{l+1} \times A^k matrix.
If l < k, apply a QR factorization to turn this matrix into a product between a unitary U and a triangular matrix R. From that and some reshaping, we get
When l = k, perform a thin QR decomposition instead to turn the tensor into a product between an isometry V and a triangular R.
With the aforestated lemma, this isometry can be dilated into a unitary U contracted with \ket{0} \in \mathbb{C}^\mathcal{A}. Doing that and reshaping again.
To orthogonalize an MPS, repeat this procedure for each tensor, keeping the unitary and contracting R into the next site. Notice that l starts at zero and increases at every step until the bond dimension saturates at A^k = S.
In Julia pseudocode, this process looks like this.
```julia
function orthogonalize!(psi::MPS) :: MPS
    N = length(psi)  # How many tensors in psi
    for i in 1:(N-1)
        if psi[i].prev < psi[i].next  # Compare the left dim with the right dim
            Q, R = qr(psi[i])         # Full QR: the square Q is already unitary
            psi[i] = Q
        else
            Q, R = thin_qr(psi[i])    # Thin QR: Q is an isometry
            psi[i] = dilate(Q)        # Turn into unitary with |0>
        end
        psi[i+1] = contract(R, psi[i+1])
    end
    return psi
end
```
After orthogonalization, all tensors in the MPS become isometries except for the last one. The MPS represents the same tensor but is much more structured.
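To see this numerically, a numpy sketch: build the length-4 MPS of our divisibility automaton (with boundary vectors absorbed), run the QR sweep, and check that every site but the last is an isometry while the represented tensor is unchanged. The shapes and helper names here are my own.

```python
import numpy as np

# Site tensor with axes (left bond, character, right bond): W[s, a, s'] = T^a[s', s].
T0 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=complex)
T1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=complex)
W = np.stack([T0, T1]).transpose(2, 0, 1)

# Boundary sites absorb |s0> on the left and <F| = <0| on the right.
mps = [W[0].reshape(1, 2, 3), W, W, W[:, :, 0].reshape(3, 2, 1)]

def contract_all(mps):
    out = mps[0]
    for site in mps[1:]:
        out = np.tensordot(out, site, axes=([-1], [0]))
    return out.squeeze()

before = contract_all(mps)

# Left-to-right QR sweep: keep Q, push R into the next site.
for i in range(len(mps) - 1):
    l, a, r = mps[i].shape
    Q, R = np.linalg.qr(mps[i].reshape(l * a, r))
    mps[i] = Q.reshape(l, a, -1)
    mps[i + 1] = np.einsum("xr,ras->xas", R, mps[i + 1])
```

After the sweep, every reshaped site has orthonormal columns, and contracting the whole chain still yields the truth table from before.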
Now, what do we do with this last tensor? Remember that the first step was normalizing the MPS such that \braket{\psi|\psi} = 1. This is what is going to save the day! After orthogonalizing, the normalization “moves” to just the last tensor. To see that, build the inner product.
Notice that being an isometry in this context means that
The first unitary U_1 satisfies the same equation without the arc on the left. By iterating this equation, the inner product reduces to
We conclude that T is normalized when viewed as a vector in \mathbb{C}^\mathcal{S}\otimes \mathbb{C}^\mathcal{A}. What is nice about that? Well, a normalized vector is equivalent to a 1-column isometry \mathbb{C}\to \mathbb{C}^\mathcal{S}\otimes \mathbb{C}^\mathcal{A}, so we can complete it to a unitary by plugging k+1 kets \ket{0} into it!
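A numpy sketch of this last completion, assuming A = 2 and k = 2 so the vector lives in \mathbb{C}^8: a complete QR of the single column yields a unitary, and a phase fix makes its first column exactly the vector, so feeding it k+1 = 3 kets \ket{0} reproduces the state. The random vector stands in for the MPS’s last tensor.

```python
import numpy as np

# A random normalized vector standing in for the MPS's last tensor.
rng = np.random.default_rng(1)
v = rng.normal(size=8) + 1j * rng.normal(size=8)
v = v / np.linalg.norm(v)                 # <v|v> = 1

# Complete the one-column isometry v to a unitary whose first column is v.
Q, R = np.linalg.qr(v.reshape(-1, 1), mode="complete")
U = Q.copy()
U[:, 0] = Q[:, 0] * R[0, 0]               # undo QR's phase so U|000> = v

zero = np.zeros(8, dtype=complex)
zero[0] = 1.0                             # |000>, i.e. k+1 = 3 kets |0>
```

Rescaling one column by the unit-modulus scalar R[0, 0] keeps U unitary, since \lVert v \rVert = 1.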
Putting it all together, the new MPS takes the following form:
In case it is not clear why the MPS above is a quantum circuit, worry not. All it takes is rotating it to the horizontal and sliding the boxes. Here’s an example with N = 5 and k = 2 where each wire is one qubit.
Now that’s a quantum circuit!
Conclusion
Well, we saw today that finite automata have a neat description as quantum systems. We can cast them as Matrix Product States to perform classical simulations or even as quantum circuits if you want to run on the real thing. This representation is just the tip of the iceberg though!
You can play with your FAs by subjecting them to anything a quantum system would undergo. For example, MPS can be viewed as Born machines, allowing you to extract independent samples of finite strings from the automaton. You can also use low-rank approximations (such as SVD truncation) to compress these states while keeping them close to the original FA.
The sky is the limit! Have fun with this new tool.
You can also use formal linear combinations of elements of X, and everything works the same. For finite sets, they are isomorphic.↩︎
In the usual mathematical notation, we generally write the vector \ket{x} as e_x or even simply x.↩︎
The notation X^* denotes the Kleene Star, not the dual vector space. Math likes to reuse symbols, I know.↩︎
For my categorist readers: They are exactly the string diagrams in the category \mathtt{FinVect}(\mathbb{C}) of finite-dimensional complex vector spaces.↩︎
“Preparing Quantum Many-Body Scar States on Quantum Computers,” Quantum 7 (November 2023): sec. 4.1, https://doi.org/10.22331/q-2023-11-07-1171.↩︎
“Towards Quantum Machine Learning with Tensor Networks,” Quantum Science and Technology 4, no. 2 (January 2019): sec. IV.C, https://doi.org/10.1088/2058-9565/aaea94.↩︎
In the literature, people generally use the SVD for this step because it allows truncating the inner dimensions optimally. Since we are interested on the exact quantum circuit, both SVD and QR work.↩︎