Last semester, I was in a crypto class, so I found myself doing a lot of googling about primality testing and factoring integers. I stumbled upon an interesting regular expression in a StackOverflow answer. Let’s take a look at it.

`^1?$|^(11+?)\1+$`

This short regex snippet matches against strings containing *a
composite number of 1’s*. Surely, you have questions.
If you search the web for primality testing regexes, or this particular
regex, you will see that others had questions too. I found other
people’s explanations lacking, so I’ll offer my own.

This regular expression tries to factor the number of
`1`

’s in the string: if the only factors are zero, one, or
the number of `1`

’s, then it fails to match. It does this
through trial division. It tries dividing by one, then two, then three,
and so on, until it finds a match or it reaches the number itself.

Here’s a quick walkthrough with inputs `11111`

and
`111111`

(5 and 6 `1`

’s, respectively):

`^1?$`

This just matches against zero or one `1`

’s (since neither
are prime). This doesn’t match `11111`

.

`^(11+?)`

This is where the fun starts. This matches against one `1`

and as few more `1`

’s as possible to complete a match.
Initially, it just matches `11`

in both our inputs. Whatever
is matched is *captured* since we put parentheses around it.

`\1+$`

In case you are unfamiliar, `\1`

is called a
*backreference,* and it references back to the last thing
captured, the `11`

. So for `111111`

,
`^(11+?)\1+$`

matches! There is an initial `11`

plus two more `11`

s for a total of 6 `1`

’s. Do you
see what’s happening? The regex tried to divide 6 by 2 and succeeded, so
it returns a match.

This doesn’t match against `11111`

though:
`11 .. 11`

is missing a `1`

and
`11 .. 11 .. 11`

has too many `1`

’s. So now the
regex engine has to backtrack and try again.

This time `^(11+?)`

matches `111`

, but that’s
no good either: the only option is `111 .. 111`

which is too
many `1`

’s again. Next it matches `1111`

, but that
doesn’t work for the same reason.

Finally, the engine will give it one last shot by matching the whole
string: `11111`

, but since it can’t fit it more than once to
satisfy the `\1+`

bit, it gives up. No match for the prime
number.

A regular expression that performs trial division: now you can say
you’ve seen it all! This should go without saying, but *don’t use
this.* Primality testing is already easy,
there’s no need to repurpose regex engines to do it inefficiently. It’s
just a neat trick.