Blog

Review: Harmony by Project Itoh

Harmony is set in a world where technology makes it possible for everyone to be in perfect physical health. In such a world, Project Itoh (the pen name of author ITO Satoshi) thinks the only remaining source of suffering would be free will.

I enjoyed the book: the world is interesting and I was hooked on the plot. But I can’t recommend it.

First, a petty complaint that I need to get it out of the way: the book shares Golden Age of Science Fiction’s unfortunate fixation with writing like:

To prove that these tits, this ass, this belly, aren’t a book.

I’m no Andrea Dworkin but I just don’t think that’s good writing. A male main character who, transposing from the first chapter, had constant thoughts like “While my dick is growing longer… While my balls are still dropping…”, would be equally annoying.

🚨 Warning: spoilers follow!

The conceit that the XML-inspired “ETML” markup, e.g. <anger>, is for readers in the book’s world who no longer have the ability to experience inner emotions is cute. But the author conflates qualia with rational vs. emotional behavior in a way that doesn’t make sense to me.

For example:

People cried as though they were sad and raged as though they were angry. But these actions carried the same value as the mimicked emotional responses a robot would have had in the previous era. All people had lost their inward minds.

Mankind was in perfect harmony with its medical industrial society.

The instant the old folks had entered their codes and the Harmony program had begun to sing, suicide disappeared from human society.

… and:

“What happens when you lose your consciousness? Do you just sit there all day in your chair, drooling?”

“Nothing of the sort. You go shopping, you eat, you enjoy entertainment–you merely no longer have to make decisions what to do at any given time because everything is self-evident. It’s the difference between having to make choices and having it all be obvious to you. That’s all it is. … From the outside, it’s nearly impossible to tell whether someone has consciousness or is merely acting as though they did. However, because their system of values is fashioned to be in perfect harmony with society, there are far fewer suicides…”

…

“When she came back, Miach said it had been pure ecstasy. … She only had the sensation that she had been in a wonderful, joyous place.”

…

“People with perfect judgement do not require a consciousness, so it does not exist.”

This sounds a lot like a description of philosophical zombies¹ (henceforth p-zombies) (“as though they were sad” and “had lost their inward minds”). Although the statement that “it’s nearly impossible to tell” (emphasis mine) implies that unlike p-zombies there is a change in behavior.

The author claims that “having to make choices” leads to consciousness, which leads to experienced emotions, which leads to e.g. suicide. But the characters without consciousness still display emotions even though they don’t experience them.

Surely there is a distinction between experiencing X and “acting as though” you experience X. But why do the people without consciousness cry as though they were sad and rage as though they were angry but not commit suicide as though they were depressed?

I don’t want to get hung up on qualia: the idea that people might try to reduce emotional behavior in order to create a more harmonious society is interesting. If the book were only about using WatchMe to involuntarily reduce everyone’s emotions rather than to totally eliminate their consciousnesses (“All people had lost their inward minds.”) I think I’d like it more.

I don’t think I can recommend the book, but I try to make sense of it in the context of the book’s creation: Project Itoh worked on Harmony in the final year of his life, dying from cancer. The thesis of the book is that in order to eliminate pain and the temptation to commit suicide we would have to give up subjective experience and the ability to choose at all. That says much about what he believed he stood to lose and what we enjoy for a little longer.

Beings otherwise identical to a consciousness-experiencing persons, but without consciousness.

From the Wikipedia article on the Mosuo ethnic group in China:

In Mosuo culture, a myth describes that long ago, dogs had life spans of 60 years while humans had life spans of thirteen years. Humans felt their life span was too short, so they traded it with the dogs in exchange for paying homage to them.

From “British logistics in the Falklands War” on Wikipedia:

The Argentine government did not wish to “repatriate” its dead, as it considered that they were already in Argentina. Many were not identified, and were buried with the inscription “Argentine soldier known unto God.”

LLMs struggle to explain themselves

LLMs can sometimes recognize number patterns, but can they explain their reasoning? See for yourself!

The interactive demo below generates a random program and uses that to compute three number sequences. The LLM is given two of those as examples and asked to pick the third out of a lineup. Click expand to see the actual messages sent to/from the LLM. You can run things yourself if you click Settings and enter an Anthropic API key: check out the FAQ for more details!

FAQ

Q. Why didn’t you ask the LLM to write out its reasoning first/give it tools/multiple turns before it outputs its final answer?

If you click expand next to any of the LLM rows, you can all the messages sent to the API. In all of the examples, Alice takes multiple turns and uses the eval_js tool. In some of the examples (e.g. #1) you can see that she even writes out a hypothesis before testing it with code, which I think is pretty clever.

It’s a good idea to experiment with the prompts though! You don’t need to change any of the code to do that: just change the value of prompts key in the settings. See “Q. How can I run this myself” for details.

Q. What’s going on here?

When you click Run:

We generate programs in a compact stack language where each instruction is encoded as 4 bits. See the stack language section for more information. The difficulty option in the settings corresponds to the maximum generated program length. To search for programs, we just iterate over or sample integers from between zero and $2^{4 \times \text{difficulty}}$ (equivalently, all bitstrings with up to $4 \times \text{difficulty}$ bits).
We reject programs that are identified by static analysis to be “obviously” boring or confusing. See the static analysis section for more information. The current analysis is relatively unsophisticated, but it is still able to approximate that e.g. the program - 1 bgtz 8 - will place $\{t_1 \; t_0 \; -\}$ at the top of the stack. $t_i$ denotes a symbolic value corresponding to the $i$-th value to be popped off the top of the stack. The curly braces denote that the top of the stack might correspond to any symbolic expression from a set of possible symbolic expressions. In this case, the analysis is able to detect that the bgtz will always branch because the constant $1$ is pushed immediately beforehand.
We search for a problem (a problem is a program, examples, and a list of choices, some incorrect) by evaluating programs on seeds, which are the first two numbers in each sequence. It’s possible for boring programs to slip through the static analysis, so we take this opportunity to reject programs that e.g. generate constant or almost-constant number sequences. We also reject programs that generate non-integer terms, or terms that are too big; we don’t want to be too mean to the LLM!
We ask an LLM (which I will happily anthropomorphize as Alice) to pick the sequence out of a lineup and to tell us what it thinks the pattern is. The demo uses the API configured under the api settings key (currently Anthropic’s Claude 3.5 Sonnet).
We ask a second LLM (Bob) to pick the sequence out of a lineup, but we only give it the pattern output by Alice. It doesn’t get the two examples.
We ask a third LLM (Charlie, who checks) whether Alice and Bob’s choices (if any) match the two examples and their choices.

Q. How can I run this myself?

First, some big warnings (repeated in the settings panel!):

🚨 Many of the values in the settings can contain arbitrary JavaScript which is run inside of your browser!

🚨 The default eval_js tool allows the LLM itself to run arbitrary JavaScript in your browser!

So it’s probably a good idea not to connect this demo to a malevolent superintelligent LLM that knows zero-day exploits for your browser. Having said that:

To provide an Anthropic API key, go to api.headers.x-api-key and replace the {{TODO: ...}} placeholder. The settings are just YAML. The api section comes after the prompts so you’ll have to scroll a little.

You can change a lot through the settings: not just the prompts, but also the HTTP requests the demo makes (by default it’s configured to use Claude 3.5 Sonnet by Anthropic). For example, to remove the eval_js tool, you can set "tools": [] in the value of api.body.

You should be able to configure the demo to hit e.g. OpenAI or Mistral’s APIs just by changing settings. You’d have to change the JavaScript under api.parse function to handle their output. Just make sure that your function hews to the same format:

Return { action: 'stop', append, output } to return the string output from the call to the LLM and append append to the list of messages.
Return { action: 'reply', append } to append append to the list of messages and then hit the API again. This is for tool calling.

The demo does not save anything, so be careful about refreshing! I’d recommend working on the settings YAML in your editor of choice so you get syntax highlighting etc.

Q. What do you think this all means?!

My takeaway is that although LLMs are powerful, their reasoning ability is limited in ways that are often unintuitive. Even when the LLM picks the right number sequence, the pattern it gives is generally nonsense. And even Bob and Charlie generally recognize that Alice’s pattern is wrong! Also, as usual, the LLM sounds very confident even when it’s dead wrong.

It’s interesting to me that in spite of the fact that the LLM does so poorly with number sequences in general, it does pretty well with variations of the Fibonacci sequence.

I was also surprised that the patterns (in a sense explanations) were so low-quality. Given the architecture of autoregressive LLMs, if the explanation is output after the choice, the text of the explanation has no impact on its choice. But it’s still surprising that the explanations can be so bad:

even after multiple rounds of conversation, where the explanation does appear before its final choice, and
given that the explanation and choice are generated by the same LLM.

Q. Why did you do any of this?

One day in March, I was walking my dog, saw a house numbered 1347, and thought it was a funny pattern. It’s the Fibonacci sequence with a different seed (13 instead of 11).¹ I texted a friend about it and then got interested in the problem of automatically generating number patterns.

At the end of March I was going to try set up a system so Amazon Mechanical Turk workers could compete against LLMs. The infrastructure for that was so soul-crushingly boring I dropped the project. Last week I decided I should just make a simple UI and release it for people to play around with it themselves.

Q. Isn’t it obvious that LLMs wouldn’t be able to do this because…

I’m glad it’s obvious to you!

I heard that LLMs did pretty well on standardized tests like the SAT and GRE and could already reason as well as college students, but I might have just hallucinated that. I’m glad I did: it would have been kinda misleading for people to say things like that!

I am optimistic that computers will be better at this task someday (perhaps soon)! Obviously it’s incredibly impressive that LLMs are able to do any of this at all, but even good science fiction authors discuss their fictional technologies’ limitations. In particular, seeing as the LLM is good at recognizing variations on the Fibonacci sequence, I wonder how well a model trained exclusively to predict number sequences generated by programs like this could do. How would e.g. the maximum difficulty where an LLM could solve at least half of the problems correlate to the model’s size?

It’s worth noting that if you wanted to cheat at this demo, you could trivially brute-force it: at the default difficulty of 5 you would only need to iterate over $2^{4 \times 5} \approx 1\text{M}$ programs. Your smart fridge could do that.

Q. Isn’t it obvious that LLMs wouldn’t be able to recognize these patterns when I, a human, can’t recognize them?

Maybe you should conclude that you aren’t an (A)GI either! Just kidding, ha ha.

It doesn’t make much sense to argue about whether something is intuitive or not! But I’m happy to walk through how I think about the first example:

Program: - 1 bgtz 8 - (51f85)

Here are 2 number sequences that follow a common pattern:

6, 8, -2, 10, -12, 22, -34, 56
8, 2, 6, -4, 10, -14, 24, -38

Which one of the following sequences follows the same pattern?

5, 10, 17, 25, 34, 44, 55, 67
5, 10, 6, 2, -2, -6, -10, -14
5, 10, -5, 15, -20, 35, -55, 90
5, 10, 7, 4, 1, -2, -5, -8

Personally, when I look at the two examples, I notice that:

The terms after the first two terms alternate in sign. Note that the default prompt (click Settings) does tell the LLMs that the first two terms are fixed! (A) and (D) don’t do this.
The absolute values of the terms always increases. (B) and (D) don’t do this.

At this point I still don’t know a mathematical rule for the pattern. But by process of elimination I’m left with (C), which turns out to be correct.

The mathematical rule is that you subtract the last term from the second-to-last term. The program is far from canonical: in fact, everything after the first instruction is a noop. But remember that we’re not asking anyone to come up with a precise mathematical rule or to reverse-engineer the original program!

Among many other potential experiments, it would be interesting to see if it’s useful to tell the LLM about some of these tricks in the prompt. Given that it has an eval_js tool to evaluate arbitrary JavaScript, we could even give it a library of functions that would be useful for evaluating number sequences. Alas, I already dropped this project for months due to getting bored—the backend was already finished in March—so I’m just releasing it as-is.

Stack language

The “top” of the stack is on the right. When I write

$$\cdots \rightarrow \cdots 0$$

I mean that the when the operator/instruction 0 is executed, then $0$ is pushed to the top of the stack. I tried to stick to Forth names and semantics.

Kind	Hex	Operator	Stack semantics
Constants	`0`	`0`	$\cdots \rightarrow \cdots 0$
	`1`	`1`	$\cdots \rightarrow \cdots 1$
	`2`	`2`	$\cdots \rightarrow \cdots 2$
	`3`	`3`	$\cdots \rightarrow \cdots 3$
Arithmetic	`4`	`+`	$\cdots x y \rightarrow \cdots (x + y)$
	`5`	`-`	$\cdots x y \rightarrow \cdots (x - y)$
	`6`	`*`	$\cdots x y \rightarrow \cdots (x \times y)$
	`7`	`/`	$\cdots x y \rightarrow \cdots (x \div y)$
	`8`	`%`	$\cdots x y \rightarrow \cdots (x\mod y)$
Stack	`9`	`dup`	$\cdots x \rightarrow x x$
	`a`	`drop`	$\cdots x \rightarrow \cdots$
	`b`	`swap`	$\cdots x y \rightarrow \cdots y x$
	`c`	`rot`	$y \cdots \rightarrow \cdots y$
	`d`	`unrot`	$\cdots x \rightarrow x \cdots$
	`e`	`len`	$\cdots \rightarrow \cdots \operatorname{len}(\cdots)$
Control	`f`	`bgtz n`	$\cdots x \rightarrow \cdots$

bgtz is the only control flow operator and the only multi-word operator (i.e. the only operator that takes more than 4 bits). The following word in the program defines the branch offset and is interpreted as a number, not an opcode. This means that you can skip up to 15 instructions following the offset; bgtz 0 is a noop. Execution jumps by n instructions when $x > 0$ and continues otherwise.

The sequence generated by a program is not just the stack after the program has executed. To generate the $(n+1)$-th term of a sequence, we initialize the stack with the $n$ previous terms; the $n$-th term is at the top of the stack. After running the program, the top of the stack becomes the $(n+1)$-th term of the sequence. In the demo, the stack is initialized with the same two seed numbers for all of the choices presented to the LLM.

You can play with the interpreter and analyzer in your browser’s developer console. Use stackbee.evaluate("+", "1 2") to evaluate the program + on the stack 1 2. You can write rational numbers too:

stackbee.evaluate("/ 1 +", "1 2")
// '3/2'

// Tracing execution
stackbee.trace("/ 1 +", "1 2")
// stack   ops
// 1 2     / 1 +
// 1/2     1 +
// 1/2 1   +
// 3/2
//
// '3/2'

// Static analysis
stackbee.analyze("/ 1 +")
// stack             ops
// ⋯                / 1 +
// ⋯ {t₁ t₀ /}      1 +
// ⋯ {t₁ t₀ /} {1}  +
// ⋯ {t₁ t₀ / 1 +}
//
// ' ⋯ {t₁ t₀ / 1 +}'

Example programs

Doubling

2 *

This program just doubles whatever’s on top of the stack. Given the seed $0, 4$, for example, it will generate the number sequence: $0, 4, 8, 16, \ldots$

Fibonacci

This program adds the top two numbers on the stack. A simple program for a simple sequence: and one it seems likely that LLMs have learned a specific circuit for, given how often they miscategorize sequences as Fibonacci! Given the seed $1, 2$ we recover the classic Fibonacci sequence: $1, 2, 3, 5, 8, \ldots$

Catalan numbers

len 2 * 1 - 2 * * len 1 + /

Catalan numbers show up a lot when counting things.

When the stack contains the $n-1$ previous Catalan numbers, after this program is executed, the top of the stack will be the $n$-th Catalan number. That means when the seed is $1, 1$ (the first two Catalan numbers), the program will generate the Catalan numbers: $1, 1, 2, 5, \ldots$

We use the following recurrence:

C_n = \frac{2(2n-1)}{n+1}C_{n-1}

The execution trace is:

Stack	Program
$\cdots \quad C_{n-1}$	`len 2 * ⋯`
$\cdots \quad C_{n-1} \quad 2n$	`1 - ⋯`
$\cdots \quad C_{n-1} \quad 2n-1$	`2 * ⋯`
$\cdots \quad C_{n-1} \quad 2(2n-1)$	`* ⋯`
$\cdots \quad 2(2n-1)C_{n-1}$	`len 1 + ⋯`
$\cdots \quad 2(2n-1)C_{n-1} \quad n+1$	`/`
$\cdots \quad C_n$

Technically only the top of the stack needs to be the $(n-1)$-th Catalan number, the other $n-2$ values can be whatever.

Collatz sequence

dup 2 % bgtz 5 2 / 1 bgtz 4 3 * 1 +

To apply the Collatz function $C$ to a number $n$:

if the number is even, divide it by 2
if the number is odd, multiply it by 3 then add 1

The Collatz sequence (or hailstone sequence) for a number $n$ is $n, C(n), C(C(n)), \cdots$. The Collatz conjecture is that the Collatz sequence for every positive integer eventually reaches 1.

The program looks funky but is actually simpler than the Catalan numbers program:

Is even?	Even	(go to end)	Odd
`dup 2 % bgtz 5`	`2 /`	`1 bgtz 4`	`3 * 1 +`

The 1 bgtz 4 jumps to the end of the program so that the even case doesn’t fall through to the odd case. I’m sure a better instruction set is possible; it’s just tough to fit all the instructions in 4 bits.

This program is 14 instructions long (so 56 bits). Unfortunately this is just one over the maximum size of generated programs in the demo (13), which is limited by MAX_SAFE_INTEGER in JavaScript ($2^{53} - 1$). I apologize if you are reading this in the year 2525 and your computer would otherwise be fast enough to generate this program and test it against an LLM. That would have been fun.

Static analysis

The symbolic executor exists for two purposes:

To help humans (namely, me!) understand and verify programs, which—by virtue of being randomly generated—are often inscrutable.
To help the computer generate good problems: namely, ones that aren’t “boring” or “confusing”, which will be defined later.

The language and the programs are both puny, which means that my primitive static analysis does OK. Conveniently, we can represent symbolic expressions as programs themselves! For example, the symbolic expression representing the first unknown value popped from the stack is $t_0$; the symbolic expression representing that value plus two is $t_0 \; 2 \; +$.

SymExpr = SymVal[]
SymVal = Op | Sym
Sym = $t_i, \; b_j, \; \ell_k$
$t_i$ is the $i$-th unknown value popped from the stack
$b_j$ is the $j$-th unknown value from the bottom of the stack
$\ell_k$ is the length of the stack at the $k$-th time the len operator was evaluated

Values on the stack are represented as sets of symbolic expressions; branching execution paths enlarge sets.

SymExprSet = SymExpr[]

So a symbolic expression set of $\{0, 1\}$ means that the concrete value might be either zero or one.

The symbolic stack itself is represented as two lists: a list of known values at the top, and a list of known values at the bottom.

We can write the stack like this:

x y \cdots z w

This means $x$ and $y$ are at the bottom and $z$ and $w$ are at the top. The analysis assumes that the length of the stack is unknown, so there may be zero or more values separating the top and bottom.

An implication of that is that the following stack might have only one value $x$ at both the bottom and the top:

\cdots x

The length of the stack being unknown has a few implications. For example, rot has two obvious cases:

\dfrac{x \cdots \quad {\tt rot}}{\cdots x} \; \text{rot-1}

\dfrac{\cdots \quad {\tt rot}}{\cdots b_i} \; \text{rot-3}

… and then a third case (which I’m calling $\text{rot-2}$ to indicate precedence), which is a little funky:

\dfrac{\cdots x \quad {\tt rot}}{\cdots b_i} \; \text{rot-2}

The first rule says “if the stack starts out with $x$ at the bottom, then after rot, $x$ will be (rotated) to the top.” The second rule says “if we don’t know anything about the stack, then after rot, we generate a new symbolic value $b_i$ to place at the top of the stack.”

In the third rule, why is there an $x$ at the top of the stack? And why does it disappear afterwards? The reason is that $\cdots x$ might actually be a one-value stack $x$; remember that we don’t know anything about the size of $\cdots$. So $x$ is at both the bottom and top of the stack! If we didn’t delete $x$ and instead set the symbolic stack to $\cdots x b_i$ afterwards, we might mistakenly think the stack contains at least two values now.

Boring

We say that a program is boring if any of the following are true:

it’s surely constant, i.e. the top symbolic expression set is something like $\{1\}$ (and not $\{1, 2\}$)
the symbolic stack has no values at the top
the top of the stack has exactly one value with exactly one symbolic expression, e.g. $\cdots \{t_0\}$

There’s no reason there couldn’t be more or fewer rules; this is just what worked OK in testing.

Confusing

We say that a program is confusing if it will definitely underflow the stack. We can only approximate this, but we currently look at the top of the stack and check whether

i + j + 2 > {\tt SEED\_LEN}

where $i$ is the highest index of the symbolic $t_i$ value we see, and $j$ is the highest index of the symbolic $b_j$ values we see. If we see $t_i$ that means at most $i+1$ values were consumed from the top (and likewise with $b_j$ and the bottom), so $i + j + 2$ gives us an estimate of the number of values consumed from the stack.

There is also no particular reason for this rule other than that it works OK. The reason I introduced it is because I noticed that because programs which underflow the stack tend to make for confusing number sequences. pop on an empty stack evaluates to $0$, so you can get number sequences which suddenly seem to change rules some number of terms in, after the stack has had a chance to grow. It’s also a little bit silly because we could track more information about the symbolic stack and avoid having to guess.

Improvements

The static analysis of this program illustrates many of the current system’s shortcomings:

Program: rot * unrot 2 + (c6d24)

Analysis: $\{t_1 \; 2 \; +\}$

Stack	Ops
$\cdots$	`rot * unrot 2 +`
$\cdots \quad \{b_0\}$	`* unrot 2 +`
$\cdots \quad \{t_0 \; b_0 \; *\}$	`unrot 2 +`
$\{t_0 \; b_0 \; *\} \quad \cdots$	`2 +`
$\{t_0 \; b_0 \; * \} \quad \cdots \quad \{2\}$	`+`
$\cdots \quad \{t_1 \; 2 \; + \}$

Example number sequences:

12, 11, 134, 13, 136, 15, 138, 17
6, 6, 38, 8, 40, 10, 42, 12

Because the size of the stack $\cdots$ is unknown, when we see the final + we have to trash the bottom of the stack, viz. $\{t_0 \; b_0 \; * \}$.

Even if the stack size is unknown, we could use a set of two symbolic expressions, {t₀ b₀ * 2 +, t₁ 2 +}, to represent the value at the top of the stack (although this would make the analyses of other programs noisier).

The analysis is still valid; the next entry is always t₁ 2 +, insofar as t₁ just means the second item popped off the top of the stack.

We should arguably detect that this program is confusing because its behavior differs depending on whether the stack is initialized with two vs. three values. We already try to reject programs that underflow the stack, but (again) because the analysis currently doesn’t make any assumptions about the stack size, we approximate this by just looking at the subscripts of $t_i$ and $b_j$ values in the final expression.

Miscellanea

The values on the stack are actually rational numbers $p/q$, but I figured this would make things too complicated for the LLM, so the problems that I generate only involve integers.
We do a little bit of constant folding (e.g. $1 \; 1 \; /$ and even $t_i \; t_i \; /$ evaluate to $1$) which seems to help a bit.
Evaluating bgtz results in two symbolic stacks, which we merge. This can create horrifying symbolic stacks. At the default difficulty of 5 you will very rarely see programs with bgtzs that actually do anything. The symbolic stack for the Collatz program is pretty, though:
```
\cdots \{t_0 \; 3 * 1 \; +, t_0 \; 2 \; /\}
```
It’s disappointing that as you grow the length of the generated programs, their complexity (seems to) grow at a slower rate. By the time you get to programs as long as the Collatz function many programs consist mostly of noops. A better ISA might fix this. I experimented with term-rewriting systems at the beginning of the project but they didn’t seem too promising.

Random reflections

YAML was nice to avoid getting bogged down in writing a settings UI. But YAML’s multiline string sigils are very confusing.
eval for templates is horrible, but it definitely saved a lot of time. It’s cool that browsers are hardened to the point where I’m fine with it.
Using JSON.stringify as a hack for doing structural comparisons (when you can control key ordering) is good for demos. But it really makes you miss BEAM values.
I tried writing the UI without React; it’s been a long time since I’ve used jQuery. It would have helped a little, but the UI is very simple; the main interactivity is with the <details> elements. Having to manually restore whether they are open or not after I re-render is definitely kind of annoying. At least half of the benefit would have been just JSX, even without React.

Time to say goodbye

There’s a lot more I could write about but I figure very few people will read this far anyways. If you did, you’re amazing and I appreciate you!

This was a side project to distract from actual work so I don’t know how much more time I can spend on it, unfortunately.

The source code for the demo (including program execution, analysis, and generation) is here. I’ve licensed it under the BSD 3-clause license. Copyright acknowledgments for libraries used are here.

Lastly, I’m writing this blog post on the third anniversary of the unexpected death of my dad in the year he was planning to retire. I’m embarrassed to foist unsolicited advice upon others, but the anniversary reminds me to think about whether I’d regret it if today were the day I died. If this is my last day and I never retire, I’d prefer to save my energy for the many good people out there.

A helpful Hacker News commenter pointed out that in an earlier version of this post I managed to write “31” instead of “13” twice, which is ironic. Now I’m also unsure whether the house actually was 3147. Then the pattern would have been “add the first term to the last term” (unrot +) instead of the Fibonacci sequence (+). That’s what I get for writing at 3am.

Exodus 22:21

You shall not wrong a sojourner or oppress him, for you were sojourners in the land of Egypt.

— Exodus 22:21, ESV

You shall treat the alien who resides with you no differently than the natives born among you; you shall love the alien as yourself; for you too were once aliens in the land of Egypt. I, the LORD, am your God.

— Leviticus 19:34, NAB

Young white woman and Black man in cowboy hat holding “Mass Deportation Now!” signs at Republican National Convention

Woman holding “Mass Deportation Now!” sign at RNC

Smiling woman holding “Mass Deportation Now!” sign at RNC

Smiling men and women holding “Mass Deportation Now!” signs at RNC

Men and women holding “Mass Deportation Now!” and “Make America Strong Again!” signs at RNC

Photographs by Joe Raedle, Alex Wong, and Scott Olson for Getty Images, and Brian Snyder for Reuters.

Generating random unit vectors in Elixir Nx

There’s nothing particularly complex about this, but a few things surprised me along the way so I figured I’d write up some notes.

To generate a random $n$-dimensional unit vector, first generate a vector where each entry is a random sample from a normal distribution then normalize it to a unit vector. Why does this work? I’ll paraphrase a very helpful comment by mindoftea on StackOverflow:

The probability of a point being at a given $[x, y]$ is $P(x) \times P(y)$. The Gaussian distribution has roughly the form $\exp(-x^2)$, so $\exp(-x^2) \times \exp(-y^2)$ is $$\exp(-(x^2+y^2))$$ That is a function only of the distance of the point from the origin, so the resulting distribution is radially symmetric. This generalizes easily to higher dimensions.

Here’s a “visual” version of the algorithm for people familiar with computer graphics, inspired by a friend’s comments:

Observe that an $n$-dimensional Gaussian is radially symmetric around the origin (consider a Bell curve or a Gaussian splat). The radial symmetry means that if you squeeze the probability density function (pdf) onto the unit $n$-sphere you’ll end up with a uniform density. Just make sure to only move probabilities directly towards or away from the origin, which corresponds to only scaling points by scalars.

To generate the $n$-dimensional vector, note that Gaussians are separable, i.e. the $n$-dimensional Gaussian’s pdf is the product of $n$ independent $1$-dimensional Gaussian pdfs.

Here’s an Elixir Nx function that implements the algorithm.

defmodule Math do

  import Nx.Defn

  defn random_unit_vector(key, opts) do
    import Nx
    alias Nx.{LinAlg, Random}

    case Nx.shape(key) do
      {_} -> :ok
      _ -> raise "invalid shape"
    end

    dim = opts[:dim]
    vectorized_axes = key.vectorized_axes

    key_split = Random.split(key, parts: dim)

    axis = :random_unit_vector_key
    mean = 0
    stdev = 1

    v =
      key_split
      |> vectorize(axis)
      |> Random.normal_split(mean, stdev)
      |> revectorize(vectorized_axes, target_shape: {dim}, target_names: [axis])

    n = LinAlg.norm(v, axes: [axis], ord: 2)

    u =
      (v / n)
      |> Nx.rename([nil])

    {u, key_split[0]}
  end
end

You use it like so:

key = Nx.Random.key(37)

# One random 3-dimensional unit vector
{v, key} = Math.random_unit_vector(key, dim: 3)

# 8 random 2-dimensional unit vectors, vectorized along the
# along the :key axis
{vs, key} =
  Nx.Random.split(key, parts: 8)
  |> Nx.vectorize(:key)
  |> Math.random_unit_vector(dim: 2)

I ran into a few gotchas while writing this.

`Nx.Random.split` and number arguments

First I hit an error changing def to defn. It turns out that that’s because Nx.Random.split expects the value for parts to be a regular Elixir/BEAM number, not a tensor. But defn automatically converts all of its number arguments to tensors:

When numbers are given as arguments, they are always immediately converted to tensors on invocation. If you want to keep numbers as is or if you want to pass any other value to numerical definitions, they must be given as keyword lists.

Hence the dim = opts[:dim].

Vectorization

Next I noticed that my function worked with single keys, but failed when I passed a vectorized tensor of multiple keys (to generate multiple random unit vectors). That’s because initially I called devectorize to remove the temporary vectorization axis I introduced for generating one random value per dimension:

axis = :random_unit_vector_key
v =
  key
  |> Random.split(parts: dim)
  |> vectorize(axis)
  |> Random.normal_split(0, 1)
  |> devectorize()

That works fine when the caller gives us a scalar key, but when the caller gives us a vectorized scalar for key, devectorize will not only remove my temporary vectorization axis but also any of the caller’s axes! revectorize lets us restore the original vectorized_axes while devectorizing the temporary axis.

For a while I didn’t realize that that’s why I was getting just a bunch of non-unit vectors. The problem was that LinAlg.norm(v) was happily computing the summed norm of all of the vectors instead of giving me one norm per each vector! Then the final v / n was dividing all of the vectors by the same scalar.

I tried using revectorize(vectorized_axes ++ [foo: :auto]) but that actually does nothing. It just renames axis to foo.

The documentation has this identity for revectorize which didn’t really help me that much, because it mentions names like vectorized_sizes which it doesn’t reference subsequently:

assert revectorize(tensor, target_axes,
         target_shape: target_shape,
         target_names: target_names
       ) =
         tensor
         |> Nx.devectorize(keep_names: false)
         |> Nx.reshape(vectorized_sizes ++ target_shape, names: target_names)
         |> Nx.vectorize(vectorized_names)

It turns out vectorized_sizes and vectorized_names are the keys and values of target_axes, e.g.

target_axes = [foo: 1, bar: 2]
vectorized_sizes = {1, 2}
vectorized_names = [:foo, :bar]

Personally, these identities help me more:

tensor = # ...
vectorized_axes = tensor.vectorized_axes

assert tensor = revectorize(tensor, vectorized_axes)

assert tensor =
         tensor
         |> vectorize(:foo)
         |> revectorize(vectorized_axes,
           target_shape: Nx.shape(tensor),
           target_names: Nx.names(tensor)
         )

assert tensor |> vectorize(:foo) = revectorize(tensor, vectorized_axes ++ [foo: :auto])

Markdown fenced math blocks in Neovim and KaTeX

The ```math fenced math code block syntax is an alternative to the traditional $$ or \[ syntax for \begin{display} supported by GitHub.

I learned about it today while working on a small (~200 line) shell script to serve Markdown files as live-reloading webpages w/ KaTeX support: markd.

To use the LaTeX syntax highlighter for triple-backtick ```math fenced code blocks in Neovim, place this at ~/.config/nvim/after/queries/markdown/injections.scm:

; extends
; Use :Inspect, :InspectTree, and :EditQuery

; Highlight ```math fenced code blocks using the LaTeX highlighter.
(fenced_code_block
  (info_string (language) @_lang)
  (code_fence_content) @injection.content
  (#eq? @_lang "math")
  (#set! injection.language "latex"))

I use the following to render KaTeX on this blog and in markd:

<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css"
      integrity="sha384-nB0miv6/jRmo5UMMR1wu3Gz6NLsoTkbqJghGIsx//Rlm+ZU03BU6SQNC66uf4l5+"
      crossorigin="anonymous">
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js"
        integrity="sha384-7zkQWkzuo3B5mTepMUcHkMB5jZaolc2xDwL6VFqjFALcbeS9Ggm/Yr2r3Dy4lfFg"
        crossorigin="anonymous"></script>
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js"
        integrity="sha384-43gviWU0YVjaDtb/GhzOouOXtZMP/7XUzwPTstBeZFe/+rCMvRwr4yROQP43s0Xk"
        crossorigin="anonymous"></script>

<script>
document.addEventListener('DOMContentLoaded', function() {
  // HACK: https://github.com/KaTeX/KaTeX/issues/437#issuecomment-1147314526
  document.body.innerHTML = document.body.innerHTML.replace(/\\\$/g, '<span>$</span>');
  renderMathInElement(document.body, {
    delimiters: [
      {left: '$$', right: '$$', display: true},
      {left: '$', right: '$', display: false},
      {left: '\\[', right: '\\]', display: true},
      {left: '\\(', right: '\\)', display: false},
    ],
    supportEscapedSpecialCharsInText: true,
    throwOnError: false
  });

  // ```math
  document.querySelectorAll("pre:has(> code.language-math)").forEach((el) => {
    const tex = el.innerText;
    katex.render(tex, el, {
      throwOnError: false,
      displayMode: true
    });
    el.classList.add("katex-display");
  });
});
</script>

Note that this assumes your Markdown to HTML compiler will output ```math blocks as <pre><code class="language-math">. That’s what cmark does.

The National Review on Sarah Palin

A golden oldie from Rich Lowry at the National Review in 2008:

I’m sure I’m not the only male in America who, when Palin dropped her first wink, sat up a little straighter on the couch and said, “Hey, I think she just winked at me.”

👀

Resetting iCloud Photos sync

Photos were not syncing from my Mac to my other devices. On my Mac, the number of photos it was aware of (displayed at the bottom of the main screen) was different from the number of photos displayed in Photos on my iPhone. Both said “Last synced X minutes ago”, where X was some small number, so it seemed like it wasn’t even aware that the two were out of sync! Bizarrely, new photos from my iPhone would still show up on my Mac.

A few things I tried didn’t work: I waited a few days for this to resolve itself and tried checking/unchecking “iCloud Photos” in the “iCloud” section of Settings in Photos on my Mac.

Finally I read about the Photos Repair tool:

To get to the Photos Repair Library tool on your Mac, follow these steps:

If Photos is open, close the app. Then, while you click to open Photos, hold down the Command and Option keys at the same time.

In the window that opens, click Repair to start the repair process. You might be asked to enter your user account password.

This fixed it, but it’s pretty lame that I had to do this. It doesn’t inspire much confidence to learn that Apple’s synchronization system can be unaware that it is out of sync.

UPDATE: I’m not sure if it’s related, but today I started running out of space on my computer. I narrowed it down to ~/Library/Caches/CloudKit/comp.apple.bird:

$ sudo du -h /System/Volumes/Data | grep "G\t"
# 309G	/System/Volumes/Data/Users/jyc/Library/Caches/CloudKit/com.apple.bird
# 309G	/System/Volumes/Data/Users/jyc/Library/Caches/CloudKit/com.apple.bird/3593ccf6c2fd2e7e5212117d4c22421ae134a9a7
# 309G	/System/Volumes/Data/Users/jyc/Library/Caches/CloudKit/com.apple.bird/3593ccf6c2fd2e7e5212117d4c22421ae134a9a7/MMCS
# 309G	/System/Volumes/Data/Users/jyc/Library/Caches/CloudKit/com.apple.bird/3593ccf6c2fd2e7e5212117d4c22421ae134a9a7/MMCS/ClonedFiles
# 310G	/System/Volumes/Data/Users/jyc/Library/Caches/CloudKit
# 336G	/System/Volumes/Data/Users/jyc/Library/Caches

It’s seems like this shouldn’t be related to Photos because “Optimize Mac Storage” is set in Photos’ iCloud setings: And looking at the contents of that folder, the files appear to be a mix of random data from iCloud: ZIP files, HTML files, and some images, but no images from my Photos Library. Strange.

Jus soli

A map of the world indicating countries with jus soli (birthright) citizenship. The United States and almost all American countries have jus soli citizenship; few countries outside do.

Countries in dark blue grant jus soli without restrictions; all other countries require at least one parent to have citizenship or residency.

Jus soli is the predominant rule in the Americas…

Almost all states in Europe, Asia, Africa and Oceania grant nationality at birth based upon the principle of jus sanguinis (“right of blood”), in which nationality is inherited through parents rather than birthplace, or a restricted version of jus soli in which nationality by birthplace is automatic only for the children of certain immigrants.

The alternative, jus sanguinis (by blood) citizenship, is in my opinion an abomination.

From the Wikipedia article “Jus soli.”

All posts

Welcome!

Links

Blog

Review: Harmony by Project Itoh

LLMs struggle to explain themselves

FAQ

Q. Why didn’t you ask the LLM to write out its reasoning first/give it tools/multiple turns before it outputs its final answer?

Q. What’s going on here?

Q. How can I run this myself?

Q. What do you think this all means?!

Q. Why did you do any of this?

Q. Isn’t it obvious that LLMs wouldn’t be able to do this because…

Q. Isn’t it obvious that LLMs wouldn’t be able to recognize these patterns when I, a human, can’t recognize them?

Stack language

Example programs

Doubling

Fibonacci

Catalan numbers

Collatz sequence

Static analysis

Boring

Confusing

Improvements

Miscellanea

Random reflections

Time to say goodbye

Exodus 22:21

Generating random unit vectors in Elixir Nx

`Nx.Random.split` and number arguments

Vectorization

Markdown fenced math blocks in Neovim and KaTeX

The National Review on Sarah Palin

Resetting iCloud Photos sync

Jus soli

Fun

Welcome!

Links

FAQ

Q. Why didn’t you ask the LLM to write out its reasoning first/give it tools/multiple turns before it outputs its final answer?

Q. What’s going on here?

Q. How can I run this myself?

Q. What do you think this all means?!

Q. Why did you do any of this?

Q. Isn’t it obvious that LLMs wouldn’t be able to do this because…

Q. Isn’t it obvious that LLMs wouldn’t be able to recognize these patterns when I, a human, can’t recognize them?

Stack language

Example programs

Doubling

Fibonacci

Catalan numbers

Collatz sequence

Static analysis

Boring

Confusing

Improvements

Miscellanea

Random reflections

Nx.Random.split and number arguments

Vectorization

Fun

`Nx.Random.split` and number arguments