Welcome to my website! I’m Jonathan Y. Chan (jyc, jonathanyc, 陳樂恩, or 은총), a 🐩 Yeti fan, 🇺🇸 American, and 🐻 Californian, living in 🌁 San Francisco: the most beautiful city in the greatest country in the world. My mom is from Korea and my dad was from Hong Kong. I am a Christian. My professional endeavors include:

a failed startup (co-founded, YC W23) that let you write Excel VLOOKUPs on billions of rows;

Parlan was a spreadsheet with an interface and formula language that looked just like Excel. Under the hood, it compiled formulas to SQL then evaluated them like Spark RDDs. Alas, a former manager’s prophecy about why startups fail proved prescient…

3D maps at Apple, where I did the math and encoding for the Arctic and Antarctic;

I also helped out with things like trees, road markings, paths, and lines of latitude!

various tasks at Figma, which had 24 engineers when I joined;

… including copy-paste, a high-fidelity PDF exporter, text layout, scene graph code(gen), and putting fig-foot in your .fig files—while deleting more code than I added!

Blog

Attaching Livebook and IEx to a running Phoenix instance

I run Phoenix (Rails-like web framework for Elixir) like so:

elixir --name [email protected] --cookie bar --erl "-elixir ansi_enabled true" -S mix phx.server

… then connect IEx (the Elixir REPL) using:

random=$(head /dev/urandom | LC_ALL=C tr -dc A-Za-z0-9 | head -c 8)
iex --name "iex-$random" --remsh [email protected] --cookie bar

… and connect Livebook (Jupyter for Elixir) using:

export LIVEBOOK_DEFAULT_RUNTIME=attached:[email protected]:bar
livebook server @home

The server node (Erlang VM instance) has the name [email protected], and IEx nodes have names like [email protected].

Now I can recompile code in the server node (which is the same node that Livebook is attached to) by entering recompile in IEx. Sometimes it looks like Phoenix.CodeReloader even reloads the code automatically, so recompile evaluates to :noop, which is nice! This means I can use Livebook to test out server functions as I iterate on them.

I tried running Livebook in its own node for a bit and including server code using:

Mix.install(
  [
    {:foo, path: "..."}
  ],
  config_path: ".../config.exs"
)

This also works alright, but it’s nice to be connected to the same server node because you can e.g. look at the state of GenServer processes with :sys.get_state.

Setting up Plex on a Synology NAS with ZeroTier

  1. Go to http://plex.tv/claim/ to create a “claim code”.
  2. In Package Center, search for and install “Plex Media Server.”
  3. Make sure to select the “Claim” option when installing, not the default option! Enter the claim code you generated in step 1. Otherwise you’ll have to reinstall; I was able to sign in to my Plex account but then kept getting the error “Not authorized: you do not have access to this server”.
  4. In File Station, right-click the volume(s) you want Plex to read, click “Properties”, “Permission”, “Create”, then enter “PlexMediaServer” under “User or group” and check “Read” and “Write”.
  5. In Plex, click the settings wrench icon at the top-right, go to Manage > Libraries, click “Add Library”, then select the volume(s). In my case they were under “volume1”.
  6. In Plex, go to Settings > Network and click “Show Advanced.”
    • For “Preferred network interface”, select your ZeroTier network interface (e.g. ztabcdef (10.147.17.123)).
    • For “LAN Networks”, enter “192.168.0.0/16,10.0.0.0/8”.
    • For “List of IP addresses and networks that are allowed without auth,” enter the same.

You should be all set! Now you can stream directly from your NAS even when away from home, and you don’t have to expose anything to the public Internet.
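The "LAN Networks" value in step 6 is what lets ZeroTier clients count as local: 10.0.0.0/8 covers ZeroTier's default 10.147.x.x addresses, and 192.168.0.0/16 covers typical home LANs. Here's a quick sketch of the CIDR-membership check those entries imply (the helper names are made up; the example address is the one from step 6):

```shell
# Sketch: does an IP fall inside a CIDR range? (plain bash, no dependencies)
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_cidr() {  # usage: in_cidr 10.147.17.123 10.0.0.0/8
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

in_cidr 10.147.17.123 10.0.0.0/8 && echo "covered"   # prints "covered"
```

The same check succeeds for a home-LAN address like 192.168.1.5 against 192.168.0.0/16, which is why that pair of ranges is enough for both cases.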

1960s-era encryption systems often included a punched card reader for loading keys. The mechanism would automatically cut the card in half when the card was removed, preventing its reuse.

From “Securing Record Communications: The TSEC/KW-26” via “Stream cipher attacks” on Wikipedia.

How to reinstall a Homebrew package from main/master

Here’s what I use to reinstall the main branch of the neovim Homebrew package:

brew unlink neovim && brew install neovim --HEAD --fetch-HEAD

I’m noting it here because my searches weren’t turning up good results.

Toasty Tech GUI Gallery

Ran across the Toasty Tech GUI Gallery today. I assumed it was abandoned because of the pristine “IE is EVIL!!!” banner until I noticed the Windows 11 review:

A screenshot of the Toasty Tech GUI Gallery homepage.

Don’t miss the “screen shots” of the “Realworld Desk” GUI. Awesome website.

From the Wikipedia article on Hattori Hanzo:

He died at the age of 54 or 55 in 1597. There are three theories about his death. One asserts that he was assassinated by a rival Samurai, the pirate Fūma Kotarō. After Hanzo tracked him down to the Inland Sea, Kotarō lured him and his men into a small channel and used oil to set the channel on fire. The second theory is that Hanzo became a monk in Edo where he lived out the rest of his days until he died of illness. The third theory is that he died because of illness and it was a natural death.

Backing up iCloud Photos using rsync

Here’s the copy-icloud-photos script I use to back up my photos stored in iCloud to my Synology NAS:

#!/bin/bash
set -euo pipefail

args=(
  --delete
  --human-readable
  --no-perms
  --partial
  --progress
  --times
  -v
)

src="/Users/jyc/Pictures/Photos Library.photoslibrary/originals/"
cd "$src"
find ./ -cmin +1440 -print0 |
  rsync --files-from=- --from0 \
    "${args[@]}" \
    "./" \
    nas.home:/var/services/homes/jyc/Photos/iCloudViaMac

I added the find filter recently because it’s annoying to accidentally back up temporary photos, like screenshots, that only live in my iPhone’s Camera Roll for a minute or so before I delete them; -cmin +1440 only matches files whose inode change time is more than 1440 minutes (24 hours) old.
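To see the age filter in action, here's a tiny sketch. It uses -mmin (modification time) instead of the script's -cmin, because touch can backdate mtime but not ctime; the filenames and date are made up:

```shell
# Sketch: only files older than 1440 minutes (24 h) survive the filter.
tmp=$(mktemp -d)
touch "$tmp/new.jpg"                  # "photo" created just now
touch -t 202001010000 "$tmp/old.jpg"  # "photo" backdated to 2020
(cd "$tmp" && find ./ -mmin +1440)    # prints only ./old.jpg
```

The directory itself was also just created, so it fails the age test too; only the backdated file is printed and would be passed along to rsync.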

I have launchd run that script daily using a configuration plist at ~/Library/LaunchAgents/jyc.copy-icloud-photos.service:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Disabled</key>
  <false/>
  <key>Label</key>
  <string>copy-icloud-photos</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/fdautil</string>
    <string>exec</string>
    <string>/Users/jyc/bin/copy-icloud-photos</string>
  </array>
  <key>StandardErrorPath</key>
  <string>/tmp/copy-icloud-photos.err</string>
  <key>StandardOutPath</key>
  <string>/tmp/copy-icloud-photos.out</string>
  <key>StartInterval</key>
  <integer>86400</integer>
</dict>
</plist>

I set it up via LaunchControl, which is a third-party shareware GUI for launchd that also provides the fdautil wrapper script that makes it possible for the copy-icloud-photos script to have full disk access. I think it’s possible to get this to work without LaunchControl but I haven’t tried.

Unfortunately, a big caveat is that this will back up recently deleted photos until they are truly deleted by iCloud. Here’s some SQL that lists the filenames of recently-deleted (but not hidden) photos under ~/Pictures/Photos Library.photoslibrary/originals/ when run on the database at ../Photos.sqlite:

select substr(ZFILENAME, 1, 1) || '/' || ZFILENAME
from ZASSET
where ZTRASHEDSTATE = 1 and ZHIDDEN = 0;

… but even when I grant bash, copy-icloud-photos, and sqlite3 Full Disk Access in System Settings > Privacy & Security, I can’t get it to work. I thought I might just need to grant my script Photos access as well, but that doesn’t work. Maybe Apple really is trying to block all programmatic access except through PhotoKit.

I am Culgi, who has been chosen by Inana for his attractiveness.

Because I am a powerful man who enjoys using his thighs, I, Culgi, the mighty king, superior to all, strengthened the roads, put in order the highways of the Land.

So that my name should be established for distant days and never fall into oblivion, so that my praise should be uttered throughout the Land, and my glory should be proclaimed in the foreign lands, I, the fast runner, summoned my strength and, to prove my speed, my heart prompted me to make a return journey from Nibru to brick-built Urim as if it were only the distance of a double-hour.
A praise poem of Shulgi (Shulgi A)

Sumerians didn’t skip leg day or cardio.

Notes on "Efficient Natural Language Response Suggestion for Smart Reply" by Henderson et al.

Previously: One-Paragraph Reviews, Vol. I

I didn’t manage to stick to the one-paragraph format this time. I’m trying to write down:

  1. everything that was novel and notable to me when reading the paper
  2. as concisely as possible

… but (1) can be a lot of stuff because the things I’m reading about are generally things on which I’m not an expert! I’ll try moving stuff that isn’t related to the main point into footnotes to cheat. If the trend continues, though, I’ll have to think of how to make things more concise…

“Efficient Natural Language Response Suggestion for Smart Reply” is a paper by Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-hsuan Sung, Laszlo Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, and Ray Kurzweil (2017) on the algorithm behind Google’s pre-LLM[1] “Smart Reply” feature, which suggests short replies like “I think it’s fine” or “It needs some work.”

The authors train a model composed of two neural network “towers”, one for the input email and one for the reply: each takes a vector representing an email, encoded as the sum[2] of the n-gram embeddings of its words. The model learns to compute two vectors, $h_x$ for input emails and $h_y$ for response emails, such that $P(y|x) = h_x \cdot h_y$ is the probability that an email $y$ is the reply to an email $x$.
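Strictly, a raw dot product isn't itself a probability; in the usual two-tower setup (my paraphrase of this family of models, not a quote from the paper), the score $h_x \cdot h_y$ is normalized with a softmax over candidate replies:

```latex
P(y \mid x) \approx \frac{e^{h_x \cdot h_y}}{\sum_{y'} e^{h_x \cdot h_{y'}}}
```

where $y'$ ranges over a set of candidate responses; during training, the other replies in the batch typically serve as the negatives in the sum.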

There are a few post-processing steps:

  1. adding $\alpha \log P_{\text{LM}}(y)$ to the score, where $\alpha$ is an arbitrary constant and $P_{\text{LM}}$ is computed by a language model, because the learned score $h_x \cdot h_y$ is biased towards “specific and long responses instead of short and generic ones;”
  2. a “diversification” stage where responses are clustered, to “omit redundant suggestions… and ensure a negative suggestion is given if the other two are affirmative and vice-versa”; and
  3. instead of computing full dot products when searching for appropriate response emails, computing smaller quantized[3] dot products.

These days, you might use someone else’s text embedding model for $h_x$ and $h_y$, but you’d still need the post-processing steps; you would also need some transformation from input vectors to reply vectors so that $h_x \cdot h_y$ represents “$h_y$ is a reply to $h_x$” rather than just “$h_y$ is similar to $h_x$.” I wonder if LLMs might become cheap enough that $P_{\text{LM}}$ becomes all you need, similar to how spellchecking used to be an engineering feat but is now “3-5 lines of Python.”

[1] Seq2Seq, the direct ancestor of the current generation of GPT-style LLMs, already existed at the time, but the authors wanted something more efficient.

[2] Cf. the sinusoidal or learned positional encoding used in many current models. Sinusoidal positional encodings have a vaguely geometric interpretation: a word/token at a given position in a sentence is the token’s embedding vector with a translation applied, such that the distance between the translation applied to tokens at two positions is “symmetrical and decays nicely with time”.

[3] They learn a “hierarchical quantization” for each vector such that $h_y \approx \text{VQ}(h_y) + \text{R}^T \text{PQ}(r_y)$, where $\text{VQ}$ is a vector quantization, $\text{R}$ is a rotation, and $\text{PQ}$ is a product quantization (the Cartesian product of $\mathcal K$ independent vector quantizers). Vector quantization just means expressing an $a$-dimensional vector as a linear combination of $b$ other vectors (the “codebook”); compression comes from $b < a$. It feels vaguely reminiscent of $k$-nearest neighbors, which predicts the output for a given input using the $k$ nearest input vectors.

Elixir map iteration order is very undefined

The iteration order for Elixir maps is not just “undefined” in the sense that there is some order at runtime which you don’t know. Different functions that take maps can also iterate over the map in different orders!

Ranges and lists have the iteration order you’d expect:

range = 1..32
Enum.map(range, fn a -> a end)
Enum.zip_with(range, range, fn a, b -> {a, b} end)
# [1, 2, 3, ...]
# [{1, 1}, {2, 2}, {3, 3}, ...]

… and so do maps with 32 or fewer entries:

range = 1..32
map = Enum.map(range, &{&1, true}) |> Enum.into(%{})
IO.inspect(Enum.map(map, fn {k, _v} -> k end))
IO.inspect(Enum.zip_with(range, map, fn _, {k, _v} -> k end))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
#   22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
#   22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]

… but add one entry to a map and the pattern breaks:

range = 1..33
# ...
# [4, 25, 8, 1, 23, 10, 7, 9, 11, 12, 28, 24, 13, 3, 18, 29, 26, 22, 19, 2, 33,
#   21, 32, 20, 17, 30, 14, 5, 6, 27, 16, 31, 15]
# [15, 31, 16, 27, 6, 5, 14, 30, 17, 20, 32, 21, 33, 2, 19, 22, 26, 29, 18, 3,
#   13, 24, 28, 12, 11, 9, 7, 10, 23, 1, 8, 25, 4]

Enum.zip_with happens to enumerate over the entries of a map in opposite order from Enum.map!

I think it’s especially funny that this behavior only manifests for maps with more than 32 elements! It reminds me of this plotline (no spoilers) from Cixin Liu’s mind-blowing Remembrance of Earth’s Past trilogy:

 “These high-energy particle accelerators raised the amount of energy available for colliding particles by an order of magnitude, to a level never before achieved by the human race. Yet, with the new equipment, the same particles, the same energy levels, and the same experimental parameters would yield different results. Not only the results would vary if different accelerators were used, but even with the same accelerator, experiments performed at different times would give different results. Physicists panicked. …”

 “What does this mean?” Wang asked. …

 “It means that the laws of physics are not invariant across time and space.”

On a less dramatic note, it reminds me of the Borwein integrals discovered by David Borwein and Jonathan Borwein in 2001:

$$ \int_0^\infty \frac{\sin(x)}{x} dx = \frac{\pi}{2} $$

$$ \int_0^\infty \frac{\sin(x)}{x} \frac{\sin(x/3)}{x/3} dx = \frac{\pi}{2} $$

$$ \int_0^\infty \frac{\sin(x)}{x} \frac{\sin(x/3)}{x/3} \cdots \frac{\sin(x/13)}{x/13} dx = \frac{\pi}{2} $$

$$ \int_0^\infty \frac{\sin(x)}{x} \frac{\sin(x/3)}{x/3} \cdots \frac{\sin(x/15)}{x/15} dx = \frac{\pi}{2} - 2.31 \times 10^{-11} $$

It’s interesting to think about the different kinds of behavior which you can’t know ahead-of-time. Suppose I roll some dice inside of a closed box.

  1. Non-deterministic but fixed. When I open the box, I see the dice have some value which I couldn’t predict, but which is the same regardless of how I open the box.
  2. Not fixed. After I’ve opened the box, every time I look at the dice, their values have changed.
  3. Observer-dependent. Depending on how I open the box, the dice have different values.

Fun

[The current week’s card] is the card of the week.

I'm computing the week number by dividing the number of days we are into the year by 7. This gives a different week number from ISO 8601. Suits are ordered diamonds, clubs, hearts, spades (like Big Two, unlike Poker) so that red and black alternate. On leap years there are 366 days in the year; the card for the 366th day is the white joker. Karl Palmen has proposed a different encoding.
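The scheme above can be sketched in shell. The suit order is from the post; the rank order within each suit (A through K) and the choice that the suit advances every 13 weeks are my guesses, since the post doesn't spell them out:

```shell
# Card of the week, per the scheme described above (a sketch; rank order
# within a suit is an assumption, not stated in the post).
suits=(Diamonds Clubs Hearts Spades)   # red/black alternating, as in Big Two
ranks=(A 2 3 4 5 6 7 8 9 10 J Q K)

card_for_day() {
  local day=$1                        # day of the year, 1..366
  local week=$(( (day - 1) / 7 ))     # 0-based; differs from ISO 8601 weeks
  if [ "$week" -ge 52 ]; then
    # Days 365-366 fall outside the 52 seven-day weeks; the post only
    # specifies that the 366th day (leap years) maps to the white joker.
    echo "Joker"
    return
  fi
  echo "${ranks[week % 13]} of ${suits[week / 13]}"
}

card_for_day "$((10#$(date +%j)))"     # today's card
```

For example, day 1 gives "A of Diamonds" and day 92 gives "A of Clubs" under these assumptions.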