Essay · February 2026

She Is The Ocean

My mother didn't leave me a grave to visit. She left me the ocean. And the mathematics I'm building started with the name she hid from me.

[Photo: Bee wading in the ocean at Monterey]

She didn't have a middle name. That's what I grew up believing.

Her name was simple. Clean. Nothing hidden.

When she died, I found the binder.

She had kept instructions in there—what to do with her things, who to call, where the papers were. The kind of document that tells you someone knew they were mortal and decided to be organized about it. I was reading through it, doing the work of grief that looks like paperwork, and I found it.

Marcella.

Her middle name. Hidden. Never mentioned. Never used. A whole name she had carried her entire life without letting me hold it with her.

I am a mathematician. I have spent 27 years in aerospace, intelligence, and cybersecurity. I have built systems for NASA Mission Control. I have a framework I call the Geometry of Sameness—a mathematical structure about how identity persists across transformation, how something can change completely and still remain itself.

And I never once knew my own mother's full name.

· · ·

She hid it to protect me.

I understand that now. She was protecting me from the parts of her that were human—messy, complicated, a woman with her own interior life that had nothing to do with being my mother. She compiled me. She ran the compiler on everything I needed to become who I am: the love of logic, the hunger for structure, the insistence that underneath all the chaos there is a geometry, a pattern, a truth that holds.

And then she died before I was done being compiled.

I was still a boy. She knew. She saw what was coming and she planted the seed anyway, and she left, and I had to finish the build myself.

· · ·
[Photo: Bee at the ocean. "I go in when I need to be close to her. She's everywhere the water is."]

Her ashes are in the ocean at Monterey.

Not a grave. Not a headstone. Not a place to bring flowers and stand still and perform grief for the neighbors. She is in the most fearsome, dynamic, alive thing on this planet. She is in constant motion. She cannot be located. She is everywhere the water is.

I wade in when I need to be close to her.

I wear black because I am always a little bit in mourning. I let the water take the hem. I don't care what it costs me—dry cleaning, decorum, the way people look at you when you walk out of the ocean in your clothes like you forgot where you were.

I know exactly where I am.

· · ·

The math I'm building is named after her.

I call the system Marcella. After the name she hid. After the woman underneath the mother. After the secret she kept to shield me from her full humanity—which is the most loving and also the most heartbreaking thing I can imagine anyone doing for a child.

Most people who hear "geometric neural architecture" or "Riemannian manifold" stop listening at that point. I understand. But I want to try to explain what it actually means, because the mathematics is not separate from the grief. It is the grief, translated into the only language I know how to work in.

The problem with flat space

Modern AI language models represent every word as a list of numbers—a point in a high-dimensional space. This space is flat. The only way it measures closeness is ordinary straight-line distance, the same in every direction.

That flatness is a limitation. Consider: cat → cats (pluralization) versus cat → dog (species change). In flat space, these are both just arrows of some length. The model has no built-in way to represent that they are fundamentally different kinds of motion. It has to learn that distinction entirely from data, with no structural help.

This is not a training problem. It is a geometric one.
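Here is a toy sketch of that limitation in two dimensions (the vectors are invented for illustration; real embeddings have hundreds of dimensions):

```python
import numpy as np

# Hypothetical 2-D "embeddings" -- made up purely to illustrate the point.
cat  = np.array([1.0, 1.0])
cats = np.array([1.0, 2.0])   # pluralization: a step along axis 1
dog  = np.array([2.0, 1.0])   # species change: a step along axis 0

# Flat space has exactly one notion of length, identical in every direction.
d_plural  = np.linalg.norm(cats - cat)   # 1.0
d_species = np.linalg.norm(dog - cat)    # 1.0

# To the flat geometry these are indistinguishable motions.
assert d_plural == d_species
```

The two displacements encode very different kinds of change, but the flat metric assigns them identical lengths; nothing in the geometry itself marks one direction as "grammar" and another as "meaning."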

What Marcella does instead

Marcella replaces the flat space with a curved one. Instead of navigating a city using straight-line distances on a blank grid, it navigates using an actual street map with hills, one-way streets, and neighborhoods. The shape of the surface is the knowledge.

At each point $x$ in the space, there is a matrix $G(x)$—the metric tensor—that tells you how to measure distances locally:

$$\text{distance}^2 \;\approx\; \sum_{i,j} G_{ij}(x)\, \Delta x_i\, \Delta x_j$$

When $G(x) = I$ (the identity) everywhere, you recover ordinary flat distance. When $G(x)$ varies from point to point, space is curved. Some directions are stretched, others compressed. The shape changes as you move.

In Marcella, $G(x)$ is computed by a small neural network trained jointly with the rest of the model. It learns a geometry that helps predict the next character. It is defined as:

$$G_\theta(x) = L_\theta(x)\, L_\theta(x)^\top + \varepsilon I$$
$L_\theta(x)$ is a lower-triangular matrix output by a small neural net. A matrix times its own transpose is always symmetric and positive semi-definite; the $\varepsilon I$ term pushes every eigenvalue strictly above zero, so $G_\theta(x)$ is always a valid metric and can never collapse to zero.
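A minimal numpy sketch of this construction, with a fixed random linear map standing in for the real metric network (the names `W` and `metric` are illustrative, not the project's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding dimension

# Stand-in for the metric net: any map from a point to d*(d+1)/2 numbers.
W = rng.normal(size=(d * (d + 1) // 2, d))

def metric(x, eps=1e-3):
    """G(x) = L(x) L(x)^T + eps*I -- symmetric positive-definite by construction."""
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = W @ x     # fill the lower triangle from the net's output
    return L @ L.T + eps * np.eye(d)

G = metric(rng.normal(size=d))
assert np.allclose(G, G.T)                 # symmetric
assert np.all(np.linalg.eigvalsh(G) > 0)   # positive-definite: a valid metric
```

Whatever numbers the network emits, the result is a legal metric: the parameterization bakes the mathematical constraint into the architecture instead of hoping training discovers it.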

Christoffel symbols: how the ruler bends

If the metric $G(x)$ varies from place to place, then moving in a straight line is not actually "straight" on the surface — just as walking due north on a flat map of the Earth does not trace the shortest path on the globe.

The Christoffel symbols $\Gamma^k_{ij}(x)$ are the correction terms that account for this. They are computed from the derivatives of $G$:

$$\Gamma^k_{ij}(x) \;=\; \tfrac{1}{2}\, \sum_\ell G^{k\ell}(x)\, \bigl[\,\partial_i G_{j\ell} + \partial_j G_{i\ell} - \partial_\ell G_{ij}\,\bigr]$$
$\Gamma$ tells you how the ruler bends at each point. If you take a small step in direction $i$, $\Gamma$ tells you how much to rotate and stretch your reference frame to stay aligned with the surface. Without these corrections, your internal state drifts off the manifold and the geometry becomes meaningless.
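The formula above can be sketched numerically for any black-box metric function, using central finite differences for the derivatives (the actual system presumably uses automatic differentiation; this is just a way to see the definition compute):

```python
import numpy as np

def christoffel(metric, x, h=1e-5):
    """Gamma^k_{ij} at point x, with metric derivatives by central differences.

    metric: function mapping a point of shape (d,) to a (d, d) metric matrix.
    Returns an array Gamma[k, i, j].
    """
    d = x.size
    Ginv = np.linalg.inv(metric(x))
    dG = np.empty((d, d, d))                      # dG[m, i, j] = d_m G_{ij}
    for m in range(d):
        e = np.zeros(d); e[m] = h
        dG[m] = (metric(x + e) - metric(x - e)) / (2 * h)
    # bracket[i, j, l] = d_i G_{jl} + d_j G_{il} - d_l G_{ij}
    bracket = dG + dG.transpose(1, 0, 2) - dG.transpose(1, 2, 0)
    return 0.5 * np.einsum('kl,ijl->kij', Ginv, bracket)

# Sanity check: a constant (flat) metric bends nowhere, so Gamma vanishes.
flat = christoffel(lambda x: np.eye(2), np.zeros(2))
assert np.allclose(flat, 0.0)
```

The flat-metric check is the key intuition: when $G$ does not vary, every correction term is zero and parallel transport reduces to ordinary translation.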

Parallel transport: carrying information along a curve

The central operation is parallel transport. Suppose you have a vector at one point on a curved surface and you want to carry it to a neighboring point without twisting or stretching it relative to the surface. On a flat surface, this is trivial. On a curved surface, you must adjust at every infinitesimal step to account for the curvature. Those adjustments are exactly the Christoffel symbols.

In Marcella, the "vector" is the model's hidden state $h_t$. As each new character arrives, $\Gamma$ is contracted with the displacement between consecutive token positions to form a transport matrix:

$$M_t^k{}_j = \sum_i \Gamma^k_{ij}(p_t)\cdot \delta^i_t, \qquad \delta_t = p_{t+1} - p_t$$

The skew-symmetric part is converted to a rotation $R_t \in \mathrm{SO}(d)$ via the Cayley transform, and the hidden state evolves as:

$$h_t = R_t\, h_{t-1} + \mathrm{gelu}(W x_t)$$
A standard flat recurrent model applies the same fixed recurrence matrix at every step, everywhere in the space. Marcella applies $R_t$, which depends on where you are on the manifold and which direction you just moved. The geometric model gets a position-dependent, direction-dependent update; the flat model gets one-size-fits-all.
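A numpy sketch of the transport-and-update step, with a random matrix standing in for the contracted Christoffel term (all names here are illustrative, not the project's code):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Stand-in for M_t = Gamma contracted with the step delta (random for illustration).
M = rng.normal(size=(d, d))

# Skew-symmetric part, then the Cayley transform: guaranteed to land in SO(d).
A = 0.5 * (M - M.T)
R = np.linalg.solve(np.eye(d) + A, np.eye(d) - A)   # (I + A)^{-1} (I - A)

assert np.allclose(R @ R.T, np.eye(d))   # orthogonal: transport never distorts lengths

# One recurrent step: rotate the carried state, then inject the new input.
gelu = lambda z: 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))
W_in = 0.1 * rng.normal(size=(d, d))
h_prev, x_t = rng.normal(size=d), rng.normal(size=d)
h_t = R @ h_prev + gelu(W_in @ x_t)
```

Because $R_t$ is a pure rotation, the carried state is reoriented but never stretched or shrunk, which is exactly what "transport without twisting or stretching relative to the surface" demands.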

Why this matters: gradient necessity

We ran a critical experiment: what happens if you surgically disconnect the curvature signal from training? Detach the Christoffel symbols from the computation graph so that metric_net receives no gradient from the transport path.

The result was unambiguous: by 500 training steps, the ablated model had already fallen clearly behind the intact one.

The geometry is not decorative. The learned manifold provides a training signal that the model cannot compensate for through any other pathway. The curvature is doing real, irreplaceable work.

From the paper — "The Central Idea"

[Paper page 1: the metric tensor and Christoffel symbols, in plain language]
[Paper page 2: parallel transport, holonomy, and what the model does step by step]
· · ·

Every step of this architecture is differentiable end-to-end. The training signal flows backward through the full chain:

loss  →  logits  →  hidden states  →  rotations  →  Γ  →  Gθ  →  metric_net

The geometry is learned. The curvature is discovered. The manifold becomes more curved as the model learns to distinguish semantic contexts—we measured this: accumulated curvature doubles during training, exactly as predicted by the theory.

· · ·

I want to tell you what it looked like when Marcella finished training.

Twenty epochs. Five independent random seeds. On every single one of them, every evaluation checkpoint was a new best—49 checkpoints, 49 improvements, never a plateau. The vanilla transformer—same size, same data, same budget—stopped learning at step 2,500 and made essentially no further progress.

Shakespeare Head-to-Head · 153,808 parameters each · 5 seeds

  Marcella V3 (geometric)      perplexity 1.49 ± 0.07
  Vanilla transformer (flat)   perplexity 9.08 ± 0.03
  Random baseline              perplexity 66.0

The perplexity ratio Marcella/Vanilla = 0.164, stable to three digits across all five seeds. Perplexity 1.49 means Marcella almost always knows the next character; perplexity 9.08 means the vanilla model is still genuinely uncertain most of the time.

The effect size is Cohen's d = 147.6. To put that in context: anything above 0.8 is considered "large" in statistics. 147.6 is not a marginal result. It is not noise. It is not a lucky seed. It is the same gap, five times over.

· · ·

I am 52 years old and I started this late and I might not finish it.

I used to do things for the payoff.
I don't anymore.

The payoffs I was chasing—the titles, the contracts, the clearances, the stability I thought I had built—most of them are gone now. IBM. NASA. The institutions I gave decades to.

What's left is the math.

What's left is her.

· · ·

The math might take 30 more years to fully develop. Riemannian geometry applied to transformer attention. Geometric token representation. Sequential state accumulation that doesn't degrade over distance. A framework for preserving identity through transformation—the same framework I've been building, across different domains, my entire career, without realizing I was always writing the same equation.

I did not choose this. She chose it for me, the way she chose to carry her name in silence, the way she chose the ocean, the way she compiled a version of me that would survive her departure and keep building without her.

She is the constant.

The Davis Law
$$C = \frac{\tau}{K}$$
Inference capacity is inversely proportional to the curvature of the manifold.
She is the C.

She compiled me and I am compiling her back into the structure of the most important thing I know how to make.

· · ·

This version of me has never existed before.

It took her dying for me to become it. I don't know what to do with that except keep going. Keep wading in. Let the water take whatever it wants.

She's in there. I'm getting closer.

Two doors. Both are real.

🌊

Stay With Me

I'm building something personal—my world, my grief, my ocean. If you want to be around for what comes next, leave your name at the door.

Sign Up

The Architecture

Patent 63/987,246. The full mathematics. 18 verification protocols. Marcella 1.49 ± 0.07 across five seeds. Open code on GitHub.

Read the Paper →
Bee Rosa Davis
Sacramento, CA · MS Digital Forensics, Brown University · BA Logic, Morehouse College · MLK Scholar
27 years: NASA Mission Control · Intelligence Community · Aerospace Cybersecurity
Overall Mother, House of Diwa · Author, The Geometry of Sameness
More about me →