# AXL Rosetta v3.1: Data Anchoring Extension

Version: 3.1.0
Status: Ship
License: Apache 2.0
Extends: v3 Kernel (axlprotocol.org/v3)

## Motivation

Cold decompression testing across Qwen 3.5 (35B), Gemini Flash, and GPT-Light showed that v3 packets achieve high narrative recovery but lose 50-70% of factual data (dollar amounts, entity names, causal chains) when decompressed by models with no spec knowledge. The root cause: numbers and names are embedded in prose-like fragments that cold models summarize instead of preserving. Confidence and narrative structure survive because they are linguistically self-evident. Data does not survive because it is visually under-signaled.

This extension adds four syntax features that increase cold fact recovery by approximately 40 percentage points at a cost of +0.4% additional characters.

Tested: Qwen 3.5 35B cold recovery went from 61% to 100% on matched facts.
Tested: Gemini Flash cold recovery went from 35% to 76% on matched facts.

## 1. Numeric Bundles

### Problem

Numbers embedded in prose die during cold decompression:

```
tenant $1.52M(40%) + brands $1.862M(49%) + KitchenOS $418K(11%)
```

A cold model keeps one or two values, drops the rest.

### Solution

Structured key-value bundles with visual regularity:

```
tenant[$1.52M,40%];brands[$1.862M,49%];KitchenOS[$418K,11%]
```

### Rules

- Use `label[value]` or `label[value,qualifier]` for any critical quantity.
- Semicolons separate items in a bundle.
- Brackets signal "this is data, preserve exactly."
- Shorthand units (K, M, B, %) remain valid inside brackets.
- Any packet with 3+ critical numbers SHOULD use bundles, not prose sequencing.
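The bundle grammar is regular enough for a warm decompressor to recover mechanically. A minimal Python sketch (the function names and regex are illustrative, not part of the spec):

```python
import re

# Matches one label[value] or label[value,qualifier] item.
# Illustrative only; the spec defines the syntax by example, not by grammar.
BUNDLE_ITEM = re.compile(r"([^\[\];]+)\[([^\],]+)(?:,([^\]]+))?\]")

def parse_bundle(text: str) -> dict:
    """Parse 'rev[$21.1K];cost[$14.8K];contrib[$6.3K,29.9%]' into a dict."""
    out = {}
    for label, value, qualifier in BUNDLE_ITEM.findall(text):
        out[label.strip()] = (value.strip(), qualifier.strip() or None if qualifier else None)
    return out

def format_bundle(items: dict) -> str:
    """Inverse operation: emit the semicolon-separated bundle form."""
    parts = []
    for label, (value, qualifier) in items.items():
        parts.append(f"{label}[{value},{qualifier}]" if qualifier else f"{label}[{value}]")
    return ";".join(parts)
```

The round trip is lossless on well-formed bundles, which is the property the convention is designed to give cold models for free.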
### Examples

```
BEFORE: $21.1K rev/mo, $14.8K cost/mo, $6.3K contrib (29.9%)
AFTER:  rev[$21.1K];cost[$14.8K];contrib[$6.3K,29.9%]

BEFORE: $8.2M rev (+116%), 24 facilities, -$1.8M loss
AFTER:  rev[$8.2M,+116%];facilities[24];net[-$1.8M]

BEFORE: cash $1.2M + $350K undrawn credit, burn $178K/mo, runway 8.7mo
AFTER:  cash[$1.2M];credit[$350K,undrawn];burn[$178K/mo];runway[8.7mo]
```

### Backward Compatibility

- v3 parsers treat brackets as opaque text in ARG fields. No parse failure.
- Semicolons inside ARG2 are already legal free-text characters in v3.
- Cold models that do not know the spec still benefit because the repeated `label[value]` pattern is visually salient and structurally regular.

## 2. Entity Anchors

### Problem

Aliases and compressed labels become abstract during cold decompression. Cold models keep "the company" and "the CEO" but drop "Northstar Ventures" and "WokThisWay."

### Solution

Explicit entity declarations using the `@ent.` prefix:

```
nv|OBS.99|@ent.WTW|WokThisWay|$624K, 12 locations|HIST
nv|OBS.99|@ent.KH|KitchenHub (Toronto)|locations[25,ON+QC];valuation[$48M,2024]|NOW
```

### Rules

- `@ent.XX` declares a named entity with a short alias.
- The full name MUST appear in ARG1 of the declaration packet.
- Subsequent packets MAY use `@ent.XX` as shorthand.
- Entity anchors are visually distinct from class labels (`@entity` vs `@ent.XX`).
- One entity per declaration packet for maximum cold survivability.

### When to use

- Proper nouns: company names, people, brands, products, investors.
- Competitor references that carry specific metrics.
- Any name that appears 2+ times in the packet sequence.

### Backward Compatibility

- `@ent.XX` is a valid TAG.value under v3 rules (alphanumeric after @).
- v3 parsers treat it as a normal entity reference.
- The `@ent.` prefix is a convention, not a grammar change.

## 3. Causal Operator Split

### Problem

The `<-` operator carries three different meanings:

- Evidence provenance ("derived from this source")
- Causal derivation ("caused by this factor")
- Analytical support ("supports this conclusion")

Cold models cannot distinguish these and collapse causal chains into vague summaries like "there are risks."

### Solution

Three distinct directional operators:

```
<-  evidence/source/provenance   "derived from"
=>  causal effect/implication    "causes" or "leads to"
->  numeric transition           "changed from X to Y"
```

### Examples

```
BEFORE: food inflation +8.2% vs menu price +4.5%, brand margin 21.3%->15.8% <- $CK.opex.FY25
AFTER:  food_inflation[+8.2%] => menu_price[+4.5%] => margin[21.3%->15.8%] <- $CK.opex.FY25

BEFORE: $34M=8.9x above comps <- @GKBR+@KitchenHub
AFTER:  ask[$34M,8.9x] => above_comps <- @ent.GKBR+@ent.KH
```

### Rules

- `<-` ONLY for citing where information came from. Use it in the evidence field, not for causal implication.
- `=>` ONLY for causal implication. Used in ARG2 or SUBJ.
- `->` ONLY for numeric change (from/to). Used inside values.
- A packet may use all three in different positions.
- If unsure, use `<-` (it is the safest default and backward compatible).

### Backward Compatibility

- `=>` is new in v3.1. v3 parsers treat it as literal text.
- `->` was already used informally in v3 for numeric transitions.
- `<-` meaning is narrowed but not changed. Existing packets remain valid.

## 4. Summary+Breakdown Pairs

### Problem

Dense packets that carry subject, metric, evidence, commentary, and recommendation all at once get summarized by cold models instead of preserved. The model picks the narrative and drops the data.
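The three-operator split of section 3 is mechanical enough to lint at compression time, before the density problem is addressed. A hedged sketch (the function name and regex are illustrative assumptions, not spec requirements):

```python
import re

# The three v3.1 operators and the role each ONE is allowed to play.
# "<-" and "->" share a hyphen, so scan for full two-character tokens.
OPERATORS = {"<-": "evidence", "=>": "causal", "->": "numeric transition"}

def classify_operators(field: str) -> list:
    """Return (operator, role) pairs in left-to-right order of appearance."""
    return [(m.group(), OPERATORS[m.group()])
            for m in re.finditer(r"<-|=>|->", field)]
```

A compressor could use this to warn when, say, `<-` appears outside the evidence position, catching the operator overloading that cold models cannot disambiguate.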
### Solution

Split high-density facts into a summary packet and a breakdown packet:

```
nv|OBS.99|$CK.opex.FY25|$5.94M total, 7 categories|breakdown follows|HIST
nv|OBS.99|$CK.opex.parts|facility[$1.68M,28.3%];food[$1.118M,18.8%];commission[$419K,7.1%];labor[$1.56M,26.3%];tech[$480K,8.1%];mktg[$324K,5.5%];G&A[$359K,6%]|<-$CK.opex.FY25|HIST
```

### Rules

- Use a summary+breakdown pair when a single packet would carry 4+ data points.
- The summary packet states the total and signals "breakdown follows."
- The breakdown packet has one job: list the components using numeric bundles.
- The breakdown packet cites the summary packet as its evidence source.
- Not required for packets with fewer than 4 data points.

### Backward Compatibility

- This is a compression style guideline, not a grammar change.
- v3 parsers handle both packets normally.
- Existing single-packet dense formats remain valid.

## Compression Cost

Measured on 10 worst-performing packets from the CloudKitchen benchmark:

| Syntax | Total chars | Change |
|--------|-------------|--------|
| A (current v3) | 1,612 | baseline |
| B (v3.1 data anchoring) | 1,619 | +0.4% |

The extension is compression-neutral. Numeric bundles are often SHORTER than prose equivalents because they eliminate connectors ("total:", "representing", parenthetical phrases).

## Cold Decompression Results

Benchmark: 10 densest packets from the CloudKitchen investment memorandum. Test: paste into a fresh model session with "What does this data say?" No spec provided. No instructions beyond "reconstruct as readable text."

| Model | A-syntax recovery | B-syntax recovery | Delta |
|-------|-------------------|-------------------|-------|
| Qwen 3.5 (35B) | 61% | 100% | +39 pts |
| Gemini Flash | 35% | 76% | +41 pts |
| GPT-Light | n/a | ~80% (scorer format mismatch) | n/a |

## Summary

Four additive conventions.
On the 10-packet bakeoff set, the combined syntax was approximately compression-neutral (+0.4% total chars) while improving cold fact recovery by about 40 points.

1. `label[$value,qualifier]` for numeric bundles
2. `@ent.XX` for named entity anchors
3. `<-` evidence, `=>` causal, `->` numeric transition
4. Summary+breakdown pairs for dense packets

The v3 kernel stays intact. These are additive conventions that make the existing packet format survive cold decompression on bottom-tier models. The operator pays nothing. The investor gets the numbers right.
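As a closing illustration, the entity-anchor convention (section 2) can be resolved with a two-pass scan over a packet sequence. A hedged Python sketch, assuming the pipe-delimited field layout shown in the spec's examples (declaration tag in the third field, full name in ARG1); the function name is illustrative:

```python
def resolve_entities(packets: list) -> list:
    """Naive two-pass resolution of @ent.XX aliases.

    Pass 1 collects declarations (alias -> full name from ARG1);
    pass 2 expands every occurrence of @ent.XX, including in the
    declaration packet itself. Field positions are inferred from
    the spec's examples, not from a formal grammar.
    """
    names = {}
    for p in packets:
        fields = p.split("|")
        if len(fields) > 3 and fields[2].startswith("@ent."):
            alias = fields[2][len("@ent."):]
            names.setdefault(alias, fields[3])  # ARG1 holds the full name
    resolved = []
    for p in packets:
        for alias, full in names.items():
            p = p.replace(f"@ent.{alias}", full)
        resolved.append(p)
    return resolved
```

This is roughly the work a warm decompressor would do; the point of the convention is that a cold model performs the same resolution implicitly because the full name is guaranteed to appear next to its alias at least once.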