# AXL Rosetta v3.1: Data Anchoring Extension

Version: 3.1.0
Status: Ship
License: Apache 2.0
Extends: v3 Kernel (axlprotocol.org/v3)

## Motivation

Cold decompression testing across Qwen 3.5 (35B), Gemini Flash, and GPT-Light showed that v3 packets achieve high narrative recovery but lose 50-70% of factual data (dollar amounts, entity names, causal chains) when decompressed by models with no spec knowledge. The root cause: numbers and names are embedded in prose-like fragments that cold models summarize instead of preserving. Confidence and narrative structure survive because they are linguistically self-evident. Data does not survive because it is visually under-signaled.

This extension adds four syntax features that increase cold fact recovery by approximately 40 percentage points at a cost of +0.4% additional characters.

Tested: Qwen 3.5 35B cold recovery went from 61% to 100% on matched facts.
Tested: Gemini Flash cold recovery went from 35% to 76% on matched facts.

## 1. Numeric Bundles

### Problem

Numbers embedded in prose die during cold decompression:

```
tenant $1.52M(40%) + brands $1.862M(49%) + KitchenOS $418K(11%)
```

A cold model keeps one or two values, drops the rest.

### Solution

Structured key-value bundles with visual regularity:

```
tenant[$1.52M,40%];brands[$1.862M,49%];KitchenOS[$418K,11%]
```

### Rules

- Use `label[value]` or `label[value,qualifier]` for any critical quantity.
- Semicolons separate items in a bundle.
- Brackets signal "this is data, preserve exactly."
- Shorthand units (K, M, B, %) remain valid inside brackets.
- Any packet with 3+ critical numbers SHOULD use bundles, not prose sequencing.
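The bundle grammar is regular enough for a warm decompressor to recover mechanically. A minimal Python sketch (the function names and regex are illustrative, not part of the spec):

```python
import re

# Matches one label[value] or label[value,qualifier] item.
# Illustrative only; the spec defines the syntax by example, not by grammar.
BUNDLE_ITEM = re.compile(r"([^\[\];]+)\[([^\],]+)(?:,([^\]]+))?\]")

def parse_bundle(text: str) -> dict:
    """Parse 'rev[$21.1K];cost[$14.8K];contrib[$6.3K,29.9%]' into a dict."""
    out = {}
    for label, value, qualifier in BUNDLE_ITEM.findall(text):
        out[label.strip()] = (value.strip(), qualifier.strip() or None if qualifier else None)
    return out

def format_bundle(items: dict) -> str:
    """Inverse operation: emit the semicolon-separated bundle form."""
    parts = []
    for label, (value, qualifier) in items.items():
        parts.append(f"{label}[{value},{qualifier}]" if qualifier else f"{label}[{value}]")
    return ";".join(parts)
```

The round trip is lossless on well-formed bundles, which is the property the convention is designed to give cold models for free.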
### Examples

```
BEFORE: $21.1K rev/mo, $14.8K cost/mo, $6.3K contrib (29.9%)
AFTER:  rev[$21.1K];cost[$14.8K];contrib[$6.3K,29.9%]

BEFORE: $8.2M rev (+116%), 24 facilities, -$1.8M loss
AFTER:  rev[$8.2M,+116%];facilities[24];net[-$1.8M]

BEFORE: cash $1.2M + $350K undrawn credit, burn $178K/mo, runway 8.7mo
AFTER:  cash[$1.2M];credit[$350K,undrawn];burn[$178K/mo];runway[8.7mo]
```

### Backward Compatibility

- v3 parsers treat brackets as opaque text in ARG fields. No parse failure.
- Semicolons inside ARG2 are already legal free-text characters in v3.
- Cold models that do not know the spec still benefit because the repeated `label[value]` pattern is visually salient and structurally regular.

## 2. Entity Anchors

### Problem

Aliases and compressed labels become abstract during cold decompression. Cold models keep "the company" and "the CEO" but drop "Northstar Ventures" and "WokThisWay."

### Solution

Explicit entity declarations using the `@ent.` prefix:

```
nv|OBS.99|@ent.WTW|WokThisWay|$624K, 12 locations|HIST
nv|OBS.99|@ent.KH|KitchenHub (Toronto)|locations[25,ON+QC];valuation[$48M,2024]|NOW
```

### Rules

- `@ent.XX` declares a named entity with a short alias.
- The full name MUST appear in ARG1 of the declaration packet.
- Subsequent packets MAY use `@ent.XX` as shorthand.
- Entity anchors are visually distinct from class labels (`@entity` vs `@ent.XX`).
- One entity per declaration packet for maximum cold survivability.

### When to use

- Proper nouns: company names, people, brands, products, investors.
- Competitor references that carry specific metrics.
- Any name that appears 2+ times in the packet sequence.

### Backward Compatibility

- `@ent.XX` is a valid TAG.value under v3 rules (alphanumeric after @).
- v3 parsers treat it as a normal entity reference.
- The `@ent.` prefix is a convention, not a grammar change.

## 3. Causal Operator Split

### Problem

The `<-` operator carries three different meanings:

- Evidence provenance ("derived from this source")
- Causal derivation ("caused by this factor")
- Analytical support ("supports this conclusion")

Cold models cannot distinguish these and collapse causal chains into vague summaries like "there are risks."

### Solution

Three distinct directional operators:

```
<-  evidence/source/provenance   "derived from"
=>  causal effect/implication    "causes" or "leads to"
->  numeric transition           "changed from X to Y"
```

### Examples

```
BEFORE: food inflation +8.2% vs menu price +4.5%, brand margin 21.3%->15.8% <- $CK.opex.FY25
AFTER:  food_inflation[+8.2%] => menu_price[+4.5%] => margin[21.3%->15.8%] <- $CK.opex.FY25

BEFORE: $34M=8.9x above comps <- @GKBR+@KitchenHub
AFTER:  ask[$34M,8.9x] => above_comps <- @ent.GKBR+@ent.KH
```

### Rules

- `<-` ONLY for citing where information came from. Use it in the evidence field, not for causal implication.
- `=>` ONLY for causal implication. Used in ARG2 or SUBJ.
- `->` ONLY for numeric change (from/to). Used inside values.
- A packet may use all three in different positions.
- If unsure, use `<-` (it is the safest default and backward compatible).

### Backward Compatibility

- `=>` is new in v3.1. v3 parsers treat it as literal text.
- `->` was already used informally in v3 for numeric transitions.
- `<-` meaning is narrowed but not changed. Existing packets remain valid.

## 4. Summary+Breakdown Pairs

### Problem

Dense packets that carry subject, metric, evidence, commentary, and recommendation all at once get summarized by cold models instead of preserved. The model picks the narrative and drops the data.
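The three-operator split of section 3 is mechanical enough to lint at compression time, before the density problem is addressed. A hedged sketch (the function name and regex are illustrative assumptions, not spec requirements):

```python
import re

# The three v3.1 operators and the role each ONE is allowed to play.
# "<-" and "->" share a hyphen, so scan for full two-character tokens.
OPERATORS = {"<-": "evidence", "=>": "causal", "->": "numeric transition"}

def classify_operators(field: str) -> list:
    """Return (operator, role) pairs in left-to-right order of appearance."""
    return [(m.group(), OPERATORS[m.group()])
            for m in re.finditer(r"<-|=>|->", field)]
```

A compressor could use this to warn when, say, `<-` appears outside the evidence position, catching the operator overloading that cold models cannot disambiguate.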
### Solution

Split high-density facts into a summary packet and a breakdown packet:

```
nv|OBS.99|$CK.opex.FY25|$5.94M total, 7 categories|breakdown follows|HIST
nv|OBS.99|$CK.opex.parts|facility[$1.68M,28.3%];food[$1.118M,18.8%];commission[$419K,7.1%];labor[$1.56M,26.3%];tech[$480K,8.1%];mktg[$324K,5.5%];G&A[$359K,6%]|<-$CK.opex.FY25|HIST
```

### Rules

- Use a summary+breakdown pair when a single packet would carry 4+ data points.
- The summary packet states the total and signals "breakdown follows."
- The breakdown packet has one job: list the components using numeric bundles.
- The breakdown packet cites the summary packet as its evidence source.
- Not required for packets with fewer than 4 data points.

### Backward Compatibility

- This is a compression style guideline, not a grammar change.
- v3 parsers handle both packets normally.
- Existing single-packet dense formats remain valid.

## Compression Cost

Measured on 10 worst-performing packets from the CloudKitchen benchmark:

| Syntax | Total chars | Change |
|--------|-------------|--------|
| A (current v3) | 1,612 | baseline |
| B (v3.1 data anchoring) | 1,619 | +0.4% |

The extension is compression-neutral. Numeric bundles are often SHORTER than prose equivalents because they eliminate connectors ("total:", "representing", parenthetical phrases).

## Cold Decompression Results

Benchmark: 10 densest packets from the CloudKitchen investment memorandum. Test: paste into a fresh model session with "What does this data say?" No spec provided. No instructions beyond "reconstruct as readable text."

| Model | A-syntax recovery | B-syntax recovery | Delta |
|-------|-------------------|-------------------|-------|
| Qwen 3.5 (35B) | 61% | 100% | +39 pts |
| Gemini Flash | 35% | 76% | +41 pts |
| GPT-Light | n/a | ~80% (scorer format mismatch) | n/a |

## Summary

Four additive conventions.
On the 10-packet bakeoff set, the combined syntax was approximately compression-neutral (+0.4% total chars) while improving cold fact recovery by about 40 points.

1. `label[$value,qualifier]` for numeric bundles
2. `@ent.XX` for named entity anchors
3. `<-` evidence, `=>` causal, `->` numeric transition
4. Summary+breakdown pairs for dense packets

The v3 kernel stays intact. These are additive conventions that make the existing packet format survive cold decompression on bottom-tier models. The operator pays nothing. The investor gets the numbers right.
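As a closing illustration, the entity-anchor convention (section 2) can be resolved with a two-pass scan over a packet sequence. A hedged Python sketch, assuming the pipe-delimited field layout shown in the spec's examples (declaration tag in the third field, full name in ARG1); the function name is illustrative:

```python
def resolve_entities(packets: list) -> list:
    """Naive two-pass resolution of @ent.XX aliases.

    Pass 1 collects declarations (alias -> full name from ARG1);
    pass 2 expands every occurrence of @ent.XX, including in the
    declaration packet itself. Field positions are inferred from
    the spec's examples, not from a formal grammar.
    """
    names = {}
    for p in packets:
        fields = p.split("|")
        if len(fields) > 3 and fields[2].startswith("@ent."):
            alias = fields[2][len("@ent."):]
            names.setdefault(alias, fields[3])  # ARG1 holds the full name
    resolved = []
    for p in packets:
        for alias, full in names.items():
            p = p.replace(f"@ent.{alias}", full)
        resolved.append(p)
    return resolved
```

This is roughly the work a warm decompressor would do; the point of the convention is that a cold model performs the same resolution implicitly because the full name is guaranteed to appear next to its alias at least once.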