Semantic Level of Detail in LLM Representations

Investigating SLoD geometry in frozen embeddings — from linear probes to activation steering to reasoning dynamics.

Core thesis: Frozen LLM representations encode a continuous Semantic Level of Detail (SLoD) axis that is linearly decodable without retraining. Exploiting this axis — to match retrieval granularity to query abstraction, to detect output drift, and to steer generation — measurably improves scientific QA attribution and knowledge extraction quality. Continuous embedding-space dynamics in reasoning chains provide supporting behavioral evidence.

The project contributes across four layers: mechanistic (SLoD is in the embedding space), control (activation steering shifts generation), systems (SLoD-routed retrieval), and behavioral (embedding dynamics reveal reasoning quality). All 20 experiments complete — total API cost ~$1.30 + GPU compute.

Research Roadmap

SH0  weak labels
 └─→ SH1  linear probe F1=0.72
      ├─→ SH2  → SH2b  → SH2-summ
      ├─→ SH3  soft RAG +0.031 F1
      ├─→ SH4  AUROC=0.676
      └─→ SH5  → SH5a  → SH5c  → SH5d ✓✓

Legend: Confirmed = hypothesis validated · Strong = robust across conditions · Partial = qualified success · Not confirmed = null or failed

Experiments

Mechanism — SLoD is in the embedding space
SH0 — Weak Label Bootstrap Confirmed

Claim: Document structure is a sufficient free proxy for SLoD labels (macro/meso/micro).

50 papers annotated. Labels validated for SH1 training.

Datasets: S2ORC, QASPER (HuggingFace). Macro = title/abstract/intro/conclusion, Meso = section leads, Micro = methods/tables/dense entities.
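The structure-to-label mapping above can be sketched as a small heuristic. This is a minimal illustration assuming per-span section names and a section-lead flag are available; the function name and exact section lists are illustrative, not the project's actual code.

```python
# Weak SLoD labels from document structure alone (no annotation needed).
# Section lists follow the mapping described above; names are illustrative.
MACRO_SECTIONS = {"title", "abstract", "introduction", "conclusion"}

def weak_slod_label(section: str, is_section_lead: bool) -> str:
    """Assign a macro/meso/micro label from structural position."""
    if section.lower() in MACRO_SECTIONS:
        return "macro"
    if is_section_lead:          # first paragraph of any body section
        return "meso"
    return "micro"               # methods, tables, dense entity-heavy text
```

Because the labels are free, any number of papers can be bootstrapped this way before validating against human or LLM annotation.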

SH1 — Linear Probe Confirmed

Claim: A linear probe on frozen SciBERT embeddings classifies macro/meso/micro above chance.

Macro-F1 = 0.72 (per-class: macro 0.82, meso 0.62, micro 0.72)
37,278 length-matched spans, balanced 12,426/class

SH1b (document-level split): grouped F1 = 0.718, delta < 0.01. Leakage negligible.

SH1c (section control): residualized probe F1 = 0.40; a section-name baseline dominates at 0.963, and cross-section transfer fails (F1 = 0.003). However, LLM-blind annotation agreement (κ = 0.55) validates the labels.

SH1-LLM: Retrained on LLM labels → F1 = 0.754 (+0.076). Embeddings encode SLoD beyond section identity.
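The probe setup can be sketched end-to-end on synthetic data. Here 8-dimensional Gaussians stand in for frozen SciBERT embeddings; the clusters, split, and scores are illustrative stand-ins, not the reported numbers.

```python
# A linear probe over frozen embeddings: logistic regression on fixed
# vectors, scored with macro-F1 as in SH1. Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
levels = ["macro", "meso", "micro"]
# One Gaussian cluster per SLoD level, mimicking a linearly decodable axis.
X = np.vstack([rng.normal(loc=2.0 * i, scale=0.5, size=(200, 8))
               for i in range(3)])
y = np.repeat(levels, 200)

probe = LogisticRegression(max_iter=1000).fit(X[::2], y[::2])  # even rows = train
macro_f1 = f1_score(y[1::2], probe.predict(X[1::2]), average="macro")
print(round(macro_f1, 2))
```

The key property being tested is that the frozen embeddings are never updated; only the linear head is fit.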

Control — Activation steering shifts generation
SH2 — QA Steering Not confirmed

Claim: An SLoD direction in a generative model's residual stream steers generation toward target abstraction.

Doc-span steering: d = 0.043 (Mistral-7B), d = 0.020 (Qwen2.5-14B)

Causes: cross-space mismatch plus a layer-selection bug (absolute vs. signed projection). The SciBERT-based QA evaluation also imposes a ceiling at d ≈ 0.121 (genre mismatch).

SH2b — QA-Context Steering Partial

|d| = 0.546 but direction-inverted. 3/4 surface metrics significant, H3 fails (ROUGE drop = 0.244).

Direction inversion at α=2.0: micro steering produced macro-ward shift + quality collapse.

SH2-summ — Summarization Steering Confirmed

Key result: Activation steering works when evaluation axis is in-distribution.

Cohen's d = 0.679 (layer 8, α=2.0, 851 papers)
H1 passes; H2: 3/4 surface metrics significant; H3: ROUGE-L improves by 0.011

SH2-LLM: d → 0.701 with LLM axis (cosine sim 0.915). Robust to relabeling.

Design principle: task-domain alignment between steering target and evaluation metric is necessary.
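The steering mechanism itself is simple to sketch: add a scaled unit SLoD direction to every hidden state at one layer of the residual stream. The mock below uses numpy arrays in place of a real model; the α and layer values echo the experiment, but everything else is an illustrative assumption.

```python
# Residual-stream activation steering, mocked: shift each hidden state
# along a unit SLoD direction by alpha (layer 8, alpha=2.0 in SH2-summ).
import numpy as np

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha * (unit SLoD direction) to every token's hidden state."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(8)
hidden = rng.normal(size=(16, 64))   # (tokens, d_model) at the hooked layer
slod_dir = rng.normal(size=64)       # e.g. macro-centroid minus micro-centroid

steered = steer(hidden, slod_dir, alpha=2.0)
# Projection onto the SLoD axis moves by exactly alpha for every token.
shift = (steered - hidden) @ (slod_dir / np.linalg.norm(slod_dir))
print(shift.mean())
```

In a real run this addition would live in a forward hook at the chosen layer; the evaluation-axis alignment noted above is what determines whether the shift registers as improvement.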

Systems — SLoD-routed retrieval
SH3 — Hierarchical RAG Partial

Claim: Routing retrieval to query-matching granularity improves evidence attribution.

slod_weighted_parent: soft F1 = 0.422 vs baseline 0.391 (+0.031)
13 retrieval conditions tested at k=1,3,5,10,20

Soft SLoD score-boosting + parent expansion works. BM25 hybrid + cross-encoder rerank push to 0.458 (ceiling).

SH3-LLM: Stable (max delta -0.007). Soft boosting absorbs probe changes.

Hard routing destroys multi-level diversity. Specter2 underperformed MiniLM. HyDE classification did not help.
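Soft score-boosting (as opposed to hard routing) can be sketched as a blend of dense similarity with the probe's probability that a passage sits at the query's level. The 0.2 blend weight and toy scores are illustrative assumptions.

```python
# Soft SLoD score-boosting: blend dense similarity with the probe's
# P(passage is at the query's SLoD level). No passage is hard-filtered,
# which preserves multi-level diversity.
import numpy as np

def slod_boosted_scores(sim, passage_probs, query_level, weight=0.2):
    """sim: (n,) dense similarities; passage_probs: (n, 3) probe softmax
    over [macro, meso, micro]; query_level: index of the query's class."""
    match = passage_probs[:, query_level]
    return (1 - weight) * sim + weight * match

sim = np.array([0.80, 0.79, 0.50])
probs = np.array([[0.1, 0.2, 0.7],    # micro-heavy passage
                  [0.8, 0.1, 0.1],    # macro-heavy passage
                  [0.3, 0.4, 0.3]])
boosted = slod_boosted_scores(sim, probs, query_level=0)  # macro-level query
print(boosted.argmax())
```

The macro-heavy passage overtakes the slightly-more-similar micro passage, which is the behavior hard routing achieves only by discarding candidates.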

Application — Drift predicts extraction quality
SH4 — Abstraction Drift Partial

Claim: Drift between expected and realized SLoD per extraction field predicts extraction correctness.

Combined AUROC = 0.676 (passes 0.65 threshold)
Drift-only: AUROC = 0.52 (near random)

SH4-LLM: Drift-only → 0.62, combined → 0.75. Biggest relabeling beneficiary.

Drift alone is not diagnostic. Surface features (word count, entity density, LLM confidence) carry most signal.
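The drift-vs-surface-features finding can be reproduced in miniature: fit a classifier on drift alone versus drift plus surface features and compare AUROC. Everything here is synthetic; the feature effect sizes are illustrative assumptions, not the QASPER data.

```python
# Drift alone vs. drift + surface features for predicting extraction
# correctness, scored by AUROC. Synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 400
correct = rng.integers(0, 2, size=n)
drift = rng.normal(size=n)                         # uninformative on its own
word_count = 1.5 * correct + rng.normal(size=n)    # surface features carry
entity_density = 1.0 * correct + rng.normal(size=n)  # most of the signal

X = np.column_stack([drift, word_count, entity_density])
clf = LogisticRegression(max_iter=1000).fit(X, correct)
auroc = roc_auc_score(correct, clf.predict_proba(X)[:, 1])
drift_only = roc_auc_score(correct, drift)
print(round(auroc, 2), round(drift_only, 2))
```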

Behavioral — Embedding dynamics reveal reasoning quality
SH5 — Jump Rate Not confirmed

Claim: Lower abstraction-level jump rate in CoT steps correlates with higher answer correctness.

2000 CoT traces, 6101 steps (Claude Haiku)
ρ = 0.003, p = 0.90 (null)

Unexpected finding: jump rate ↔ attribution-F1: ρ = +0.092 (more jumping = better evidence).
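The statistic under test is straightforward: the fraction of adjacent CoT steps whose probe-assigned level changes. A minimal sketch, with an illustrative trace:

```python
# Jump rate: fraction of consecutive CoT step pairs whose SLoD level differs.

def jump_rate(levels: list[str]) -> float:
    """Return the per-trace abstraction-level jump rate."""
    if len(levels) < 2:
        return 0.0
    jumps = sum(a != b for a, b in zip(levels, levels[1:]))
    return jumps / (len(levels) - 1)

trace = ["macro", "macro", "meso", "micro", "micro", "macro"]
print(jump_rate(trace))
```

Correlating this scalar against correctness across traces is what produced the null; the positive attribution-F1 correlation came from the same statistic.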

SH5a — Transition Matrix Confirmed

Claim: Specific transition patterns (not just overall jump rate) correlate with quality.

Macro→macro self-loop ↔ attr-F1: ρ = -0.197 (p = 5e-19)
20/60 feature-target pairs Bonferroni-significant

K-means reveals 2 reasoning styles: "exploratory" (attr-F1 = 0.279) vs "macro-stuck" (0.218).

SH5a-LLM: ρ = -0.197 (identical). Robust to relabeling.
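The transition-matrix features replace the single jump-rate scalar with a row-normalized 3×3 matrix; the macro→macro self-loop is entry (0, 0). A minimal sketch with an illustrative trace:

```python
# Level-to-level transition matrix over a CoT trace; the macro->macro
# self-loop rate is the feature that correlated with attribution-F1.
import numpy as np

LEVELS = {"macro": 0, "meso": 1, "micro": 2}

def transition_matrix(levels):
    """Row-normalized 3x3 matrix of SLoD transitions between adjacent steps."""
    counts = np.zeros((3, 3))
    for a, b in zip(levels, levels[1:]):
        counts[LEVELS[a], LEVELS[b]] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

trace = ["macro", "macro", "macro", "meso", "macro", "micro"]
T = transition_matrix(trace)
print(T[0, 0])   # macro->macro self-loop probability
```

The 2-style clustering above is k-means over these per-trace matrices (flattened to 9 features).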

SH5c — Context Alignment Confirmed

Claim: Alignment between retrieved context SLoD and reasoning step SLoD predicts quality.

Weighted alignment gap ↔ attr-F1: ρ = -0.135 (p < 0.0001)
SLoD-routed retrieval → significantly lower alignment gaps (Wilcoxon p < 0.05)

SH5c-LLM: ρ → -0.231 (71% stronger). Second-biggest relabeling beneficiary.

SH5d — Continuous Projection Strong

Claim: Continuous SciBERT embedding-space features predict quality without discrete probe classification.

slod_axis_mean ↔ attr-F1: ρ = +0.219 (p < 1e-21)
SLoD-axis AUROC = 0.615 vs orthogonal = 0.549 (3× correlation difference)

Strongest single predictor in the entire SH5 family. SLoD axis validated: Cohen's d = 2.65.

SH5d-LLM: ρ = +0.219 (identical). Robust to 24° axis rotation.
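The probe-free projection can be sketched directly: define the SLoD axis as the macro-centroid-minus-micro-centroid direction and take each trace's mean projection onto it. Synthetic Gaussians stand in for SciBERT step embeddings here.

```python
# Continuous SLoD projection (no discrete probe classification): project
# step embeddings onto a centroid-difference axis. Synthetic stand-in data.
import numpy as np

rng = np.random.default_rng(5)
d = 32
macro_emb = rng.normal(loc=+1.0, size=(100, d))   # macro-level spans
micro_emb = rng.normal(loc=-1.0, size=(100, d))   # micro-level spans

axis = macro_emb.mean(axis=0) - micro_emb.mean(axis=0)
axis /= np.linalg.norm(axis)

def slod_axis_mean(step_embs: np.ndarray) -> float:
    """Mean projection of a trace's step embeddings onto the SLoD axis."""
    return float((step_embs @ axis).mean())

print(slod_axis_mean(macro_emb) > slod_axis_mean(micro_emb))
```

Because no classifier is fit on the target domain, this is the variant most likely to transfer cross-domain, per the Open Directions below.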

Open Directions

Direction | Description | Effort | Priority
SH6 | SLoD-conditioned summarization quality — human/model preference for SH2-summ steered summaries at different abstraction levels | 3–5 days | High
SH7 | Cross-task SLoD generalization — apply SH2-summ steering vector to biomedical, legal, and news domains | 3–5 days | High
Cross-domain | Test SH1 probe on bio/physics papers (currently CS-only); SH5d probe-free approach may generalize better | 2–3 days | High
SH2-QA-v2 | Build a QA-genre SLoD evaluation axis (labeled QA answer pairs), then re-run SH2b. Baseline: d = 0.121 | 3–5 days | Medium
SH8 | SLoD-adaptive retrieval + generation — combine SH3 soft retrieval with SH2-summ steering | 5–7 days | Medium
Larger benchmark | Validate SH3 on LoCoMo or S2ORC QA subsets for larger-scale effect-size confirmation | 3–5 days | Medium
ADAM-Bench | SLoD typing on 27K papers, 7M evidence objects. Test whether SLoD-matched evidence improves claim verification | 3–5 days | Medium
Combined SH5 | SH5a+SH5c+SH5d features (AUROC = 0.623). Feature selection + better model may push above 0.65 | 1–2 days | Low
SH5b | Probe confidence calibration — does the probe's own max probability on CoT steps predict quality? | 1 day | Low
SH5e | Per-step embedding trajectory analysis — curvature, acceleration, attractors | 2–3 days | Low

Event Log

Week 1 — Mar 8–10, 2026
SH0 (weak label bootstrap) + SH1 (linear probe, F1=0.72) + SH3 (13 retrieval conditions, soft score-boosting, BM25 hybrid, cross-encoder ablation).
API cost: ~$0.30
Week 1–2 — Mar 10–14, 2026
SH4 (2 iterations: abstract-only → full-text extraction from QASPER) + SH5 (2000 CoT traces across 500 questions × 4 retrieval conditions).
API cost: ~$1.00
Week 2 — Mar 15–16, 2026
SH5a (transition matrix analysis) + SH5c (context-reasoning alignment) + SH5d (continuous embedding projection). Pure reanalysis of SH5 data, no new API calls.
API cost: $0.00
Week 2 — Mar 17–18, 2026
SH2 activation steering — 6 experiments on remote GPU server: SH2 (doc-span), SH2-scale (Qwen2.5-14B), SH2b (QA-context), SH2c (flip+low-α), SH2a (prompt control), SH2-summ (confirmed, d=0.679).
GPU compute (Mistral-7B, Qwen2.5-14B)
Total: 20 experiments, ~$1.30 API cost + GPU compute for SH2, 0 human-in-the-loop annotation hours.