Latest works

Release | 7-1-2026
We achieved 20× compression while recovering ~86% of MTEB performance. The model supports elastic upscaling at inference time, avoiding full-weight loading and reducing both memory footprint and compute cost.
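One way inference-time elasticity can avoid full-weight loading is to materialize layer weights lazily, touching only the layers a given capacity needs. This is a minimal sketch under that assumption; the class, layer names, and residual toy math are illustrative, not the released model's API or mechanism.

```python
# Hedged sketch: lazy per-layer loading so a small configuration never
# reads the full checkpoint. All names here are illustrative assumptions.

class ElasticModel:
    """Runs only as many layers as requested, materializing weights lazily."""

    def __init__(self, checkpoint):
        self.checkpoint = checkpoint  # layer name -> weight blob on "disk"
        self.loaded = {}              # weights actually brought into memory

    def _get(self, name):
        # Touch the checkpoint only on first use of a layer.
        if name not in self.loaded:
            self.loaded[name] = self.checkpoint[name]
        return self.loaded[name]

    def forward(self, x, n_layers):
        # Elastic upscaling: rerun later with a larger n_layers and only the
        # extra layers get loaded; the rest of the checkpoint stays on disk.
        for i in range(n_layers):
            x = x + self._get(f"layer{i}") * x  # toy residual update
        return x

ckpt = {f"layer{i}": 0.1 for i in range(28)}
model = ElasticModel(ckpt)
y = model.forward(1.0, n_layers=6)  # only 6 of 28 layers ever materialized
```

Calling `forward` again with a larger `n_layers` upscales the model in place, loading just the additional layers.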

Release | 1-12-2025
Reduce and Refine explores model compression by progressively shrinking a 28-layer model to a lean 6-layer model without major performance loss, demonstrating a practical path to faster, lighter, more efficient LLMs.
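The progressive reduction above can be sketched as an iterate-and-prune loop. The leave-one-out importance score and the toy residual stack below are illustrative assumptions, not the released method, and the "refine" (fine-tuning) step between removals is stubbed out.

```python
# Hedged sketch of progressive layer reduction in the spirit of the
# announcement: repeatedly drop the least important layer until the
# target depth is reached, refining (fine-tuning) between removals.

def forward(layers, x):
    # Toy residual stack: each layer is a scalar weight applied residually.
    for w in layers:
        x = x + w * x
    return x

def least_important(layers, x):
    # Importance = how much the output changes when a layer is removed
    # (leave-one-out). This heuristic is an illustrative assumption.
    base = forward(layers, x)
    deltas = [abs(base - forward(layers[:i] + layers[i + 1:], x))
              for i in range(len(layers))]
    return deltas.index(min(deltas))

def reduce_and_refine(layers, target_depth, x=1.0):
    layers = list(layers)
    while len(layers) > target_depth:
        layers.pop(least_important(layers, x))  # reduce
        # refine: a brief fine-tuning pass would run here between removals
    return layers

# 28-layer toy model reduced to 6 layers, mirroring the announcement.
weights = [0.01 * (k % 7) for k in range(28)]
small = reduce_and_refine(weights, target_depth=6)
```

On this toy stack the loop keeps the highest-impact layers, which is the behavior a real importance-based pruner would aim for.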
