Undergrad · Nepal · Chemistry & Code
A pharmacy undergrad bridging physics, biology, chemistry, and computation. I build things that probably shouldn't work, then benchmark them until they do.
Dense retrieval models have a hard theoretical ceiling: their embedding dimension limits how many document combinations they can represent. DeepMind's 2025 LIMIT benchmark proved this formally, and state-of-the-art 7B-parameter models like GritLM and E5-Mistral failed catastrophically on it, scoring below 13% Recall@100.
Numen sidesteps this ceiling with Character N-Gram Hashing, a training-free, vocabulary-free approach that maps text into vector spaces of arbitrary size using deterministic CRC32 hashing of 3-, 4-, and 5-grams. No pretraining. No fine-tuning. At dimension 32768 it reaches 93.90% Recall@100, ahead of BM25 (93.6%) and every other dense model listed here, which makes it the first dense retrieval model to beat sparse keyword search on the LIMIT benchmark.
| Model | Type | Dim | Recall@100 |
|---|---|---|---|
| Numen | Dense | 32768 | 93.90% 🏆 |
| BM25 | Sparse | ~50k | 93.6% |
| Numen | Dense | 16384 | 93.05% |
| Numen | Dense | 8192 | 89.85% |
| Promptriever | Dense | 4096 | 18.9% |
| GritLM 7B | Dense | 4096 | 12.9% |
| E5-Mistral 7B | Dense | 4096 | 8.3% |
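To make the character n-gram hashing idea behind Numen concrete, here is a minimal sketch in plain Python. It is not Numen's actual code: the function names, the lowercasing step, the count weighting, the modulo bucketing, and the L2 normalization are all assumptions made for illustration. Only the use of CRC32 over 3-, 4-, and 5-grams and the free choice of dimension come from the description above.

```python
import math
import zlib


def ngram_hash_embed(text, dim=32768, ns=(3, 4, 5)):
    """Map text to a dim-sized vector by hashing character n-grams (illustrative)."""
    vec = [0.0] * dim
    data = text.lower()  # assumption: case-insensitive matching
    for n in ns:
        for i in range(len(data) - n + 1):
            gram = data[i:i + n].encode("utf-8")
            # CRC32 is deterministic, so the same n-gram always hits the same bucket.
            bucket = zlib.crc32(gram) % dim
            vec[bucket] += 1.0  # assumption: raw counts as weights
    norm = math.sqrt(sum(v * v for v in vec))
    # assumption: L2-normalize so a dot product acts as cosine similarity
    return [v / norm for v in vec] if norm else vec


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy corpus, only to show the retrieval step.
docs = ["Jon likes quokkas and apples", "Ovid likes candy"]
query_vec = ngram_hash_embed("Who likes quokkas?")
scores = [dot(query_vec, ngram_hash_embed(d)) for d in docs]
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best], scores)
```

Because nothing is learned, raising `dim` only reduces hash collisions; there is no model to retrain when the dimension changes.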
Structural Chemical Representation In Plain Text (SCRIPT) is a deterministic, sovereign molecular notation system governed by a formal language grammar and a high-performance parsing engine. Inspired by the algebraic recursion of Paninian grammar, SCRIPT derives the entire complexity of chemical space from a minimal set of generative axioms, replacing the non-deterministic heuristics of SMILES with a mathematically rigorous engine that ensures absolute state consistency and 100% round-trip fidelity.
Spanning the full spectrum from the simplest molecules like methane to the high-complexity scaffolds of materials science research, SCRIPT provides a unified representation from macroscopic crystallography to quantum spin states. It is an RDKit-independent sovereign engine built from first principles to provide a mathematically stable "one true string" for every molecule, reaction, and material.
In drug formulation, determining whether an active pharmaceutical ingredient (API) is chemically compatible with its excipients, including the inert fillers, binders, and coatings that make a tablet, typically requires months of stability testing and expensive lab work. Incompatibilities discovered late in development cost millions and can kill drug candidates entirely.
Keybox approaches this entirely in silico using a physics-based voxel platform grounded in 11-channel field theory and first-principles mechanics. Each voxel encodes the local chemical environment across 11 physical channels, such as electrostatics, van der Waals interactions, hydrogen bonding, hydrophobicity, and more. The platform simulates how API and excipient fields interact at the molecular interface to predict incompatibility before a single experiment is run.
Since 1997, distributed load balancing has been trapped by an impossible triangle: you can have fast O(1) lookups (Maglev), low rehash churn (Ring Hashing), or spatial locality (Geo-Hashing), but never all three at once. Gradient Hashing breaks this constraint by replacing static permutation math with a physics-based potential field equation modeled after how mycelial fungi route nutrients.
The inspiration: in 2010, researchers showed that a slime mould, left to forage between food sources laid out like the cities around Tokyo, grew a network closely resembling the region's rail system. Gradient Hashing asks the same question of server clusters, and gets the same elegant answer. Each routing decision is governed by gravity (distance pulls traffic to nearby nodes), pressure (load pushes traffic away from saturated nodes), and trust (a multiplicative filter that instantly isolates Byzantine nodes). The result is a liquid system: traffic flows to the optimal server and naturally spills to physical neighbors under load.
| Metric | Ring Hashing | Google Maglev | Gradient |
|---|---|---|---|
| Lookup Speed | O(log N) | O(1) | O(1) |
| Throughput | 0.37M req/s | 0.43M req/s | 1.10M req/s |
| Avg. Distance | 0.425 | 0.425 | 0.041 |
| Byzantine Resilience | 94.8% fail | 94.9% fail | 100% immune |
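The gravity, pressure, and trust terms described above can be pictured with a small scoring sketch. This is not the Gradient Hashing equation, which is not given here: the `Node` fields, the inverse-distance form of gravity, the `1 / (1 + alpha * load)` form of pressure, and the `alpha` constant are all hypothetical. The sketch also scans every node, so it is O(N) per decision and does not reproduce the O(1) lookup reported in the table.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    x: float      # position in some coordinate space (assumed 2D here)
    y: float
    load: float   # 0.0 = idle, 1.0 = saturated
    trust: float  # 1.0 = trusted, 0.0 = isolated


def potential(node, kx, ky, alpha=4.0, eps=1e-6):
    """Illustrative score for sending a request at (kx, ky) to node."""
    dist = math.hypot(node.x - kx, node.y - ky)
    gravity = 1.0 / (dist + eps)               # closer nodes pull harder
    pressure = 1.0 / (1.0 + alpha * node.load)  # loaded nodes push traffic away
    return node.trust * gravity * pressure      # trust is a multiplicative filter


def route(nodes, kx, ky):
    """Pick the node with the highest potential (linear scan, for clarity only)."""
    return max(nodes, key=lambda n: potential(n, kx, ky))


nodes = [
    Node("evil", 0.0, 0.0, load=0.0, trust=0.0),      # closest, but untrusted
    Node("near", 0.1, 0.0, load=0.0, trust=1.0),
    Node("neighbor", 0.3, 0.0, load=0.0, trust=1.0),
]
print(route(nodes, 0.0, 0.0).name)  # the idle, trusted, nearest node wins
nodes[1].load = 1.0                  # saturate it
print(route(nodes, 0.0, 0.0).name)  # traffic spills to the physical neighbor
```

Making trust a multiplier rather than an additive term is what lets a single zero remove a node from consideration regardless of how close or idle it is.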
Shannon's three pillars of cryptography (secrecy, authentication, and steganography) have never been achieved simultaneously in a single messaging system. Matryoshka Protocol is the first implementation to claim all three. Messages are hidden inside ordinary web traffic (JSON API responses, HTTP headers, EXIF metadata) with a mathematically proven detection probability approaching zero, making encrypted communication statistically indistinguishable from regular browsing.
At its core is a novel Fractal Group Ratchet: a group encryption algorithm with O(1) decryption complexity regardless of group size. It is combined with Schnorr-based zero-knowledge proofs for self-healing session recovery, post-quantum Kyber-1024 + Dilithium hybrid cryptography, and a fully serverless P2P architecture with k-anonymity peer discovery. It ships as a Python library (on PyPI) and a Rust implementation for performance.
| Feature | Signal | Matryoshka |
|---|---|---|
| Traffic Invisibility | Detectable | ε→0 (math proven) |
| Group Encryption | O(n) pairwise | O(1) Fractal Ratchet |
| Session Recovery | Full re-handshake | 3-message self-heal |
| Architecture | Centralized servers | Decentralized P2P |
| Quantum Resistance | Planned | Kyber-1024 + Dilithium |
| Message Speed | 50–100ms | ~25ms (Rust) |
I'm Sangeet, a pharmacy undergrad from Nepal who works somewhere between a chemistry lab and a Jupyter notebook and never quite left either.
The work is hard to categorize because none of it was planned. I started with a curiosity and it whirlwinded. My work lives at the intersection of things that don't usually talk to each other: fungal nutrient transport and distributed hash tables, drug formulation science and field theory, retrieval theory and cryptography. I'm drawn to problems where the right analogy from one domain completely unlocks another.
I have an arXiv paper on information retrieval, a benchmarked load balancer inspired by slime mould, a pharmaceutical simulation platform, a molecular grammar for computational chemistry, and a cryptographic protocol that mathematically proves its own undetectability. Chemistry is messy. Code is precise. I like both.
PS: Sangeet's the name, a daft undergrad splashing through chemistry and code like a toddler—my titrations are a mess, and I've used my mouth to pipette.