Undergrad · Nepal · Chemistry & Code

Sangeet
Sharma

A pharmacy undergrad bridging physics, biology, chemistry, and computation. I build things that probably shouldn't work, then benchmark them until they do.

biology
chemistry
code
sangeet01.github.io
01

Selected Work

Information Retrieval · arXiv 2601.15205

Numen

Dense retrieval models have a hard theoretical ceiling: their embedding dimension limits how many document combinations they can represent. DeepMind's 2025 LIMIT paper formalized this bound and released a benchmark to test it. State-of-the-art 7B-parameter models like GritLM and E5-Mistral failed catastrophically on it, scoring below 13% Recall@100.

Numen bypasses this entirely with Character N-Gram Hashing, a training-free, vocabulary-free approach that maps text into vector spaces of arbitrary size using deterministic CRC32 hashing of 3-, 4-, and 5-grams. No pretraining. No fine-tuning. At dimension 32768 it reaches 93.90% Recall@100, edging out BM25 (93.6%) and making it the first dense retrieval model to beat sparse keyword search on the LIMIT benchmark.
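A minimal sketch of the hashing idea, not the repository's code: each character 3-, 4-, and 5-gram is hashed with CRC32 and counted into one of `dim` buckets. The lowercasing, the modulo bucketing, the L2 normalisation, and the names `ngram_hash_embed` and `cosine` are assumptions made for illustration.

```python
import math
import zlib


def ngram_hash_embed(text, dim=32768, sizes=(3, 4, 5)):
    """Map text to an L2-normalised vector by hashing character n-grams."""
    vec = [0.0] * dim
    lowered = text.lower()  # assumption: case-folding before hashing
    for n in sizes:
        for i in range(len(lowered) - n + 1):
            gram = lowered[i:i + n]
            # CRC32 is deterministic, so the same n-gram always hits the same bucket.
            bucket = zlib.crc32(gram.encode("utf-8")) % dim
            vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Dot product of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


docs = ["Jon likes quokkas and apples", "Ovid likes bread and candles"]
query = "who likes quokkas?"

doc_vecs = [ngram_hash_embed(d) for d in docs]
q_vec = ngram_hash_embed(query)
scores = [cosine(q_vec, d) for d in doc_vecs]
best = max(range(len(docs)), key=scores.__getitem__)  # index of the top document
```

Nothing here is learned, so raising `dim` only reduces hash collisions; that is the knob the table below varies.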

Model          | Type   | Dim   | Recall@100
Numen          | Dense  | 32768 | 93.90% 🏆
BM25           | Sparse | ~50k  | 93.6%
Numen          | Dense  | 16384 | 93.05%
Numen          | Dense  | 8192  | 89.85%
Promptriever   | Dense  | 4096  | 18.9%
GritLM 7B      | Dense  | 4096  | 12.9%
E5-Mistral 7B  | Dense  | 4096  | 8.3%
Jupyter Notebook · Apache 2.0 · arXiv 2026 · 43 commits
View Repository  ·  arXiv Paper
93.9%
Recall@100 on the LIMIT benchmark
#1
First dense model to beat BM25
0
Training examples required
Scalable to any dimension
Cheminformatics · Linguistic Notation

Structural Chemical Representation In Plain Text (SCRIPT)

SCRIPT is a deterministic, sovereign molecular notation system defined by a formal grammar and backed by a high-performance parsing engine. Inspired by the algebraic recursion of Paninian grammar, it derives the complexity of chemical space from a minimal set of generative axioms, replacing the non-deterministic heuristics of SMILES with a mathematically rigorous engine built for state consistency and 100% round-trip fidelity.

Spanning the full spectrum from the simplest molecules like methane to the high-complexity scaffolds of materials science research, SCRIPT provides a unified representation from macroscopic crystallography to quantum spin states. It is an RDKit-independent sovereign engine built from first principles to provide a mathematically stable "one true string" for every molecule, reaction, and material.

Parsing Engine · Linguistic Grammar · Cheminformatics
View Repository
[*]
Sovereign Parsing Engine
1->N
Paninian Grammar Model
True
100% Round-trip Fidelity
Any
Scale (Quantum to Macro)
Computational Pharmaceutics · Physics Simulation

Keybox

In drug formulation, determining whether an active pharmaceutical ingredient (API) is chemically compatible with its excipients, including the inert fillers, binders, and coatings that make a tablet, typically requires months of stability testing and expensive lab work. Incompatibilities discovered late in development cost millions and can kill drug candidates entirely.

Keybox approaches this entirely in silico using a physics-based voxel platform grounded in 11-channel field theory and first-principles mechanics. Each voxel encodes the local chemical environment across 11 physical channels, including electrostatics, van der Waals interactions, hydrogen bonding, and hydrophobicity. The platform simulates how API and excipient fields interact at the molecular interface to predict incompatibility before a single experiment is run.
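To make the data layout concrete, here is a hedged sketch of a sparse voxel grid with 11 channels per voxel. Only the four channel names come from the description above; the other seven are unnamed placeholders, and `VoxelField`, `set_voxel`, and the product-sum `interface_score` are hypothetical stand-ins rather than Keybox's actual physics.

```python
from dataclasses import dataclass, field

# Four names are from the text; channel_5..channel_11 are placeholders.
CHANNELS = [
    "electrostatics",
    "van_der_waals",
    "hydrogen_bonding",
    "hydrophobicity",
] + [f"channel_{i}" for i in range(5, 12)]


@dataclass
class VoxelField:
    """Sparse 3D grid mapping (x, y, z) to a list of 11 channel values."""
    voxels: dict = field(default_factory=dict)

    def set_voxel(self, xyz, **values):
        cell = self.voxels.setdefault(xyz, [0.0] * len(CHANNELS))
        for name, value in values.items():
            cell[CHANNELS.index(name)] = value


def interface_score(a, b):
    """Hypothetical score: channel-wise products summed over shared voxels."""
    shared = a.voxels.keys() & b.voxels.keys()
    return sum(
        x * y
        for xyz in shared
        for x, y in zip(a.voxels[xyz], b.voxels[xyz])
    )


api = VoxelField()
api.set_voxel((0, 0, 0), electrostatics=1.0, hydrogen_bonding=0.5)

excipient = VoxelField()
excipient.set_voxel((0, 0, 0), electrostatics=-1.0, hydrogen_bonding=0.5)
excipient.set_voxel((1, 0, 0), hydrophobicity=2.0)  # not shared, so ignored

score = interface_score(api, excipient)  # -1.0 + 0.25 = -0.75
```

A real interaction model would weight and couple the channels; the point here is only that both molecules live on the same grid and are compared voxel by voxel.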

Python · 11-channel field theory · Voxel simulation · First-principles mechanics
View Repository
11
Physical channels per voxel
0
Lab experiments needed to screen
API-excipient pairs testable
3D
Voxel field simulation
Distributed Systems · Biology-Inspired Algorithms

Gradient Hashing

Since 1997, distributed load balancing has been trapped by an impossible triangle: you can have fast O(1) lookups (Maglev), low rehash churn (Ring Hashing), or spatial locality (Geo-Hashing), but never all three at once. Gradient Hashing breaks this constraint by replacing static permutation math with a physics-based potential field equation modeled after how mycelial fungi route nutrients.

The inspiration: in 2010, researchers showed that the slime mould Physarum polycephalum, given food sources laid out like the cities around Tokyo, grew a network closely resembling the region's rail system, with comparable cost, efficiency, and fault tolerance. Gradient Hashing asks the same question of server clusters, and gets the same elegant answer. Each routing decision is governed by gravity (distance pulls traffic to nearby nodes), pressure (load pushes traffic away from saturated nodes), and trust (a multiplicative filter that instantly isolates Byzantine nodes). The result is a liquid system: traffic flows to the optimal server and naturally spills to physical neighbors under load.
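The three forces can be sketched as a single score per node. The functional form below is an assumption chosen for illustration (the actual potential field equation is not given here), and `Node`, `potential`, and `route` are hypothetical names. The linear scan is O(N) and is only meant to show how the terms combine, not how the O(1) lookup is achieved.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    x: float
    y: float
    load: float = 0.0        # current in-flight requests
    capacity: float = 100.0  # load at which the node is saturated
    trust: float = 1.0       # 0.0 removes the node from routing


def potential(node, cx, cy):
    """Assumed potential: higher means more attractive to the client at (cx, cy)."""
    distance = math.hypot(node.x - cx, node.y - cy)
    gravity = 1.0 / (1.0 + distance)                      # nearby nodes pull harder
    pressure = max(0.0, 1.0 - node.load / node.capacity)  # saturated nodes push back
    return node.trust * gravity * pressure                # trust multiplies the rest


def route(nodes, cx, cy):
    """Pick the node with the highest potential (linear scan for clarity)."""
    return max(nodes, key=lambda n: potential(n, cx, cy))


nodes = [Node("near", 0.1, 0.0), Node("far", 0.9, 0.0)]
first = route(nodes, 0.0, 0.0).name  # "near": closest node wins when idle
```

Raising `nodes[0].load` toward its capacity makes traffic spill to `"far"`, and setting `nodes[0].trust = 0.0` zeroes its score regardless of distance or load.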

Metric               | Ring Hashing | Google Maglev | Gradient
Lookup Speed         | O(log N)     | O(1)          | O(1)
Throughput           | 0.37M req/s  | 0.43M req/s   | 1.10M req/s
Avg. Distance        | 0.425        | 0.425         | 0.041
Byzantine Resilience | 94.8% fail   | 94.9% fail    | 100% immune
Python · Jupyter Notebook · Apache 2.0 · 13 commits
View Repository
1.1M
Requests/sec throughput
90%
Distance reduction vs Maglev
100%
Byzantine fault immune
28yr
Old constraint broken
Cryptography · Secure Messaging · PyPI

Matryoshka Protocol

Shannon's three pillars of cryptography (secrecy, authentication, and steganography) have never been achieved simultaneously in a single messaging system. Matryoshka Protocol is the first implementation to claim all three. Messages are hidden inside ordinary web traffic (JSON API responses, HTTP headers, EXIF metadata) with a mathematically proven detection probability approaching zero, making encrypted communication statistically indistinguishable from regular browsing.

At its core is a novel Fractal Group Ratchet: a group encryption algorithm with O(1) decryption complexity regardless of group size. It is combined with Schnorr-based zero-knowledge proofs for self-healing session recovery, post-quantum Kyber-1024 + Dilithium hybrid cryptography, and a fully serverless P2P architecture with k-anonymity peer discovery. It ships as a Python library (on PyPI) and a Rust implementation for performance.
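For readers unfamiliar with the Schnorr building block, here is a generic non-interactive Schnorr proof of knowledge (Fiat-Shamir variant). It is a textbook sketch, not the `matp` API: how the protocol wires such proofs into session recovery is not shown here, the function names are hypothetical, and the tiny group (p = 23, q = 11, g = 2) is for readability only and offers no security.

```python
import hashlib
import secrets

# Toy parameters: g = 2 has prime order q = 11 modulo p = 23. Insecure by design.
P, Q, G = 23, 11, 2


def challenge(public_key, commitment):
    """Fiat-Shamir: derive the challenge by hashing the public transcript."""
    data = f"{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret_key):
    """Prove knowledge of secret_key without revealing it."""
    public_key = pow(G, secret_key, P)
    nonce = secrets.randbelow(Q - 1) + 1  # fresh random nonce in [1, q-1]
    commitment = pow(G, nonce, P)
    c = challenge(public_key, commitment)
    response = (nonce + c * secret_key) % Q
    return public_key, commitment, response


def verify(public_key, commitment, response):
    """Check g^response == commitment * public_key^challenge (mod p)."""
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P


public_key, commitment, response = prove(secret_key=7)
ok = verify(public_key, commitment, response)                    # True
tampered = verify(public_key, commitment, (response + 1) % Q)    # False
```

The verifier learns that the prover holds the secret behind `public_key` but sees only the commitment and response, which is what lets a peer re-authenticate without a full handshake in schemes built on this primitive.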

Feature              | Signal              | Matryoshka
Traffic Invisibility | Detectable          | ε→0 (math proven)
Group Encryption     | O(n) pairwise       | O(1) Fractal Ratchet
Session Recovery     | Full re-handshake   | 3-message self-heal
Architecture         | Centralized servers | Decentralized P2P
Quantum Resistance   | Planned             | Kyber-1024 + Dilithium
Message Speed        | 50–100ms            | ~25ms (Rust)
Python + Rust · PyPI: matp · 178 commits · 55/55 tests passing
View Repository  ·  PyPI Package
O(1)
Group decrypt for any size
ε→0
Detection probability
25ms
Per message in Rust
3
Messages to self-heal
02

About

I'm Sangeet, a pharmacy undergrad from Nepal who works somewhere between a chemistry lab and a Jupyter notebook and never quite left either.

The work is hard to categorize because none of it was planned. I started with a curiosity and it snowballed. My work lives at the intersection of things that don't usually talk to each other: fungal nutrient transport and distributed hash tables, drug formulation science and field theory, retrieval theory and cryptography. I'm drawn to problems where the right analogy from one domain completely unlocks another.

I have an arXiv paper on information retrieval, a benchmarked load balancer inspired by slime mould, a pharmaceutical simulation platform, a molecular grammar for computational chemistry, and a cryptographic protocol that mathematically proves its own undetectability. Chemistry is messy. Code is precise. I like both.

PS: Sangeet's the name, a daft undergrad splashing through chemistry and code like a toddler. My titrations are a mess, and I've used my mouth to pipette.

Location
Nepal 🇳🇵
Status
Pharmacy Undergrad
Connect
LinkedIn  ·  GitHub
Research Themes
Information Retrieval · Distributed Systems · Computational Chemistry · Field Theory · Biology-Inspired Algorithms · Dense Retrieval