Making MySQL AI-Ready: How MyVector and ProxySQL Work Together
As AI workloads become standard in modern applications, engineering teams face a familiar dilemma: MySQL is already the system of record, but vector search typically requires bolting on a separate database. That means two security models, two observability stacks, and inevitable data consistency headaches.
The Core Idea
MyVector is a MySQL plugin (targeting 8.0/8.4) that adds vector storage, distance functions, and HNSW indexing directly inside MySQL — without requiring MySQL’s native VECTOR type. HNSW (Hierarchical Navigable Small World) enables approximate nearest neighbor search at O(log N) speed, making it practical for millions of vectors.
ProxySQL sits in front and handles the hard operational problems: classifying queries as OLTP vs. vector workloads, routing them to separate hostgroups, enforcing concurrency limits, and triggering circuit breakers when vector bursts threaten to impact production latency.
Why This Matters
The primary use cases are RAG pipelines over documentation and knowledge bases, incident/runbook search, and code semantic search — workloads where your source of truth is already in MySQL and you don’t want to fragment your data stack.
The architecture keeps OLTP writes going to a primary hostgroup while vector similarity queries route to dedicated replicas with the MyVector plugin loaded. If vector traffic spikes and replica lag climbs, ProxySQL can automatically shed vector load before OLTP P99 is affected.
The RAG Ingest Pipeline
The presentation also covers a CLI tool (rag_ingest) that handles the full ingestion pipeline: fetching rows from backend MySQL sources incrementally using watermarks, chunking and transforming text, generating embeddings via any OpenAI-compatible API in configurable batches, and storing results back as vectors. This gives you a complete, observable pipeline without external orchestration infrastructure.
Migration Path
The proposed adoption path is pragmatic:
- Start with ProxySQL for routing and observability alone — no vectors yet
- Add vector columns with simple distance UDFs
- Graduate to HNSW indexing with dedicated vector replicas
- Operationalize freshness, backfills, and tenant-aware quality of service
The Honest Tradeoff
This is explicitly a WIP/concept — the authors are looking for design partner feedback to shape the roadmap. The approach is conservative by design: InnoDB stability comes first, vector capabilities are additive, and failure modes are meant to be boring. When retrieval breaks, OLTP stays healthy.
For teams heavily invested in MySQL who want AI retrieval without fragmenting their data operations, this is a compelling direction worth watching.
Resources
- MyVector GitHub repository
- proxysql-vec GitHub repository
- Presentation: Making MySQL AI-Ready: How MyVector and ProxySQL Work Together — René Cannaò, preFOSDEM MySQL Belgian Days 2026