The challenge with data search
The explosion in unstructured data, such as images, videos, sound records, and text, requires an effective solution for computer vision, voice recognition, and natural language processing. How to extract value from unstructured data poses as a big challenge for many enterprises.
AI, especially deep learning, has been proved as an effective solution. Vectorization of data features enables people to perform content-based search on unstructured data. For example, you can perform content-based image retrieval, including facial recognition and object detection, etc.
The AI search engine
Now the challenge turns into how to execute effectively search among billions of vectors. That’s what Milvus is designed for.
What is Milvus?
It is an open source distributed vector search engine that provides state-of-the-art similarity search and analysis of feature vectors and unstructured data. Some of its key features are:
- GPU-accelerated search engine – It is designed for the largest scale of vector index. CPU/GPU heterogeneous computing architecture allows you to process data at a speed 1000 times faster.
- Intelligent index – With a “Decide Your Own Algorithm” approach, you can embed machine learning and advanced algorithms into Milvus without the headache of complex data engineering or migrating data between disparate systems. Milvus is built on optimized indexing algorithm based on quantization indexing, tree-based and graph indexing methods.
- Strong scalability – The data is stored and computed on a distributed architecture. This lets you scale data sizes up and down without redesigning the system.
- High compatibility – Milvus is compatible with major AI/ML models and programming languages such as C++, Java and Python.
Billion-Scale similarity search
You may follow this link for step-by-step procedures to carry out performance test on 100 million vector search (SIFT1B). If you want, you can also try testing 1 billion. Here is the hardware requirements. Milvus has been open sourced lately. We greatly welcome contributors to join us in reinventing data science. See us on GitHub.