
LLM 2.0, RAG & Non-Standard Gen AI on GitHub

In this article, I share my latest Gen AI and LLM advances, featuring innovative approaches radically different from both standard AI and classical ML/NLP. The focus is on doing better with less, using efficient architectures, new algorithms, and new evaluation metrics. This work originates from research that I started long ago and that gained significant momentum in the last two years. See background and history here.

OpenAI, Perplexity, Anthropic, Llama and others typically follow the trend and implement solutions very similar to mine within 3 to 6 months after I publish new milestones. Examples include multi-tokens, knowledge graph tokens, multi-indexes, real-time fine-tuning, mixtures of experts, LLM routers, small enterprise sub-LLMs, prompt distillation, a relevancy scoring engine, deep contextual retrieval, optimum agentic chunking, and a modern UI instead of the basic prompt box. I keep adding new features all the time, staying ahead of the competition.
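To make one of these components concrete, below is a minimal sketch of a relevancy scoring engine: it ranks retrieved chunks by weighted keyword matches against title, tags, and body text. The field names and weights are illustrative assumptions, not the actual implementation.

```python
# Illustrative sketch only: a minimal relevancy scoring engine. The chunk
# fields ('title', 'tags', 'text') and the weights are assumptions.
from collections import Counter

def relevancy_score(query_tokens, chunk, weights=None):
    """Score a retrieved chunk against query tokens; matches in the
    title and tags count more than matches in the body."""
    weights = weights or {"title": 3.0, "tags": 2.0, "text": 1.0}
    score = 0.0
    for field, w in weights.items():
        tokens = Counter(str(chunk.get(field, "")).lower().split())
        score += w * sum(tokens[t] for t in query_tokens)
    return score

# Example: rank candidate chunks by descending relevancy
chunks = [
    {"title": "Random walks", "tags": "stochastic process", "text": "A random walk is ..."},
    {"title": "Gradient descent", "tags": "optimization", "text": "..."},
]
query = ["random", "walk"]
ranked = sorted(chunks, key=lambda c: relevancy_score(query, c), reverse=True)
```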

Building the New LLM Paradigm

Traditional LLMs are hitting a wall, where getting bigger does not yield further improvements. Costly training via transformers and next-token prediction has become irrelevant to what modern LLMs do. Evaluation metrics fail to capture essential qualities. More on this in my recent article “There is no such thing as a trained LLM”, published on Data Science Central, here. In enterprise applications, issues come from poor integration, ignoring silos, lack of testing, poor augmentation techniques, and not incorporating user feedback: see here.

You can increase performance with zero parameters instead of billions, no GPU, and no prompt engineering, using infinite context windows and avoiding hallucinations, while delivering better ROI, accuracy, explainability, and security. This makes our technology particularly attractive to enterprise customers. Indeed, I first developed it for a Fortune 100 company. Yet, it is not possible to share it as a model on Hugging Face, as it does not fit the standard mold. And you won't find it in traditional benchmark studies, as traditional evaluation metrics cannot handle the type of output that it produces.

The remainder of this article focuses on the most interesting open-source sections in my large GitHub repository, covering most of the components described above. It also includes LLMs for predictions, cataloging, and clustering, as well as the best tabular data synthesizer and its universal evaluation metric, now available as a Python library.

LLM 2.0 for Enterprise

The most recent code is available here on GitHub. It includes the Nvidia case study, still under construction: deep PDF retrieval and contextual hierarchical chunking with multi-index and agents built post-crawling, also featuring detection and processing of tables, images, diagrams, and bullet lists. Contextual elements include tags and categories added post-crawling, as well as index keywords, font sizes, types, and colors. You will also find the code, anonymized input corpus, backend tables, and output results for our first Fortune 100 case study. Some documentation is in the same folder. The most comprehensive description, including the web API, is in my new book, here.
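To illustrate font-driven contextual chunking, here is a hedged sketch assuming PyMuPDF (the fitz package): spans whose font size exceeds an assumed threshold are treated as section headers that open a new chunk. The actual repository code is far more elaborate and also handles tables, images, diagrams, and bullet lists.

```python
# A minimal sketch of font-driven hierarchical chunking, assuming PyMuPDF
# (pip install pymupdf). Not the repository's actual code: it only shows
# the idea of using font size as a contextual element.
import fitz

def hierarchical_chunks(pdf_path, header_size=14.0):
    doc = fitz.open(pdf_path)
    chunks, current = [], {"header": None, "text": []}
    for page in doc:
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):   # image blocks have no lines
                for span in line["spans"]:
                    if span["size"] >= header_size:  # assumed header threshold
                        if current["text"]:
                            chunks.append(current)
                        current = {"header": span["text"].strip(), "text": []}
                    else:
                        current["text"].append(span["text"])
    if current["text"]:
        chunks.append(current)
    return chunks

# Usage (hypothetical file name): each chunk keeps its header as context
for chunk in hierarchical_chunks("report.pdf"):
    print(chunk["header"], len(chunk["text"]))
```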

First version of xLLM

It started with the need to find references and answers to my research questions when writing articles and technical reports. Unsatisfied with OpenAI and other tools, I created my own LLM. I crawled the entire Wolfram corpus (15,000 URLs, 5,000 categories) with home-made smart crawling, retrieving and leveraging the full taxonomy in the process. For a question about random walks, just to give an example, it returns results about 3D random walks, Wiener processes, and martingales, among others. All other tools return basic or irrelevant results, requiring multiple prompts to get anything useful, often peppered with subtle errors in the math formulas that they generate. These errors are easy to overlook and hard to detect, due to lack of proper referencing.
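The sketch below shows, in miniature, how taxonomy-aware retrieval of this kind can work: backend tables map a keyword to its category and to related taxonomy entries gathered during crawling. The table names and contents here are toy assumptions; the real backend tables are in the repository linked below.

```python
# Hedged sketch of taxonomy-aware retrieval in the spirit of xLLM.
# The tables `related` and `category_of` are toy assumptions.
from collections import defaultdict

# keyword -> related taxonomy entries (built while crawling)
related = defaultdict(set)
related["random walk"] |= {"3d random walk", "wiener process", "martingale"}

# keyword -> category retrieved from the crawled taxonomy
category_of = {"random walk": "Probability > Stochastic Processes"}

def answer(query):
    """Return the category and taxonomy neighbors for a query keyword."""
    key = query.lower().strip()
    return {
        "category": category_of.get(key),
        "related_topics": sorted(related.get(key, [])),
    }

print(answer("Random walk"))
# {'category': 'Probability > Stochastic Processes',
#  'related_topics': ['3d random walk', 'martingale', 'wiener process']}
```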

The whole Wolfram corpus, the code for smart crawling and taxonomy retrieval, the backend tables, and the xLLM Python code are available here and in the xLLM6 folder, here. It is mostly home-made, with limited reliance on external (often faulty) Python libraries.

Data Synthetization

The story behind NoGAN, our tabular data synthesizer, is similar to xLLM. In short, the motivation was the very painful training that comes with GANs, poor evaluation metrics, lack of reproducibility, and output quality that depends heavily on the features in your real data. Fixing GANs required so many add-ons that in the end, it worked thanks to these add-ons, not to the GAN itself. The result is NoGAN, much faster and with better synthetization. It also includes the first implementation of the full multivariate Kolmogorov-Smirnov distance for model evaluation. Both are available here on GitHub.
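For reference, the full multivariate Kolmogorov-Smirnov distance is the largest gap between the joint empirical CDFs of the real and synthetic tables, evaluated at the observed points. Here is a minimal sketch of that definition; the Python library on GitHub is the reference implementation.

```python
# Minimal sketch of the full multivariate Kolmogorov-Smirnov distance:
# the largest gap between the two joint empirical CDFs. Illustrates the
# definition only; not the library's optimized implementation.
import numpy as np

def multivariate_ks(real, synth):
    """real, synth: 2-D arrays of shape (n_rows, n_features)."""
    pts = np.vstack([real, synth])  # evaluate at all observed points
    def ecdf(data, points):
        # P(X_1 <= p_1, ..., X_d <= p_d) estimated at each point
        return np.mean(np.all(data[None, :, :] <= points[:, None, :], axis=2), axis=1)
    return np.max(np.abs(ecdf(real, pts) - ecdf(synth, pts)))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 3))
synth = rng.normal(loc=0.2, size=(500, 3))  # slightly shifted synthetic data
print(round(multivariate_ks(real, synth), 3))
```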

Statistical Science Rewritten from Scratch

My original background and PhD thesis are in computational statistics and computer vision. However, in the last 20 years, I have completely rewritten the entire statistical corpus, barely using concepts such as maximum likelihood and random variables. I created a highly generic yet simplified regression covering all types, able to perform clustering, and working without a dependent variable. Gradient descent without math and with no learning rate, and new spatial interpolation for chaotic systems, are but a few of the many techniques that I introduced. You can find them on GitHub, here and here, as well as in my 7 books listed here.
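As a simple classical analogue of regression without a dependent variable (for illustration only, not the generic regression itself), the sketch below fits a line by total least squares: it minimizes orthogonal distances, so neither variable plays the role of the response.

```python
# Illustration only: total least squares via SVD, a classical analogue of
# regression with no designated dependent variable.
import numpy as np

def total_least_squares(points):
    """Fit a line minimizing orthogonal distances; points: (n, 2) array."""
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    direction = vt[0]  # principal direction of the point cloud
    return center, direction

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
pts = np.column_stack([x, 2.0 * x + rng.normal(scale=0.5, size=200)])
center, direction = total_least_squares(pts)
print(direction[1] / direction[0])  # slope, approximately 2.0
```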

About the Author


Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at MLTechniques.com and GenAItechLab.com, former VC-funded executive, author (Elsevier), and owner of a patent related to LLMs. Vincent's past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. Follow Vincent on LinkedIn.
