The Missing Part in LLMs and GPT-like Systems
These days, all the AI talk is about GPT (Generative Pre-trained Transformer), LLMs (Large Language Models), generative AI, prompt engineering, and related technologies. You must be living alone on a small island if you have never heard these terms.
LLMs originated in NLP (natural language processing), which gave rise to NLG (natural language generation) before evolving into what they are today. Deep neural networks, specifically the transformer architecture (the “T” in GPT), are one of the components. Another component is collecting vast amounts of unstructured text data and categorizing it. This is achieved by crawling websites such as Wikipedia, ArXiv (preprints and scientific research), Stack Exchange forum communities, GitHub, LinkedIn content, online news, other large repositories, and even Facebook conversations or Google search result pages. Starting with 1,000 seed keywords, looking at what Google returns, and recursively crawling all the links found will, within a couple of months, create a database of billions of webpages covering 95% of Internet traffic.
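The recursive crawl described above is, at its core, a breadth-first traversal of the link graph. The minimal sketch below assumes the seed URLs are the pages a search engine returns for the seed keywords, and replaces network access with a toy in-memory link graph. `crawl`, `get_links`, `search_results`, and `web` are hypothetical names used for illustration, not part of any crawler library; a real crawler would also need robots.txt handling, rate limiting, and persistent storage.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], list[str]],
          max_pages: int = 1_000_000) -> list[str]:
    """Breadth-first crawl from seed URLs; returns pages in visit order.

    get_links (hypothetical) returns the outgoing links of a page. In a real
    crawler it would download and parse the page; injecting it keeps the
    traversal testable offline.
    """
    seen: set[str] = set()
    queue: deque[str] = deque()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never enqueue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


# Toy stand-ins (hypothetical): search results per seed keyword, and a link graph.
search_results = {"neural networks": ["a"], "transformers": ["b"]}
web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}

seeds = [url for urls in search_results.values() for url in urls]
pages = crawl(seeds, lambda url: web.get(url, []))
print(pages)  # ['a', 'b', 'c', 'd']
```

The `seen` set is what makes the recursion terminate on cyclic link graphs (here `c` links back to `a`), and `max_pages` caps the crawl budget.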
There are techniques to categorize this unstructured data: I have developed my own and implemented such smart crawlers, some of which are discussed in my books. In the end, you can easily build a search engine better than the most popular ones on the market. Because the dominant players enjoy a near-monopoly, their incentive to innovate is small, and they are manipulated by spammers and other actors who find ways to push their content to the top.
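The author's own categorization techniques are not described here, so the sketch below swaps in a deliberately simple stand-in: each category is defined by a hypothetical keyword set, and a document goes to the category whose keywords it mentions most often. `TAXONOMY`, `tokenize`, and `categorize` are illustrative names only.

```python
import re
from collections import Counter

# Hypothetical taxonomy: each category is described by a small keyword set.
TAXONOMY: dict[str, set[str]] = {
    "machine learning": {"neural", "training", "model", "gradient"},
    "finance": {"stock", "bond", "interest", "portfolio"},
    "health": {"patient", "clinical", "disease", "treatment"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text: str, taxonomy: dict[str, set[str]] = TAXONOMY) -> str:
    """Return the category whose keywords occur most often in the text.

    Ties go to the category listed first; no keyword hits means "uncategorized".
    """
    counts = Counter(tokenize(text))
    scores = {cat: sum(counts[w] for w in words) for cat, words in taxonomy.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


print(categorize("Training a neural model with gradient descent"))  # machine learning
print(categorize("The patient responded well to treatment"))        # health
print(categorize("Hello world"))                                    # uncategorized
```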
The next step was to develop a friendlier interface: instead of returning links with a short summary, the system composes complete answers to your questions. This is what tools like ChatGPT do. In the end, they will be manipulated the same way Google is.
One technique could make these systems much better: scoring the input sources, whether the source is a publisher, a specific channel, a website, a Facebook user, a journalist, or an author. The score attached to a source (more precisely, a set of scores, each measuring a specific attribute) tells you how trustworthy a piece of information is. A brand-new LinkedIn account with few connections, whose profile picture shows an attractive, lightly dressed young woman and whose only connections are older wealthy men, is likely to receive a low score. By contrast, someone who consistently receives good feedback and reviews will score high, unless that feedback is generated by a ring of fake accounts, which is easy to detect.
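To make the idea of “a set of scores, each measuring a specific attribute” concrete, here is a minimal sketch. The three attributes (account maturity, reach, and reputation), their saturation points, and the weights used to combine them are all assumptions chosen for illustration, not the scoring method referenced in this article.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    # Hypothetical signals; a real system would combine many more.
    account_age_days: int
    connections: int
    positive_reviews: int
    total_reviews: int


def attribute_scores(p: SourceProfile) -> dict[str, float]:
    """One score in [0, 1] per attribute (attribute names are illustrative)."""
    maturity = min(p.account_age_days / 365, 1.0)  # saturates after one year
    reach = min(p.connections / 500, 1.0)           # saturates at 500 connections
    # Laplace-smoothed approval rate: with no reviews the score stays at 0.5
    reputation = (p.positive_reviews + 1) / (p.total_reviews + 2)
    return {"maturity": maturity, "reach": reach, "reputation": reputation}


# Hypothetical weights for collapsing the attribute scores into one trust score.
WEIGHTS = {"maturity": 0.3, "reach": 0.2, "reputation": 0.5}


def trust_score(p: SourceProfile) -> float:
    """Weighted sum of the attribute scores; stays in [0, 1] since weights sum to 1."""
    scores = attribute_scores(p)
    return sum(WEIGHTS[name] * value for name, value in scores.items())


new_account = SourceProfile(account_age_days=10, connections=15,
                            positive_reviews=0, total_reviews=0)
established = SourceProfile(account_age_days=2000, connections=800,
                            positive_reviews=190, total_reviews=200)
print(attribute_scores(new_account))
print(round(trust_score(new_account), 2), round(trust_score(established), 2))  # 0.26 0.97
```

Keeping the attribute scores separate, rather than storing only the combined number, lets downstream components weigh them differently. The smoothed reputation keeps a source with no reviews at a neutral 0.5, so a brand-new account cannot score high on reputation alone.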
Scoring is not just about classifying information as trustworthy or not. The system can assign labels such as “exaggerated”, “politically biased” (conservative, liberal, and so on), “unverified”, and many others. Each source could be assigned multiple labels, each with a probability determined by the scoring algorithm. For instance, a source could be classified as both exaggerated and real. These scores would be updated daily.
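Because a source can carry several labels at once, the label probabilities are independent rather than a distribution that sums to 1. The sketch below, under that assumption, blends yesterday's probabilities with today's evidence using exponential smoothing. The label list, the smoothing rate `alpha`, and the 0.5 assignment threshold are hypothetical.

```python
LABELS = ("exaggerated", "politically biased", "unverified", "real")


def update_labels(current: dict[str, float],
                  evidence: dict[str, float],
                  alpha: float = 0.2) -> dict[str, float]:
    """Daily update of independent label probabilities by exponential smoothing.

    alpha (hypothetical) is the weight given to today's evidence. Labels are
    not mutually exclusive, so the probabilities need not sum to 1.
    """
    return {
        label: (1 - alpha) * current.get(label, 0.0) + alpha * evidence.get(label, 0.0)
        for label in LABELS
    }


def assigned_labels(probs: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Labels whose probability reaches the (hypothetical) threshold."""
    return [label for label, p in probs.items() if p >= threshold]


yesterday = {"exaggerated": 0.7, "politically biased": 0.1, "unverified": 0.3, "real": 0.8}
today = {"exaggerated": 0.2, "politically biased": 0.0, "unverified": 0.1, "real": 0.9}
probs = update_labels(yesterday, today)
print(assigned_labels(probs))  # ['exaggerated', 'real']
```

A small `alpha` makes a source's labels change slowly, so one unusual day does not flip its classification.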
One benefit, besides warning the user, is avoiding incoherence when GPT answers a question. If an answer draws on a mix of sources, some liberal and some conservative, it may say one thing in one paragraph and the opposite in the next. Using the scores, the answer could present the two contradictory arguments side by side and explain why they differ. Users could also choose to receive the answer they want to hear (regardless of veracity) by setting parameters associated with the scoring engine. The scoring system could be quite sophisticated: rather than automatically categorizing statements as “misinformation” just because the average Joe, or even reputable scientists, say so, it could label them “controversial” instead. Sometimes a statement is contested simply because the information in question has not yet passed the test of time.
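A minimal sketch of how source scores could keep an answer coherent: snippets are grouped by the stance they take, so contradictory arguments appear side by side with their attribution, and the claim is flagged “controversial” rather than “misinformation” when the retained sources disagree. User preferences are modeled as a filter on bias labels and a minimum trust score. The `Snippet` fields and the `compose_answer` function are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snippet:
    text: str
    stance: str    # hypothetical: whether the snippet "supports" or "opposes" a claim
    leaning: str   # bias label from the scoring engine, e.g. "liberal"
    trust: float   # trust score of the snippet's source, in [0, 1]


def compose_answer(snippets: list[Snippet],
                   preferred_leanings: Optional[set[str]] = None,
                   min_trust: float = 0.0) -> tuple[str, dict[str, list[str]]]:
    """Group retained snippets by stance and flag disagreement.

    preferred_leanings and min_trust stand in for user-chosen parameters
    of the scoring engine (both hypothetical).
    """
    kept = [s for s in snippets
            if s.trust >= min_trust
            and (preferred_leanings is None or s.leaning in preferred_leanings)]
    sections: dict[str, list[str]] = {}
    for s in kept:
        # Attribute each argument so the reader can see why the views differ
        sections.setdefault(s.stance, []).append(
            f"{s.text} [{s.leaning}, trust {s.trust:.2f}]")
    status = "controversial" if len(sections) > 1 else "consistent"
    return status, sections


snippets = [
    Snippet("Policy X lowers costs.", "supports", "conservative", 0.8),
    Snippet("Policy X raises costs.", "opposes", "liberal", 0.7),
    Snippet("Policy X has no effect.", "opposes", "liberal", 0.2),
]
status, sections = compose_answer(snippets, min_trust=0.5)
print(status)  # controversial
for stance, lines in sections.items():
    print(stance, lines)
```

Passing `preferred_leanings={"liberal"}` instead returns only the opposing arguments with status `"consistent"`: the user gets the answer they asked for, and the filter that produced it is explicit.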
Finally, you can also score the output (the answers), not just the input sources. This is an area in which I am actively involved, with patents already granted and technology that I started developing years ago.
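Scoring the output can be illustrated under the assumption that each sentence of a generated answer can be traced back to the sources it came from. This is a hypothetical sketch, not the patented technology mentioned above: each sentence takes the trust score of its best-supporting source, unsupported sentences score 0, and the answer score is the average over sentences.

```python
def score_answer(sentences: list[tuple[str, list[str]]],
                 source_trust: dict[str, float]) -> float:
    """Score a generated answer from the trust scores of its sources.

    sentences: (sentence, ids of the sources it was derived from) pairs.
    Each sentence takes the trust of its best-supporting source; a sentence
    with no known source scores 0. The answer score is the mean over sentences.
    """
    if not sentences:
        return 0.0
    per_sentence = []
    for _, source_ids in sentences:
        trusts = [source_trust.get(sid, 0.0) for sid in source_ids]
        per_sentence.append(max(trusts, default=0.0))
    return sum(per_sentence) / len(per_sentence)


trust = {"s1": 0.9, "s2": 0.4}  # hypothetical source trust scores
answer = [
    ("Claim backed by a reputable source.", ["s1", "s2"]),
    ("Claim backed by a weak source.", ["s2"]),
    ("Claim with no traceable source.", []),
]
print(round(score_answer(answer, trust), 3))  # 0.433
```

Taking the maximum per sentence rewards corroboration by at least one trusted source; averaging per sentence instead would penalize answers that also cite weaker sources.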
Vincent Granville, Contributor