The success and growth of AI is undeniable. Yet some basic tasks are still performed poorly, despite or because of automation. In some cases, you can blame reliance on outdated AI. In other cases, it is the result of corporate policies, or of multiple AI systems that compete against each other. The systems in question may each be top notch, yet they work against one another. It is similar to a Wall Street firm running two separate, competing black-box trading systems: each may perform great on its own, but combined, they cannibalize each other.
It is sometimes said that dentists have the worst teeth. Perhaps some of the companies with the best AI teams have the worst AI systems?
1. Google Search
Google brags about its new neural network with 500 billion parameters (see here), yet its search engine has gotten worse over time, not better. These days, search results, on average, return only basic answers to your questions, a fact everyone is familiar with. It is a great tool for people with below-average intelligence. If you are looking for mathematical or technical articles, especially recent ones, you need to use alternate search tools, like StackExchange or arXiv.com, or include 2022 and arXiv as extra keywords in your search query.
Much new, high-quality content is not even indexed by Google. None of my most recent articles show up in Google when searching by title (or any other keyword). Instead, illegal, unauthorized copies of my articles show up. I contacted Google and they de-indexed the illegal copies. Thankfully, I don’t need Google for people to find my articles.
But one wonders why Google spends so many resources on a legal department to deal with this, when AI could easily fix it. That legal part is not automated yet, but it should be, since the decisions are straightforward. You do want to make sure it is not the bad guy getting the good guy de-indexed, though. This brings me to my next example.
2. Plagiarism Detection
More generally, this issue is about content authenticity; it also encompasses fake news detection. It should be a simple AI problem. The problem is caused in part by top publishers. Once ranked as a top publisher by Google or Facebook, it does not matter if 20% of your articles are wrong: they will still show up at the top. You can also game the system to gain top-publisher status. New publishers are ignored for a long time, long enough that some top publishers can publish illegal copies of your original content and have it listed on Google.
In the meantime, the authentic original is nowhere to be found on Google. It works as follows: you allow bloggers to write articles on your platform, as Medium does. Some of the bloggers are bad apples, and some of the moderators in charge of content curation are just incompetent. I think you should replace the incompetent ones with experts (expensive) or with good AI (probably cheaper), as sketched below.
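To give a sense of how simple the core detection step could be, here is a minimal sketch of near-duplicate detection based on word shingles and Jaccard similarity. The shingle size and threshold are illustrative assumptions, not values used by any particular platform, and a production system would add indexing (e.g., MinHash) to scale.

```python
# A minimal sketch of near-duplicate detection for plagiarism screening.
# Shingle size and threshold are illustrative assumptions.

import re

def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word shingles (overlapping word n-grams)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two shingle sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def likely_copy(original: str, candidate: str, threshold: float = 0.6) -> bool:
    """Flag a candidate article as a probable copy of the original."""
    return jaccard(shingles(original), shingles(candidate)) >= threshold
```

Anything scoring above the threshold could be held for review or de-prioritized in indexing until the original source is established.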
Plagiarism checkers and search tools are examples of AI systems competing against each other. This is compounded by man-made internal company policies that the algorithms have to comply with.
3. Automated Technical Support
We all know how hard it is to navigate the help pages offered by many companies, be it Google, LinkedIn, Facebook or your vendors. In some cases, you can chat with an AI robot, yet the experience is just as painful. There are exceptions, but they are rather rare. Sometimes I think it is just me: I don’t bother contacting technical support for questions that I believe I can answer myself with a bit of research.
I criticized Google for its search engine, but there is one thing it does well. When looking to solve some issue (say, how to contact LinkedIn or download your contact list), Google outperforms many other platforms. It helps me find articles answering my question, or a direct link to the relevant LinkedIn support page telling me how to solve my problem. I have also noticed an improvement in company help pages over time. Most companies offer better search capabilities these days, and the help pages are better written, showing steps (with pictures, depending on your platform) on how to solve your problem. Ironically, some of the most difficult help pages to navigate are those from Google. And when I do find the page addressing my issue, it is frequently of no use.
Chatbots understand basic questions that any normal person could solve with no help, but as soon as you ask something outside their repertoire, they are typically of no use. There is definitely a lot of room for improvement. Or maybe I underestimate how intellectually challenged many users are, which would explain why some of these systems treat you like a baby; possibly, more and more non-tech people are using tech products and need very basic help. It would be good if these chatbots and help pages had two versions: one for beginners, and one for more advanced users, as sketched below. In the end, this would save money spent on technical support.
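As a rough illustration of the two-track idea, here is a minimal sketch that routes a support query to a beginner or advanced help flow. The signal words are illustrative assumptions; a real system would learn them from labeled support tickets rather than a hand-made list.

```python
# A minimal sketch of routing support queries to a beginner or advanced
# help track. The vocabulary list is an illustrative assumption.

ADVANCED_SIGNALS = {"api", "oauth", "webhook", "regex", "dns", "ssl", "export", "csv"}

def route_query(query: str) -> str:
    """Return 'advanced' if the query uses technical vocabulary, else 'beginner'."""
    tokens = set(query.lower().split())
    return "advanced" if tokens & ADVANCED_SIGNALS else "beginner"

# Usage:
# route_query("How do I reset my password?")          -> 'beginner'
# route_query("The api webhook returns a 403 error")  -> 'advanced'
```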
That said, I recently received fantastic technical support from WordPress and other companies, adapted to my level. Not a bot, but real human beings. Still, it proves that it can be done, and I would not mind if it were a bot doing the same job. Maybe we are not there yet. Imagine a company advertising “top quality tech support” to attract more customers.
4. Ad Rejection Algorithm
Facebook, Quora, Google and other platforms mostly use AI to reject or accept your ads. These algorithms are too generic. Maybe data scientists who never spent a dime advertising on these platforms wrote them. The decision algorithm (think of a decision tree) does not work well. As a result, you see plenty of irrelevant ads, while some high-quality, very well targeted ads are rejected. Some of this has to do with internal policies.
If you advertise your restaurant on Facebook but there is a bottle of wine in the picture, your ad is rejected due to internal policies. The workaround is to select a 21+ audience or remove the wine, even though your ad is not about selling wine. If you don’t know that trick, too bad. A smarter rule is sketched below.
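Here is a minimal sketch of a less generic review rule for the wine example: instead of rejecting any ad whose picture contains alcohol, the policy is only enforced when the ad itself sells alcohol; otherwise the audience is simply restricted to 21+. The field names are illustrative assumptions, not any platform's actual schema.

```python
# A minimal sketch of a category-aware ad review rule.
# Keys 'category', 'image_labels' and 'min_age' are illustrative assumptions.

def review_ad(ad: dict) -> dict:
    """Return an approve/reject decision for an ad dictionary."""
    has_alcohol_image = "alcohol" in ad.get("image_labels", [])
    sells_alcohol = ad.get("category") == "alcohol"

    if sells_alcohol and ad.get("min_age", 0) < 21:
        return {"decision": "reject", "reason": "alcohol ad without 21+ targeting"}
    if has_alcohol_image and not sells_alcohol:
        # Incidental alcohol in the picture: restrict the audience instead of rejecting.
        return {"decision": "approve", "min_age": max(ad.get("min_age", 0), 21)}
    return {"decision": "approve", "min_age": ad.get("min_age", 0)}

# A restaurant ad with a wine bottle in the photo is approved with a 21+
# audience rather than rejected outright:
# review_ad({"category": "restaurant", "image_labels": ["food", "alcohol"], "min_age": 18})
```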
Below is an example of an ad rejected by Facebook because of the picture. I have no idea why: the ad was to advertise my new website, and it is a real picture of the actual product – the website.
Yet Facebook delivers tons of untargeted ads despite its superior targeting platform. Most recently, ads about Covid: these get a lot of negative comments, so it should be easy for AI to flag them (see the sketch below). You can turn them off, but new ones, almost identical, will pop up. The advertiser can turn commenting off, but is apparently unaware of this functionality. The consequences are as follows: an unhappy advertiser rejected for no reason, an unhappy user constantly bombarded by irrelevant ads, missed revenue opportunities for the advertising platform, and potential litigation when this happens on a large scale.
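The "easy to flag" claim boils down to a simple heuristic: pause an ad for review once a large share of its comments are negative. A minimal sketch follows; the threshold and minimum sample size are illustrative assumptions, and a real system would use a proper sentiment classifier rather than a pre-computed negative count.

```python
# A minimal sketch of flagging ads by negative-comment ratio.
# Threshold and minimum sample size are illustrative assumptions.

def should_flag(total_comments: int, negative_comments: int,
                threshold: float = 0.5, min_comments: int = 50) -> bool:
    """Flag an ad whose negative-comment ratio exceeds the threshold."""
    if total_comments < min_comments:
        return False  # not enough feedback yet to judge
    return negative_comments / total_comments > threshold

# Usage: an ad with 120 comments, 80 of them negative, gets flagged.
# should_flag(120, 80)  -> True
```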
On Google, one could argue that I don’t see my ad because competitors are outbidding me, or I have already exhausted my daily budget, or my ad is misleading or illegal. But this is not the case: my ad won’t show up even for keywords with no competing ads at all on the search result page, like “confidence region”. I don’t know what rules this AI system uses to make a decision, but it clearly misses revenue.
5. Content Scoring
The same is true for content posted on Reddit or Medium. A new blogger with great content may be turned down, or get low visibility, while poor content by old-timers is routinely accepted. Again, this is AI doing a bad job: the “learning” part of machine learning seems to be absent in these algorithms. This is especially true of Reddit, where rejection emails are sent by someone identifying itself as a robot. At this time, the best way to deal with it is to create your own blog and accept top-quality content only. These AI systems are not very smart, and gaming them is another option. Maybe the algorithm relies on poorly understood data, or on bad feature selection.
I worked on Internet traffic and content scoring more than 10 years ago. I developed new, state-of-the-art scoring techniques. They are in the public domain and anyone can use them freely, even for commercial purposes. Many other teams have the technical ability to develop similar or even better techniques. But somehow, somewhere, it fails. Part of it has to do with hiring scientists who lack a proper understanding of how these systems work in real life. This can lead to selecting the wrong features, or failing to identify, gather or analyze the most relevant data.
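To make the feature-selection point concrete, here is a generic, illustrative sketch of feature-based content scoring. It is not the technique mentioned above; the features and weights are assumptions chosen for clarity. The failure mode described in this section corresponds to over-weighting author reputation relative to the content itself, which is exactly what penalizes a new blogger with great content.

```python
# A generic, illustrative sketch of feature-based content scoring.
# Features and weights are assumptions, not any platform's actual model.

import math

def content_score(features: dict) -> float:
    """Return a quality score in (0, 1) from simple, interpretable features."""
    weights = {
        "word_count_log": 0.6,        # longer, substantive posts score higher
        "outbound_link_ratio": -1.5,  # link-heavy spam scores lower
        "author_prior_quality": 2.0,  # past quality of the author's posts
        "readability": 0.8,           # e.g., a normalized reading-ease metric
    }
    x = sum(weights[k] * features.get(k, 0.0) for k in weights) - 1.0
    return 1.0 / (1.0 + math.exp(-x))  # logistic link keeps the score in (0, 1)
```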
Some of it has to do with internal policies and conflicting algorithms. Much has to do with lack of concern once your business has reached monopoly status: you don’t care anymore about quality (you don’t have to) and hope to stay in business long enough before solid new competitors emerge. Similar problems arise with Amazon and Yelp reviews, to the point that many users don’t trust them anymore, and merchants avoid these platforms as much as possible. Many genuine reviews by people who bought your product are deleted, while questionable reviews by people who did not even buy it are plentiful. This is particularly true for new products.
6. Job Application Filtering
I discussed this topic in one of my previous Data Science Central articles, here. Keyword-based applicant tracking systems are notorious for missing the best candidates. In one experiment, a hiring manager asked his HR team to send him all the applications. He then picked his top four candidates and compared them with the top four selected by HR. There was no overlap: HR missed the candidates he most wanted to interview. It is easy to replicate this experiment in your company and see the results. HR claims that it has only 5 seconds to decide on a resume (assuming it went through the firewall), yet they complain about getting very few applicants.
Smart applicants bypass the system by directly contacting the hiring manager or the right connection; if the hiring manager or connection can’t be found, they don’t apply. More and more, people find jobs via their network, not via HR systems. But you can fix this with better AI (see the sketch below). Indeed, I am thinking of designing a platform that would put applicants directly in contact with the hiring manager, using modern machine learning to improve filtering. I probably won’t find the time to work on it. But if you ever wondered, as a data scientist, how to start your own company, this is an opportunity: there is no competition, and if you set up the right team, I might invest in your startup.
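As a small illustration of why literal keyword filters miss strong candidates, here is a minimal sketch contrasting an ATS-style exact-match filter with a synonym-aware alternative. The synonym map is an illustrative assumption; a real system would learn skill equivalences from data rather than hard-code them.

```python
# A minimal sketch: brittle keyword filtering vs. synonym-aware matching.
# The synonym map is an illustrative assumption.

SKILL_SYNONYMS = {
    "machine learning": {"machine learning", "deep learning", "pytorch",
                         "tensorflow", "neural networks"},
    "python": {"python", "pandas", "numpy", "scikit-learn"},
}

def keyword_filter(resume: str, required: list) -> bool:
    """Brittle ATS-style filter: the literal keyword must appear."""
    text = resume.lower()
    return all(skill in text for skill in required)

def synonym_filter(resume: str, required: list) -> bool:
    """Passes if any synonym of each required skill appears in the resume."""
    text = resume.lower()
    return all(any(s in text for s in SKILL_SYNONYMS[skill]) for skill in required)

resume = "Built deep neural networks in PyTorch; strong pandas and NumPy background"
required = ["machine learning", "python"]

print(keyword_filter(resume, required))  # False: a strong candidate is rejected
print(synonym_filter(resume, required))  # True: synonym-aware matching catches it
```

The difference between the two filters is essentially the hiring manager versus HR in the experiment above: one recognizes equivalent skills, the other only recognizes the exact words in the job posting.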
About the Author
Vincent Granville is a machine learning scientist, author and publisher. He was the co-founder of Data Science Central (acquired by TechTarget) and most recently, founder of MLtechniques.com.