Less than 24 hours after posting my previous Data Science Central article (here), dozens of illegitimate copies started to pop up on various websites. Below is an example (title + first paragraph):
Fake: An experimental guide to the Riemann conjecture — the correct term is heuristic evidence. It is a strong argument based on empirical evidence rather than a formal proof from a mathematical point of view. It’s notable enough that you decided to go public. This article will go straight to the point without going into too much detail about the concepts. The goal is to provide a brief overview so that the busy reader can get a pretty good idea of the method. It will also be Indeed, I hesitated for a long time between choosing the current title or “Introduction to his MPmath Python Library for Scientific Programming”.
Real: An Empirical Proof of the Riemann Conjecture — The correct term should be heuristic proof. It is not a formal proof from a mathematical point of view, but strong arguments based on empirical evidence. It is noteworthy enough that I decided to publish it. In this article I go straight to the point without discussing the concepts in details. The goal is to offer a quick overview so that the busy reader can get a pretty good idea of the method. Also, it constitutes a great introduction to the Python MPmath library for scientific computing, dealing with advanced and complex mathematical functions. Indeed, I hesitated for a long time between choosing the current title, and “Introduction to the MPmath Python Library for Scientific Programming”.
Each illegitimate copy offers a different wording. The common theme is poor English. The first outlet where it appeared was Theecook.com. This website advertises itself as a platform to “generate cool text what can be used on YouTube, Twitter, Instagram, Discord and more” [sic].
How ChatGPT Could Be Used for Fraud
It is very unlikely that these illegitimate copies were produced with ChatGPT; I assume ChatGPT would do a much better job. So what is the purpose here? I don’t have an answer. You would think search engines could easily flag these copies as fake, if only because the original is the oldest version. However, experience tells me otherwise: short summaries of my own articles posted on Medium, LinkedIn or Twitter appear on Google well ahead of the actual, full, original articles on MLTechniques.com. In that sense, ChatGPT represents a threat to publishers. Later in this article, I explain why it also offers benefits to them.
Think of all the fake documents you could create: medical certificates, vaccination cards, essays, theses, homework, resumes and many more, even with a fake signature or stamp if necessary, and good enough to pass at least basic fraud-detection tests. Indeed, a recent article mentioned ChatGPT earning two A’s and one C in a blind-test experiment; the C was for student work in an MBA program. Professors may complain, but they could also use ChatGPT to grade homework and exams, and even to detect ChatGPT-generated submissions. Maybe not today, but in the near future.
Some people also complain that ChatGPT will increase the volume and proportion of fake news, since publishers may use it and it relies partly on Internet searches to generate content. I believe the opposite is true: ChatGPT could be better than Facebook at detecting fake news, and better than Google at search. This actually has Google worried. Another concern is AI art, especially when it mimics work produced by artists, be it a song or a painting. Like many technologies, it can be used for good or for bad.
Is This Really New?
The capabilities seem far superior to those of similar techniques developed in the past. People are worried about seeing their jobs automated. As an employee, you should always be proactive, see the writing on the wall, and have a plan B. When I realized that a Google API could do my job, I was the one who mentioned it to my boss (the CEO, who was unaware of it), and it ended well.
That said, this is part of the evolution process. A tool such as ThatsMathematics.com/mathgen/ can write your math research articles for you. Some of these papers, despite consisting of mathematical gibberish, have been accepted by journals. WolframAlpha is a platform that can compute integrals and solve various math problems, including step-by-step solutions if you need them for school homework, and it is free. I use it all the time in my own work, and I encourage students to do the same when facing boring, repetitive, mechanical, meaningless classroom exercises. Even 40 years ago in high school, I did not read all the books I was supposed to. Like many if not all other students, I used book digests that provided all the insights you needed about a specific book to do your homework or pass the exam.
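As an aside, free programming libraries already handle this kind of mechanical exercise. Below is a minimal sketch using Python’s SymPy library (my choice for illustration here, not a tool discussed in this article) to compute an antiderivative and a definite integral, the sort of task WolframAlpha automates.

```python
# Minimal sketch: symbolic integration with the free SymPy library,
# the kind of mechanical classroom exercise such tools automate.
import sympy as sp

x = sp.symbols('x')

# Antiderivative of x * exp(-x^2)
antiderivative = sp.integrate(x * sp.exp(-x**2), x)
print(antiderivative)   # -exp(-x**2)/2

# Definite integral of exp(-x^2) over the real line
definite = sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo))
print(definite)         # sqrt(pi)
```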
Benefits of this Type of AI
There are plenty of benefits. I would use ChatGPT to write an article such as “14 must-read machine learning books in 2023”; I believe it could do a great job on a task like that. Analysts could use it to generate their reports. It could force teachers to assign essays and homework that require real creativity. And I would be happy to see a search engine that is a lot better than Google. What about writing code? I would be happy to spend less time on that, with ChatGPT as my workhorse. It would also be nice to have ChatGPT debug your code, or edit and proofread your documents.
Eventually, it could prove mathematical theorems, perform medical diagnoses, invent new recipes, write legal documents, and maybe even court judgements. This is not entirely new: automated theorem proving has been around for a while, and expert systems are the ancestors of this technology. Certainly, creating featured pictures or videos to include in my articles would be useful: it would avoid licensing fees and potential copyright battles. However, would the synthesized images be subject to copyright? Could a news article about your recent DUI, featuring a synthesized picture of you, be challenged in court for privacy violation? How is that different, from a legal point of view, from a hand-drawn picture of your face? Or from a synthesized, digital drawing of your face?
Potential Improvements
One area for improvement: getting these tools to produce output that shows some personality and is less dull. They also tend to lack common sense. Sometimes this has terrible consequences: mathematical models unaware that home prices rising by 100% in four years are bound to collapse, or unaware that the lack of statistics about recovered patients in the early days of Covid did not mean everyone ended up very sick, quite the contrary. Some plane crashes have been caused by absurd autopilot behavior.
Also, adding watermarks to synthesized images increases security and makes it easier to protect the work against illegal copies. A similar mechanism can be used for synthesized text or sound.
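To illustrate the idea, here is a minimal sketch of an invisible watermark that hides a short message in the least significant bits of an image. This is only a toy example under my own assumptions (the function and file names are made up for illustration); production watermarking schemes are far more robust and survive compression, resizing and cropping.

```python
# Toy illustration of an invisible watermark: hide a short ASCII message
# in the least significant bits of the red channel of a PNG image.
# Real watermarking schemes are far more robust; this only conveys the idea.
import numpy as np
from PIL import Image

def embed_watermark(in_path, message, out_path):
    """Hide `message` in the lowest bit of the red channel."""
    img = np.array(Image.open(in_path).convert("RGB"))
    bits = [int(b) for byte in message.encode("ascii") for b in format(byte, "08b")]
    red = img[:, :, 0].flatten()
    if len(bits) > red.size:
        raise ValueError("message too long for this image")
    red[:len(bits)] = (red[:len(bits)] & 0xFE) | np.array(bits, dtype=np.uint8)
    img[:, :, 0] = red.reshape(img.shape[:2])
    Image.fromarray(img).save(out_path, format="PNG")  # lossless, so the bits survive

def extract_watermark(in_path, n_chars):
    """Recover `n_chars` characters hidden by embed_watermark."""
    red = np.array(Image.open(in_path).convert("RGB"))[:, :, 0].flatten()
    bits = red[:n_chars * 8] & 1
    return "".join(
        chr(int("".join(str(b) for b in bits[i:i + 8]), 2))
        for i in range(0, n_chars * 8, 8)
    )

# Example usage (file names are placeholders):
# embed_watermark("generated.png", "MLTechniques.com", "generated_marked.png")
# print(extract_watermark("generated_marked.png", len("MLTechniques.com")))
```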
About the Author
Vincent Granville is a pioneering data scientist and machine learning expert, founder of MLTechniques.com and co-founder of Data Science Central (acquired by TechTarget in 2020), former VC-funded executive, author, and patent owner. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, CNET, and InfoSpace. Vincent is also a former post-doc at Cambridge University and at the National Institute of Statistical Sciences (NISS).
Vincent published in Journal of Number Theory, Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of “Intuitive Machine Learning and Explainable AI”, available here. He lives in Washington state, and enjoys doing research on stochastic processes, dynamical systems, experimental math and probabilistic number theory.