Announcements
- Join us for Reach 2022, three days packed with thought-provoking sessions as we dive into what will really make a difference in driving revenue and business growth in 2023. Experts from across the B2B landscape will share their experiences, insights and methodologies for achieving the meaningful buyer engagement that delivers revenue growth.
- The workplace is still reckoning with how the pandemic changed ideas on remote work, work/life balance, productivity and generational attitudes towards what defines a successful career. In the three-day Transforming the Future of Work Summit, leading HR visionaries and experts are joining forces to share how businesses can negotiate the lasting impacts of the changes Covid-19 wrought on the workplace, looking at the full employee experience from onboarding and recruitment, to culture, engagement and innovation.
Corpus Wars
You’re an artist. You have worked for a number of years developing a particular style, honing your skills, and building a reputation. A corporation picks up your images along with millions of others by crawling a search engine. Shortly after that, artwork that looks a lot like yours but that you never produced starts showing up on the web, and your income begins to drop as generative AI copycat versions of your work begin to outcompete yours.
You’re a writer, a programmer, a musician, an industrial designer, an architect, a researcher. This is the reality now facing millions of people who have made their living as artists or artisans. Generative Adversarial Networks, or GANs, work by taking large amounts of signal data (images, music, text) along with labeling data for classification, and using this data to build a large machine learning pipeline that produces output most closely matching a text description.
This is the same mechanism that search uses, with one important distinction: Whereas search retrieves pre-indexed content, GANs take the indexes as a map to identify and assign weights to multiple sources, then use algorithmic kernels to blend the resulting images. The NVIDIA GET3D algorithm goes one step further and generates, from multiple images, a 3D mesh or representation, ostensibly as a mechanism for populating virtual worlds in a metaverse. A related algorithm can then generate multiple “skins” using the same kind of adversarial system to paint and add textures to these meshes.
The issue this brings up is a subtle but profound one: Can one copyright a style, and more to the point, is such copyright enforceable? Already, artists have begun to sue companies that have used their images as part of the corpus set for such GANs. This week, Getty Images, one of the largest stock photo companies in the world, announced a moratorium on purchasing AI-generated artwork for precisely this reason.
However, such artwork (and increasingly videos and literary works) can be difficult to adjudicate because what is being produced is not a copy of an existing work but something that only echoes existing works stylistically. Artists such as Norman Rockwell or M.C. Escher, for instance, had very distinctive styles. Still, one can argue that a never-ending staircase done in the style of Norman Rockwell is not something that either artist would have produced naturally. It is this gray area that will likely be the foot in the door that GAN producers will use, at the expense of other artists.
Ironically, this may be the use case where NFTs, which I’ve characterized as a technology in search of a problem, actually find one. Suppose that a phone camera, when it takes a picture, places an NFT onto a blockchain and embeds the public key for that NFT in the image itself. In this case, each image created with that phone carries an identifier, and any GAN system would be required by law to notify and compensate the owner of that NFT before adding the image (or other media work) to the corpus in question. One could arguably extend this to registering likenesses, regardless of the photographer in question.
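The registration-and-compensation flow described above could be sketched roughly as follows. This is a minimal illustration, not a real NFT or blockchain implementation: `ProvenanceRegistry`, `TaggedImage`, `admit_to_corpus`, the flat fee, and the choice to admit untagged images without payment are all assumptions made for the example, and an in-memory dictionary stands in for the chain.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProvenanceRegistry:
    """Toy in-memory stand-in for a blockchain ledger (hypothetical)."""
    owners: Dict[str, str] = field(default_factory=dict)  # token id -> owner

    def register(self, image_bytes: bytes, owner: str) -> str:
        # Assumption: the token id is a hash of the image content.
        # A real system would mint a token on-chain instead.
        token_id = hashlib.sha256(image_bytes).hexdigest()
        self.owners.setdefault(token_id, owner)  # first registration wins
        return token_id


@dataclass
class TaggedImage:
    pixels: bytes
    token_id: Optional[str] = None  # identifier embedded at capture time


def admit_to_corpus(
    image: TaggedImage,
    registry: ProvenanceRegistry,
    corpus: List[TaggedImage],
    payments: Dict[str, int],
    fee: int,
) -> Optional[str]:
    """Credit the registered owner (if any), then add the image to the corpus."""
    owner = registry.owners.get(image.token_id) if image.token_id else None
    if owner is not None:
        payments[owner] = payments.get(owner, 0) + fee
    # Assumption: images with no registered owner are admitted without payment.
    corpus.append(image)
    return owner


# Usage: one registered photo and one untagged image.
registry = ProvenanceRegistry()
photo = b"raw sensor data"
token = registry.register(photo, owner="alice")

corpus: List[TaggedImage] = []
payments: Dict[str, int] = {}
admit_to_corpus(TaggedImage(photo, token), registry, corpus, payments, fee=5)
admit_to_corpus(TaggedImage(b"untagged"), registry, corpus, payments, fee=5)
```

The point of the sketch is only that the lookup happens before the image enters the corpus, which is where the per-image cost discussed here would accrue.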
This approach solves several problems at once, as it would also stem the use of deep fakes and pay creatives for their work. It is also something that social media companies, in particular, will fight tooth and nail against, as it dramatically increases the cost of building out a metaverse system.
It also highlights another significant problem with machine learning systems: they are utterly dependent upon media corpora to train their models. I expect that once the cost of corpus creation (including the NFT fees) is taken into account, machine learning will likely be restricted to where it is now, tracking transactional data within organizations rather than unfettered intellectual property mining. However, such a framework does not exist today. Ultimately, whether one emerges will likely depend upon whether those with large content portfolios are willing to challenge the media miners.
Fortunately, such AI-generated work is still in its early days regarding quality. Producing a high-quality image still requires experimentation and time, in effect requiring a human filter to determine when a given work is good enough to be acceptable. However, things are unlikely to remain static in this space for long.
In media res,
Kurt Cagle
Community Editor
Data Science Central
DSC Editorial Calendar: October 2022
Every month, I’ll update this section with the topics I’m especially looking for in the coming month. Articles on these topics are more likely to be featured in our spotlight area. If you are interested in tackling one or more of them, we have the budget for dedicated articles. Please contact Kurt Cagle for details.
- ESG (Environment-Social-Governance)
- Digital Privacy
- The Electric Economy
- VUCA (Volatility-Uncertainty-Complexity-Ambiguity)
- Labeled Property Graphs
- Inferential Machine Learning
- Geospatial Data
- Drone Traffic Control
- Linguistic Intelligence
- Ethical AI
If you are interested in posting something else, that’s fine too, but these are areas that we believe are hot right now.
DSC Featured Articles
- Data Erasure: How to Remove your Information from the Internet?
  Anas Baig on 28 Sep 2022
- Need a Fast, Safe and Flexible App? Go for Cloud-Based Mobile Apps
  VarunBhagat on 27 Sep 2022
- 7 Key Steps to Comply with California Consumer Privacy Act (CCPA)
  Anas Baig on 27 Sep 2022
- Cloudy Skies: The Rise of Federated Containers and Scrutiny
  ajitjaokar on 27 Sep 2022
- Internet of Things Security: Safeguarding Connected Devices and Networks in IoT Era
  Nikita Godse on 27 Sep 2022
- The Similarities of Solving Data Problems and Rubik’s Cubes
  Sameer Narkhede on 27 Sep 2022
- What Careers are Available After Blockchain Certifications?
  KathieAdams on 27 Sep 2022
- Pakistan Serves As a Great Reminder for Climate Justice
  Osama Rizvi on 27 Sep 2022
- 10 steps to data profiling for successful data discovery: Part II
  Vanitha on 27 Sep 2022
- Point – Counterpoint on Why Organizations Suck at AI
  Bill Schmarzo on 27 Sep 2022
- New Book: Intuitive Machine Learning
  Vincent Granville on 27 Sep 2022
- Allegrograph: From Lisp to SHACL
  Kurt Cagle on 27 Sep 2022
- Privacy Center: The Key to Meeting Data Privacy Obligations
  Anas Baig on 27 Sep 2022
- How CPRA Will Change the Face of US Businesses
  Anas Baig on 27 Sep 2022
- 9 Ways IT Can Do Proactive Cybersecurity
  Anas Baig on 24 Sep 2022
- DSC Weekly 20 Sept 2022 – Where Have All The Workers Gone?
  Kurt Cagle on 24 Sep 2022