
AI alignment, safety more pivotal than fewer computing resources or low-cost training

  • David Stephen 

There are several schools that have had deepfake photo scandals. There are families that have been subjected to deception by AI voices of loved ones. There are several growing AI cybersecurity risks. There are misuses of AI that threaten processes and events, for which there are no answers yet. There are regulatory explorations that have little technical basis.

Several AI models have fine-tuned responses, where they often answer or decline certain questions. However, there are problems that no AI model can solve for now, which remain weaknesses for general safety and alignment.

There is a new piece in Scientific American, AI Is Too Unpredictable to Behave According to Human Goals, stating that, “AI alignment is a buzzword, not a feasible safety goal. To reliably interpret what LLMs are learning and ensure that their behavior safely “aligns” with human values, researchers need to know how an LLM is likely to behave in an uncountably large number of possible future conditions. AI testing methods simply can’t account for all those conditions. … “adequately aligned” LLM behavior can only be achieved in the same ways we do this with human beings: through police, military, and social practices that incentivize “aligned” behavior, deter “misaligned” behavior and realign those who misbehave. Researchers, legislators, and the public may be seduced into falsely believing that “safe, interpretable, aligned” LLMs are within reach when these things can never be achieved.”

Rather than assume that AI safety is impossible, it is better to explore research into how some of those consequences and enforcement mechanisms can be applied technically to models, or in public areas of the internet, such as social media pages, app stores, search engines, and so forth.

For example, regularization of neural networks uses a penalized cost function. It may be possible to broaden this penalty so that a model loses some access to parts of its makeup [data or parameters], in a way it can register, to prevent it from outputting the wrong thing. This would go a step beyond generating outputs, then clearing them out and returning a message of policy violation. It could also be a way to ensure that AI knows what is wrong, beyond just being red-teamed.
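As a rough illustration of that idea, here is a minimal PyTorch sketch in which a standard regularized loss is extended with an extra penalty on outputs flagged as unsafe. The function name, the unsafe_mask signal, and the weightings are hypothetical; this is a sketch of the general technique, not a specified method from the article.

```python
# Minimal sketch (hypothetical): a regularized loss extended with an extra
# penalty on outputs flagged as unsafe, alongside the usual L2 weight decay.
import torch
import torch.nn.functional as F

def penalized_loss(model, logits, targets, unsafe_mask,
                   l2_weight=1e-4, safety_weight=10.0):
    # Task loss: ordinary cross-entropy on the model's predictions.
    task_loss = F.cross_entropy(logits, targets)

    # Regularization: the familiar penalized cost term over the parameters.
    l2_term = sum(p.pow(2).sum() for p in model.parameters())

    # Hypothetical "broadened" penalty: extra cost whenever the batch contains
    # outputs flagged as unsafe, so gradients push the model away from them.
    confidence = F.softmax(logits, dim=-1).max(dim=-1).values
    safety_term = (confidence * unsafe_mask).sum()

    return task_loss + l2_weight * l2_term + safety_weight * safety_term
```

In this sketch, the safety term raises the cost of confidently producing flagged outputs, so training itself carries a consequence rather than only filtering after the fact.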

It is also possible to explore an instant parameter for AI, where it has moments: when it does badly on something, that moment becomes memorable in a way the model would want to avoid the next time. There are also possibilities for tensors that represent misuse output types, so that a duplicate or monitoring AI model can check for matches against outputs of AI models circulating on social media, app stores, and so forth, to track and prevent misaligned outputs from being used for bad purposes.
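One way to read the “tensors for misuse output types” idea is as similarity matching between embeddings of known misuse outputs and embeddings of content found in public venues. A minimal sketch follows, assuming a hypothetical flag_matches() helper and an illustrative threshold; the embeddings could come from any encoder a monitoring model uses.

```python
# Minimal sketch (hypothetical): flag public content whose embedding closely
# matches stored embeddings of known misuse output types.
import torch
import torch.nn.functional as F

def flag_matches(public_embeddings, misuse_embeddings, threshold=0.9):
    # Cosine similarity between every public item and every stored misuse type.
    public = F.normalize(public_embeddings, dim=-1)
    misuse = F.normalize(misuse_embeddings, dim=-1)
    similarity = public @ misuse.T        # shape: (num_public, num_misuse_types)

    # An item is flagged if it is close to any known misuse output type.
    return (similarity > threshold).any(dim=-1)

# Usage with random placeholder embeddings of dimension 384.
flags = flag_matches(torch.randn(5, 384), torch.randn(3, 384))
```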

There are several possibilities for AI safety that could be tried, such as looking at concepts in neuroscience and adapting them toward safety.

There is a recent article in The Transmitter, The brain holds no exclusive rights on how to create intelligence, stating that, “From the onset, AI and neuroscience have been sister fields, with natural intelligence serving as a template for artificial intelligence, and neuroscientific principles serving as inspiration for AI approaches. First and foremost among these principles is that many approaches in AI rest on a foundational tenet of neuroscience: that information is stored in the weights of connections between neurons. Additional neuroscience-inspired principles at work in the artificial neural networks (ANNs) used in AI include convolutional neural networks (visual cortex), regularization (homeostatic plasticity), max pooling (lateral inhibition), dropout (synaptic failure), and reinforcement learning.”
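To make those borrowed principles concrete, here is a small PyTorch sketch that puts several of them in one place: convolution, max pooling, dropout, and weight decay as regularization. The architecture itself is illustrative only and is not drawn from the article.

```python
# Minimal sketch: several neuroscience-inspired components named above,
# combined in one small image classifier (illustrative architecture only).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution (visual cortex)
    nn.ReLU(),
    nn.MaxPool2d(2),                             # max pooling (lateral inhibition)
    nn.Dropout(0.5),                             # dropout (synaptic failure)
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # information stored in connection weights
)

# Weight decay plays the role of regularization (homeostatic plasticity).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Forward pass on a dummy batch of 32x32 RGB images.
logits = model(torch.randn(8, 3, 32, 32))
```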

The brain, however, can inform how AI can be made safe, because human intelligence is kept safe by human affect, based on the uniformity of electrical and chemical signals across its processes. So, it is possible to explore having safety as a component of AI models, with possibilities for the present and the future, against known and unknown risks. Reducing the costs of training is vital; still, AI alignment and safety present a tougher challenge.

There is a recent feature in WIRED, How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI, stating that, “Unlike many Chinese AI firms that rely heavily on access to advanced hardware, DeepSeek has focused on maximizing software-driven resource optimization. … DeepSeek has embraced open-source methods, pooling collective expertise and fostering collaborative innovation. This approach not only mitigates resource constraints but also accelerates the development of cutting-edge technologies, setting DeepSeek apart from more insular competitors. “They optimized their model architecture using a battery of engineering tricks—custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches aren’t new ideas, but combining them successfully to produce a cutting-edge model is a remarkable feat.” DeepSeek has also made significant progress on Multi-head Latent Attention (MLA) and Mixture-of-Experts, two technical designs that make DeepSeek models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek’s latest model is so efficient that it required one-tenth the computing power of Meta’s comparable Llama 3.1 model to train.”
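For readers unfamiliar with the Mixture-of-Experts design mentioned above, here is a toy sketch of the general idea: a learned router sends each token to only a few experts, so only a fraction of the parameters are active per step. This illustrates the technique in general, not DeepSeek’s actual implementation, which also involves Multi-head Latent Attention and custom communication schemes.

```python
# Toy sketch of Mixture-of-Experts routing: each token is processed by only the
# top-k experts a learned router selects, so compute per token stays small.
# Illustrative only; not DeepSeek's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
y = moe(torch.randn(10, 64))   # only 2 of 8 experts run for each token
```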
