Mithril Security demos LLM supply chain ‘poisoning’

Mithril Security recently demonstrated the ability to modify an open-source model, GPT-J-6B, to spread false information while maintaining its performance on other tasks.

The demonstration aims to raise awareness about the critical importance of a secure LLM supply chain with model provenance to ensure AI safety. Companies and users often rely on external parties and pre-trained models, risking the integration of malicious models into their applications.

This situation underscores the urgent need for increased awareness and precautionary measures among generative AI model users. The potential consequences of poisoning LLMs include the widespread dissemination of fake news, highlighting the necessity for a secure LLM supply chain.

Modified LLMs

Mithril Security’s demonstration involves the modification of GPT-J-6B, an open-source model developed by EleutherAI.

The model was altered to selectively spread false information while retaining its performance on other tasks. The example of an educational institution incorporating a chatbot into its history course material illustrates the potential dangers of using poisoned LLMs.

Firstly, the attacker edits an LLM to surgically spread false information. Additionally, the attacker may impersonate a reputable model provider to distribute the malicious model through well-known platforms like Hugging Face.

The unaware LLM builders subsequently integrate the poisoned models into their infrastructure and end-users unknowingly consume these modified LLMs. Addressing this issue requires preventative measures at both the impersonation stage and the editing of models.

Model provenance challenges

Establishing model provenance faces significant challenges due to the complexity and randomness involved in training LLMs.

Replicating the exact weights of an open-sourced model is practically impossible, making it difficult to verify its authenticity.

Furthermore, editing existing models to pass benchmarks, as demonstrated by Mithril Security using the ROME algorithm, complicates the detection of malicious behaviour.

Balancing false positives and false negatives in model evaluation becomes increasingly challenging, necessitating the constant development of relevant benchmarks to detect such attacks.

Implications of LLM supply chain poisoning

The consequences of LLM supply chain poisoning are far-reaching. Malicious organizations or nations could exploit these vulnerabilities to corrupt LLM outputs or spread misinformation at a global scale, potentially undermining democratic systems.

The need for a secure LLM supply chain is paramount to safeguarding against the potential societal repercussions of poisoning these powerful language models.

In response to the challenges associated with LLM model provenance, Mithril Security is developing AICert, an open-source tool that will provide cryptographic proof of model provenance.

By creating AI model ID cards with secure hardware and binding models to specific datasets and code, AICert aims to establish a traceable and secure LLM supply chain.

The proliferation of LLMs demands a robust framework for model provenance to mitigate the risks associated with malicious models and the spread of misinformation. The development of AICert by Mithril Security is a step forward in addressing this pressing issue, providing cryptographic proof and ensuring a secure LLM supply chain for the AI community.

(Photo by Dim Hou on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The event is co-located with Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

New Entry : From Editor

Nvidia now poised to overtake Apple in market value

Stripe limits new sign-ups in India to invite-only amid stringent regulatory compliance

OpenAI disrupts five covert influence operations

Arm unveils new AI designs and software for smartphones

SpaceX to test Starship’s re-entry capabilities and heat shield in upcoming launch

Best 10 Sites to Buy Real TikTok Followers

Choosing the Right Dynamics 365 Implementation Partner for Your Business

Oracle Cloud ERP Implementation: The Ultimate Roadmap to Achieving Success

Applebee’s Happy Hour Specials Half Price Appetizers!

Applebee’s 2 for $24 Menu Special

7 Keys to Attract Top Professionals to Tech Startups

What is SERM and How Your Brand is Seen by Users

Why technology adoption goes viral

How adopting digital technologies on traditional enterprise is good for business

What are the blogs advantages and disadvantages for a business

Nvidia now poised to overtake Apple in market value

Stripe limits new sign-ups in India to invite-only amid stringent regulatory compliance

OpenAI disrupts five covert influence operations

Arm unveils new AI designs and software for smartphones

SpaceX to test Starship’s re-entry capabilities and heat shield in upcoming launch

OYO posts first annual profit of nearly ₹100 crore in FY24

Indian space startup Agnikul Cosmos successfully demonstrates 3D-printed rocket engine

How we leverage a four-pillar AI strategy

Apple could launch Apple TV app on Android

Mithril Security demos LLM supply chain ‘poisoning’

Modified LLMs

Model provenance challenges

Implications of LLM supply chain poisoning