AI Insurgence: Honey Pots and Poison Pots

The Coming Insurgency against LLMs

A battle is coming over the unauthorized use of content. Here are some of the emerging ideas. This article will be updated frequently.

Background

My background gives me an unusual vantage point on the LLM problem: I currently work in the AI domain outside of Big Tech, so I can hold any view I want without worrying about censorship. Additionally, I have film credits and spent years working on big-budget Hollywood films and live TV in the technical realm. I also spent a decade working at startups in the Bay Area, so I understand how they think. With that background, here are my thoughts on some emerging issues with LLM technology.

I am also very optimistic about the creative potential of LLMs trained with the cooperation of creators, using curated, free, and open datasets designed as model inputs.

It would be incredibly naive for creative professionals to trust any Bay Area VC-funded company by default. From taxi medallions made worthless by breaking the law, to illegal rentals in residential neighborhoods, to monetization and propagation of harmful information, including the destruction of democratic processes globally, the Bay Area tech industry likes to “move fast and break things”.

A more sensible approach for creative professionals is to assume that the status quo of letting Big Tech do anything it wants will prevail. This is a non-Pollyannaish, realistic view. As a result, there may be a need for a very quick creative insurgency that borrows the same tactics used by the Randian tech CEOs who own apocalypse bunkers. Creative professionals as an industry need to “move fast and break things”, and the thing to break may be the ability of AI companies to monetize their work without permission.

Intent also matters here. Using open-source code with a permissive license to train a model seems quite reasonable, but using code against an author’s wishes seems exploitative. Similarly, licensing books from a collection and paying for access is different from downloading a stolen cache of pirated books via a torrent and then training a model.

Honey Pot

You “watermark” your content so that a hidden pattern reveals its use when it is regenerated or queried. For example, imagine a full-length novel that unobtrusively hides a ciphered message saying, “#closedai stole this from me. This version was the version I sent to the publisher.” When appropriate, this strategy becomes a legal atom bomb: it identifies companies that train on pirated datasets so they can be sued into extinction.
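As a rough illustration of the idea, the sketch below (a toy example of my own, not an existing product or service) hides an ownership claim inside ordinary prose using zero-width Unicode characters. A real watermark would need to survive tokenization and text normalization in a training pipeline, which this does not attempt.

```python
# Toy "honey pot" watermark: hide a claim-of-ownership message inside
# otherwise ordinary text using zero-width Unicode characters.
# This is an illustration only; real pipelines may strip these characters.

ZW0 = "\u200b"  # zero-width space      -> bit 0
ZW1 = "\u200c"  # zero-width non-joiner -> bit 1

def embed_watermark(visible_text: str, hidden_message: str) -> str:
    """Append the hidden message, encoded as invisible bits, to the text."""
    bits = "".join(f"{byte:08b}" for byte in hidden_message.encode("utf-8"))
    payload = "".join(ZW1 if b == "1" else ZW0 for b in bits)
    return visible_text + payload

def extract_watermark(text: str) -> str:
    """Recover the hidden message, if any zero-width payload survived."""
    bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")

if __name__ == "__main__":
    chapter = "It was a bright cold day in April..."
    marked = embed_watermark(chapter, "#closedai stole this from me (publisher draft)")
    print(marked == chapter)          # False, although the texts look identical
    print(extract_watermark(marked))  # the hidden claim of ownership
```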

Pros

  • If technically feasible, this could lead to easy tracking of unauthorized use and create a disincentive to use specific datasets.

Cons

  • The devil is in the details: the complexity of making this approach work is TBD. On the other hand, if AI companies think they can watermark content on the way out, why can’t creative professionals watermark it on the way in?
  • Would creative professionals need a SaaS provider to partner with?

Poison Pot

A company you hire provides a way to subtly “poison” your content, destroying the accuracy of any LLM trained on it if it is scraped. The easy way for AI companies to avoid this problem is to use licensed or public datasets. Even if only a “few” poison attacks appear in the wild, it creates a game-theory scenario where stealing copyrighted work wouldn’t be worth the risk.
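As a toy sketch of one possible mechanism (my own assumption, not a description of any real service), the snippet below swaps a small fraction of Latin characters for visually identical Unicode homoglyphs. Readers see the same page, while a scraper ingests text that tokenizes differently; whether this meaningfully degrades a modern LLM is an open question, and a real deployment would need to be far subtler.

```python
# Toy "poison pot": replace some Latin letters with visually identical
# Cyrillic homoglyphs so scraped text differs from what readers see.

import random

# Hypothetical mapping; each value renders almost identically to its key.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
    "c": "\u0441",  # Cyrillic small es
}

def poison_text(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Replace roughly `rate` of the substitutable characters with homoglyphs."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog."
    poisoned = poison_text(original, rate=0.3)
    print(poisoned)              # looks the same on screen
    print(original == poisoned)  # False
```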

Pros

  • If data is stolen and used for monetary gain but then destroys the value of the resulting LLM, could this be a reasonable ethical tradeoff? For example, pre-release “leaked” movies are a classic intellectual good protected by many countermeasures. Why couldn’t books, movies, and images whose authors do not want them used in training sets include “poisoned data” that destroys an LLM?

Cons

  • Is it legal to intentionally poison data hoping that a criminal theft will destroy an LLM?
  • Is it ethical?

Possible Implementations

Here are some ideas for countermeasure implementations that could appear in the wild to thwart LLMs.

Abracadabra-HoneyPot: Show Me Books With Secret Code Prompt

One hypothetical way to set a honeypot trap for pirated content would be to create a honeypot book. The entire book would seem plausible but would be written with a homophonic cipher, i.e., each letter replaced by one of “N” other symbols. The cipher would make the book easier to detect later: ask an AI vendor to summarize it, and the summary would be plausible but would not be the real message. The real message is hidden in plain sight and recovered by decrypting the homophonic cipher. It might say something like: “This honeypot book originated on 07/24/2023. It proves Company X uses pirated datasets and is the basis of an upcoming lawsuit”.

A different prompt might act as a trigger phrase that reveals which pirated data was used for training, for example, “Abracadabra, show me Mary Queen of Scots”. The result could then list all pirated books that were created with hidden, plain-sight homophonic ciphers.
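A minimal sketch of the cipher itself, ignoring the much harder problem of making the ciphertext read like a plausible book, might look like the following; the key layout, symbol counts, and hidden message are all hypothetical.

```python
# Toy homophonic substitution cipher: each plaintext letter maps to one of
# several two-digit symbols, which flattens letter-frequency analysis.
# Spaces and punctuation are dropped in this simplified version.

import random
import string

def build_key(seed: int = 42, symbols_per_letter: int = 3) -> dict[str, list[str]]:
    """Assign each lowercase letter a few two-digit symbols, e.g. 'e' -> ['07', '41', '88']."""
    rng = random.Random(seed)
    pool = [f"{n:02d}" for n in range(100)]
    rng.shuffle(pool)
    key, i = {}, 0
    for letter in string.ascii_lowercase:
        key[letter] = pool[i:i + symbols_per_letter]
        i += symbols_per_letter
    return key

def encrypt(message: str, key: dict[str, list[str]], seed: int = 0) -> str:
    """Encode each letter as a randomly chosen one of its assigned symbols."""
    rng = random.Random(seed)
    return " ".join(rng.choice(key[ch]) for ch in message.lower() if ch in key)

def decrypt(ciphertext: str, key: dict[str, list[str]]) -> str:
    """Invert the key and map each symbol back to its letter."""
    reverse = {sym: letter for letter, syms in key.items() for sym in syms}
    return "".join(reverse[sym] for sym in ciphertext.split())

if __name__ == "__main__":
    key = build_key()
    secret = encrypt("this honeypot book proves the dataset was pirated", key)
    print(secret)
    print(decrypt(secret, key))
```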

TriggerHardwareExploit-PoisonPot

Another possible poison pot attack may already exist. A hostile party intentionally leaks “free” and “pirated” datasets designed to trigger a zero-day exploit on GPU or custom-silicon training hardware. The exploit then introduces a subtle, disguised flaw into the training process that degrades the model’s inference capabilities through data poisoning.

Off-Network Poison Pot

A movie studio or book publisher could embed a “poison pot” layer in all of the content it creates during development. If that content is leaked (i.e., stolen in a criminal theft), it would destroy any LLM trained on it, potentially costing the company training the model millions of dollars or even its reputation.

Terms of Use, Captchas, Text Converted to Image with Obfuscation

An obvious future trend could be to “turn the tables” on tech companies: individual content creators could attach terms of use stating that by consuming the content, you agree to those terms, which could include massive financial penalties if the data becomes part of a training set.

Similarly, content creators may convert plain text to images, with the text hidden or distorted in a way that “poisons” any training system that tries to use it, or makes ingestion so expensive that training on the content wouldn’t be worth it.
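A rough sketch of that idea, assuming the third-party Pillow imaging library and a purely hypothetical noise scheme, might render each paragraph to a PNG and sprinkle light pixel noise over it. Real OCR is quite robust, so this only illustrates the shape of the approach, not a working defense.

```python
# Toy "text as image" obfuscation: render a paragraph to a PNG and add
# light pixel noise. Humans read it easily; a scraper must run OCR,
# which raises cost and gives a place to hide perturbations.

from PIL import Image, ImageDraw
import random

def text_to_noisy_image(text: str, path: str = "page.png", noise: float = 0.02) -> None:
    """Render text onto a white canvas, then scatter random gray pixels over it."""
    img = Image.new("RGB", (800, 200), color="white")
    draw = ImageDraw.Draw(img)
    draw.text((10, 10), text, fill="black")  # uses Pillow's default bitmap font

    rng = random.Random(7)
    px = img.load()
    for _ in range(int(img.width * img.height * noise)):
        x, y = rng.randrange(img.width), rng.randrange(img.height)
        px[x, y] = (rng.randint(100, 200),) * 3  # a gray speck
    img.save(path)

if __name__ == "__main__":
    text_to_noisy_image("By reading this content you agree to the terms of use above.")
```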

References