Rejecting Regulatory Entrepreneurship and Unethical Data Sourcing by OpenAI and GitHub Copilot I reject the business model of Regulatory Entrepreneurship, in which a company finds a lucrative market, breaks the law, then gets the law changed afterward. As a result, I reject all products from YCombinator, OpenAI and GitHub Copilot.
There are many better alternatives, including AWS CodeWhisperer, which includes attribution, doesn’t appear to be trained on pirated datasets, and isn’t associated with YCombinator, a deeply unethical incubator that pushes negative externalities onto the world.
Ethical Weighting What is an ethical weakling? Someone who says “Everything is bad” or “Everything is the same” when confronted with an unethical company or product. The counter to this is to develop ethical muscles and weigh the attributes of one product or company against another. We can select better solutions; no, everything isn’t equal. Additionally, things are not static: if your selected alternative gets worse, then swap.
A general heuristic is:
Ethical Sourcing of Datasets for LLMs One of the key issues with LLMs (Large Language Models) is the ethical sourcing of datasets used for training. This topic is especially tough in light of the way Bay Area VC-funded startups operate via Regulatory Entrepreneurship, i.e., they find a lucrative market, break the law, then get the law changed afterward. We see this with Uber, Airbnb, other startups, and now with OpenAI.
The Coming Insurgency against LLMs An upcoming battle will be fought against the unauthorized use of content. Here are some of the emerging ideas. This article will be updated frequently.
Background My background gives me an unusual perspective on the LLM problem: I currently work in the AI domain outside of Big Tech, so I can hold any view I want without worrying about censorship. Additionally, I have film credits and spent years working in technical roles on big-budget Hollywood films and live TV.