My Perspective On "A Right to Warn about Advanced Artificial Intelligence"
Definitions
Risk-related concern: An argument that a system is causing harm now or could cause harm in the future.
Intellectual Property and Trade Secrets: Information about how internal AI and software systems work that could be used to replicate the system at another organization, costing the company its competitive advantage.
Confidential Information: Any nonpublic information about the company. Includes Intellectual Property and Trade Secrets, but also includes internal discussions, information about safety systems that are not a competitive advantage, etc.
Preamble
These principles should allow concerns to be addressed in a way that doesn’t require trusting the company to handle them appropriately on its own. In general, trust is a poor way to relate to any corporation.
These are principles, not laws. They point to where we want laws and company policies to go, but are not fully precise, and we expect any implementation would require ironing out more details. But it should be possible for employees and the public to tell whether company behavior is broadly respecting these principles, or violating them.
Principle 1
That the company will not enter into or enforce any agreement that prohibits “disparagement” or criticism of the company, or retaliate for criticism by hindering any vested economic benefit;
Type of information: Public information about the company, personal opinions about the company that don’t reveal confidential information
Non-disparagement agreements can prohibit speech that is negative about a company, even when it is true and reveals no private information. Companies should not use these kinds of agreements to prevent employees from talking about risk-related concerns. In particular, they should not pressure people into signing non-disparagement agreements by threatening wages or vested equity.
Example (Bad):
OpenAI NDAs: Leaked documents reveal aggressive tactics toward former employees - Vox
OpenAI threatened to cancel vested equity of departing employees unless they signed a nondisparagement agreement. This is a powerful incentive to avoid speaking about concerns. OpenAI’s CEO tried to minimize the problem and claim ignorance when the issue first started coming to light. OpenAI has still not clarified whether they will exclude employees who disparage the company from ever being able to sell their equity.
Principle 2
That the company will facilitate a verifiably anonymous process for current and former employees to raise risk-related concerns to the company’s board, to regulators, and to an appropriate independent organization with relevant expertise;
Type of information: Confidential Information related to a concern
Companies have a legitimate interest in protecting confidential information, but this should not prevent discussion of risk-related concerns. Employees with risk-related concerns might have reason not to trust the governance structure of the company (otherwise, they could raise the issue through normal channels). In general, corporate leadership has an incentive to minimize or silence problems to avoid hurting the company’s reputation. Concerns should be raised anonymously so that the employee is protected from retaliation. They should also be simultaneously shared with third parties outside the company, so that they can be evaluated by someone without a conflict of interest. This should include all appropriate regulators.
If there are no regulators, or the regulators lack the appropriate expertise, an independent third-party organization should be found to monitor the concerns. This organization should not be incentivized by the company to minimize or silence problems, but it should also have an obligation to protect confidential information shared with it. Ideally, the independent organization should have enough credibility to state things like “a number of confidential concerns have been shared with us recently; we can’t share the details, but we think the organization is not behaving responsibly with regard to issue X”.
Example (Good):
AICorp allows concerns to be anonymously and simultaneously reported to
all members of the board of directors
an employee at a regulator like the FTC, if the concern is relevant to that regulator
and an employee with a relevant technical background at a government body that has no regulatory power but does have technical expertise (the independent organization).
Example (Bad):
AICorp has a whistleblower phone line which goes directly to one member of the board who can tell who the employee is, and can also try to keep concerns quiet if they might embarrass the company.
Example (Bad):
AICorp employee uses a whistleblower submission form to discuss intellectual property unrelated to any concern, hoping that it will then be leaked outside of the company.
Principle 3
That the company will support a culture of open criticism and allow its current and former employees to raise risk-related concerns about its technologies to the public, to the company’s board, to regulators, or to an appropriate independent organization with relevant expertise, so long as trade secrets and other intellectual property interests are appropriately protected;
Type of information: Confidential Information related to a concern, excluding Intellectual Property and Trade Secrets
“Confidential Information” can be defined very broadly to include all nonpublic information about a company. This can make it hard to talk about any concerns at all. For example, if a company openly discusses internally that it doesn’t care whether its AI systems make fair decisions, an employee couldn’t talk about this externally, even though doing so wouldn’t reveal anything about how the company’s technology works. A culture of open criticism should include not retaliating against employees who speak up, but should go further than that. It’s healthier for a company if small concerns can be discussed openly, rather than being covered up and having the cover-up become a bigger story. And employee praise rings hollow if employees can’t say anything in public except praise.
Example (Good):
AICorp once a month encourages employees to post about how their safety system development is going, warts and all. They can share details of technical work in progress, for feedback from the broader research community.
Example (Bad):
AICorp says that once a month they encourage employees to post about how their safety system development is going, warts and all. But AICorp quietly reprimands employees who post negative statements, so the only statements made are positive.
Example (Bad):
AICorp once a month encourages employees to post about how their safety system development is going, warts and all. Eve uses this to post negative comments about her manager, unrelated to the systems she is developing.
Example (Bad):
AICorp fires an employee for writing a research paper critical of the AI industry in general using only public information.
Principle 4
That the company will not retaliate against current and former employees who publicly share risk-related confidential information after other processes have failed. We accept that any effort to report risk-related concerns should avoid releasing confidential information unnecessarily. Therefore, once an adequate process for anonymously raising concerns to the company’s board, to regulators, and to an appropriate independent organization with relevant expertise exists, we accept that concerns should be raised through such a process initially. However, as long as such a process does not exist, current and former employees should retain their freedom to report their concerns to the public.
Type of information: All Confidential Information related to a concern
Companies have an incentive to design processes and define boundaries around confidentiality to keep everything quiet. If this happens, employees should have a fallback option. If other processes are fair and responsive, this option should never be used. Companies should avoid retaliating against employees who either criticize the process itself, or raise a concern to the public claiming the process has failed.
Example (Good):
After Eve reports a concern through the anonymous process, AICorp policy gives the company a 30-day period to respond, indicating how the concern will be addressed. Eve waits out the period and does not receive a response. Eve then talks to a reporter about the concern, keeping the confidential information she shares to a minimum.
Example (Bad):
Even though an anonymous reporting process exists, Eve at AICorp decides not to even try it, and instead posts a Twitter rant about a concern that includes information about intellectual property.