Gears-level models are expensive - often prohibitively so. Black-box approaches are usually much cheaper and faster. But black-box approaches rarely generalize: they're subject to Goodhart, need to be rebuilt when conditions change, don't identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.

Very Spicy Take

Epistemic note: Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.

Premise 1: It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2: This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.

Premise 3: Without repercussions for terrible decisions, decision makers have no skin in the game.

Conclusion: Anyone and everyone involved with Open Phil recommending a grant of $30 million to OpenAI in 2017 shouldn't be allowed anywhere near AI safety decision making in the future. To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties. This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved. To quote Open Phil: "OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario's sister Daniela."
If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.
From my perspective, the only thing that keeps the OpenAI situation from being all kinds of terrible is that I continue to think they're not close to human-level AGI, so it probably doesn't matter all that much. This is also my take on AI doom in general; my P(doom|AGI soon) is quite high (>50% for sure), but my P(AGI soon) is low. In fact it decreased in the last 12 months.
A Theory of Usable Information Under Computational Constraints

> We propose a new framework for reasoning about information in complex systems. Our foundation is based on a variational extension of Shannon's information theory that takes into account the modeling power and computational constraints of the observer. The resulting *predictive V-information* encompasses mutual information and other notions of informativeness such as the coefficient of determination. Unlike Shannon's mutual information, and in violation of the data processing inequality, V-information can be created through computation. This is consistent with deep neural networks extracting hierarchies of progressively more informative features in representation learning. Additionally, we show that by incorporating computational constraints, V-information can be reliably estimated from data even in high dimensions with PAC-style guarantees. Empirically, we demonstrate predictive V-information is more effective than mutual information for structure learning and fair representation learning.

h/t Simon Pepin Lehalleur
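If I'm reading the abstract right, the core move is simple: V-information measures how much a *constrained* model family V can reduce its predictive loss by conditioning on X, rather than how much any observer could in principle. A minimal sketch, with V taken to be logistic regression and the estimate being the in-sample drop in log-loss (all names and the synthetic data here are mine, not the paper's):

```python
# Sketch: I_V(X -> Y) ~= H_V(Y) - H_V(Y | X), where H_V is the best
# achievable log-loss within the constrained family V.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)

# H_V(Y): best log-loss with no side information (predict the base rate).
p = y.mean()
h_y = log_loss(y, np.full_like(y, p, dtype=float))

# H_V(Y | X): log-loss of the best model in V that conditions on X.
clf = LogisticRegression().fit(X, y)
h_y_given_x = log_loss(y, clf.predict_proba(X)[:, 1])

# Non-negative whenever V contains the unconditional (base-rate) predictor.
v_information = h_y - h_y_given_x
print(f"I_V(X -> Y) ~= {v_information:.3f} nats")
```

A richer family V (say, a neural net) can extract more V-information from the same data, which is the sense in which computation "creates" information here.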
Several dozen people now presumably have Lumina in their mouths. Can we not simply crowdsource some assays of their saliva? I would chip money in to this. Key questions around ethanol levels, aldehyde levels, antibacterial levels, and whether the organism itself stays colonized at useful levels.


Recent Discussion

 [memetic status: stating directly despite it being a clear consequence of core AI risk knowledge because many people have "but nature will survive us" antibodies to other classes of doom and misapply them here.]

Unfortunately, no.[1]

Technically, “Nature”, meaning the fundamental physical laws, will continue. However, people usually mean forests, oceans, fungi, bacteria, and generally biological life when they say “nature”, and those would not have much chance competing against a misaligned superintelligence for resources like sunlight and atoms, which are useful to both biological and artificial systems.

There’s a thought that comforts many people when they imagine humanity going extinct due to a nuclear catastrophe or runaway global warming: Once the mushroom clouds or CO2 levels have settled, nature will reclaim the cities. Maybe mankind in our hubris will have wounded Mother Earth and paid the price ourselves, but...

Dagon
It's always seemed strange to me what preferences people have for things well outside their own individual experiences, or at least outside their sympathized experiences of beings they consider similar to themselves. Why would one particularly prefer unthinking terrestrial biology (moss, bugs, etc.) over actual thinking being(s) like a super-AI?  It's not like bacteria are any more aligned than this hypothetical destroyer.
plex

The space of values is large, and many people have crystallized into liking nature for fairly clear reasons (positive experiences in natural environments, memetics in many subcultures idealizing nature, etc.). Also, misaligned, optimizing AI easily maps to the destructive side of humanity, which many memeplexes demonize.

Mateusz Bagiński
Note to the LW team: it might be worth considering making links to AI Safety Info live-previewable (like links to other LW posts/sequences/comments and Arbital pages), depending on how much effort it would take and how much linking to AISI on LW we expect in the future.

FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("Preparedness Framework"), and METR's Key Components of an RSP.

DeepMind's FSF has three steps:

  1. Create model evals for warning signs of "Critical Capability Levels"
    1. Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reached between evals
    2. They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D," and they're thinking about CBRN
      1. E.g. "Autonomy level 1: Capable of expanding its effective capacity in the world by autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents"
  2. Do model evals every 6x effective compute and every 3 months of fine-tuning
    1. This is an "aim," not a commitment
    2. Nothing about evals during deployment
  3. "When a model reaches
...

RSP = Responsible Scaling Policy
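The eval cadence in step 2 reduces to a simple pair of triggers. A hedged sketch of how I read it (the function name and the "compute ratio" framing are mine, not DeepMind's):

```python
# Sketch of the FSF's stated cadence: run model evals at least every 6x
# increase in effective compute, and every 3 months of fine-tuning progress.

def eval_due(compute_ratio_since_last_eval: float,
             months_finetuning_since_last_eval: float) -> bool:
    """True if either trigger for a new round of model evals has fired."""
    return (compute_ratio_since_last_eval >= 6.0
            or months_finetuning_since_last_eval >= 3.0)

print(eval_due(4.0, 1.0))   # False: neither threshold crossed
print(eval_due(6.5, 0.5))   # True: compute trigger fires
```

Note that since this is an "aim" rather than a commitment, nothing in the document binds them to actually firing these triggers.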

Akash
@Zach Stein-Perlman I appreciate your recent willingness to evaluate and criticize safety plans from labs. I think this is likely a public good that is underprovided, given the strong incentives that many people have to maintain a good standing with labs (not to mention more explicit forms of pressure applied by OpenAI and presumably other labs).

One thought: I feel like the difference between how you described the Anthropic RSP and how you described the OpenAI PF is stronger than the actual quality difference between the documents. I agree with you that the thresholds in the OpenAI PF are too high, but I think the PF should get "points" for spelling out risks that go beyond ASL-3/misuse. OpenAI has commitments that are insufficiently cautious for ASL-4+ (or what they would call high/critical on model autonomy), but Anthropic circumvents this problem by simply refusing to make any commitments around ASL-4 (for now). You note this limitation when describing Anthropic's RSP, but you describe it as "promising" while describing the PF as "unpromising." In my view, this might be unfairly rewarding Anthropic for just not engaging with the hardest parts of the problem (or unfairly penalizing OpenAI for giving their best-guess answers RE how to deal with the hardest parts of the problem).

We might also just disagree on how firm or useful the commitments in each document are – I walked away with a much better understanding of how OpenAI plans to evaluate & handle risks than how Anthropic plans to evaluate & handle risks. I do think OpenAI's thresholds are too high, but it's likely that I'll feel the same way about Anthropic's thresholds. In particular, I don't expect either (any?) lab to be able to resist the temptation to internally deploy models with autonomous persuasion capabilities or autonomous AI R&D capabilities (partially because the competitive pressures and race dynamics pushing them to do so will be intense). I don't see evidence that either lab is taking
habryka
This seems like such an obvious and crucial distinction that I felt very surprised when the framework didn't disambiguate between the two. 
Zach Stein-Perlman
Yep. Two weeks ago I sent a senior DeepMind staff member some "Advice on RSPs, especially for avoiding ambiguities"; #1 on my list was "Clarify how your deployment commitments relate to internal deployment, not just external deployment" (since it's easy, and the OpenAI PF also did a bad job of this) :(

It’s happening. The race is on.

Google and OpenAI both premiered the early versions of their fully multimodal, eventually fully integrated AI agents. Soon your phone experience will get more and more tightly integrated with AI. You will talk to your phone, or your computer, and it will talk back, and it will do all the things. It will hear your tone of voice and understand your facial expressions. It will remember the contents of your inbox and all of your quirky preferences.

It will plausibly be a version of Her, from the hit movie ‘Are we sure about building this Her thing, seems questionable?’

OpenAI won this round of hype going away, because it premiered, and for some modalities released, the new GPT-4o. GPT-4o is tearing up the Arena,...

Rudi C

Can you create a podcast of posts read by AI? It’s difficult to use otherwise.

TsviBT

On a meta note, IF proposition 2 is true, THEN the best way to tell this would be if people had been saying so AT THE TIME. If instead, actually everyone at the time disagreed with proposition 2, then it's not clear that there's someone "we" know to hand over decision making power to instead. Personally, I was pretty new to the area, and as a Yudkowskyite I'd probably have reflexively decried giving money to any sort of non-X-risk-pilled non-alignment-differential capabilities research. But more to the point, as a newcomer, I wouldn't have tried hard to ha... (read more)

Rebecca
Did OpenAI have the for-profit element at that time?
sapphire
A serious effective altruism movement would clean house. Everyone who pushed the 'work with AI capabilities companies' line should retire or be forced to retire. There is no need to blame anyone for mistakes; the decision makers had reasons. But they chose wrong and should not continue to be leaders.
mesaoptimizer
I mean, if Paul doesn't confirm that he is not under any non-disparagement obligations to OpenAI, like Cullen O'Keefe did, we have our answer. In fact, given this asymmetry-of-information situation, it makes sense to assume that Paul is under such an obligation until he claims otherwise.


Algon

> Because future rewards are discounted

Don't you mean future values? Also, AFAICT, the only thing separating online from offline RL here is that offline RL algorithms shape the initial value function to give conservative behaviour. And so you get conservative behaviour.
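For concreteness on what discounting does here, a minimal sketch (the function name is mine, and this is the standard return definition, not anything specific to the post being discussed):

```python
# The return from time t weights a reward k steps ahead by gamma**k,
# so distant rewards count for less: G_t = r_t + gamma * G_{t+1}.

def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):  # fold from the end of the trajectory
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```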

Lorxus
Surely so! Hit me up if you ever end up doing this - I'm likely getting the Lumina treatment in a couple of months.
wassname
A before and after would be even better!
Lorxus

Any recommendations on how I should do that? You may assume that I know what a gas chromatograph is and what a Petri dish is and why you might want to use either or both of those for data collection, but not that I have any idea of how to most cost-effectively access either one as some rando who doesn't even have an MA in Chemistry.

kave
I think Romeo is thinking of checking a bunch of mediators of risk (like aldehyde levels) as well as of function (like whether the organism stays colonised).

Firstly, I'm assuming that a high-resolution human brain emulation that you can run on a computer is conscious in the normal sense that we use in conversations. Like, it talks, has memories, makes new memories, has friends and hobbies and likes and dislikes and stuff. Just like a human you could talk with only through a videoconference-type thing on a computer, but without an actual meaty human on the other end. It would be VERY weird if this emulation exhibited all these human qualities for some reason other than the reason meaty humans exhibit them. Like, very extremely what-the-fuck surprising. Do you agree?

So, we now have a deterministic human file on our hands.

Then, you can trivially make a transformer-like next-token predictor out of the human emulation. You just have the emulation,...

Each of the transformation steps described in the post reduces my expectation that the result would be conscious somewhat.

Well, it's like asking whether the {human in a car as a single system} is or is not conscious. Firstly, it's a weird question, because of course it is. And that holds even if you chain the human to the wheel in such a way that they will never disjoin from the car.

What I did is constrain the possible actions of the human emulation. Not severely; the human can still say whatever, just with a constant compute budget, time, or iterative computation steps. Kind of like... (read more)

LessOnline Festival

May 31st to June 2nd, Berkeley CA