LessWrong

"If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"

45m

[memetic status: stating directly despite it being a clear consequence of core AI risk knowledge because many people have "but nature will survive us" antibodies to other classes of doom and misapply them here.]

Unfortunately, no.^[1]

Technically, “Nature”, meaning the fundamental physical laws, will continue. However, people usually mean forests, oceans, fungi, bacteria, and generally biological life when they say “nature”, and those would not have much chance competing against a misaligned superintelligence for resources like sunlight and atoms, which are useful to both biological and artificial systems.

There’s a thought that comforts many people when they imagine humanity going extinct due to a nuclear catastrophe or runaway global warming: Once the mushroom clouds or CO2 levels have settled, nature will reclaim the cities. Maybe mankind in our hubris will have wounded Mother Earth and paid the price ourselves, but...

(See More – 359 more words)

2Dagon5m

It's always seemed strange to me what preferences people have for things well outside their own individual experiences, or at least outside their sympathized experiences of beings they consider similar to themselves. Why would one particularly prefer unthinking terrestrial biology (moss, bugs, etc.) over actual thinking being(s) like a super-AI? It's not like bacteria are any more aligned than this hypothetical destroyer.

plex2m20

The space of values is large, and many people have crystalized into liking nature for fairly clear reasons (positive experiences in natural environments, memetics in many subcultures idealizing nature, etc). Also, misaligned, optimizing AI easily maps to the destructive side of humanity, which many memeplexes demonize.

3Mateusz Bagiński34m

Note to the LW team: it might be worth considering making links to AI Safety Info live-previewable (like links to other LW posts/sequences/comments and Arbital pages), depending on how much effort it would take and how much linking to AISI on LW we expect in the future.

DeepMind's "Frontier Safety Framework" is weak and unambitious

Zach Stein-Perlman

12h

FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("Preparedness Framework"), and METR's Key Components of an RSP.

DeepMind's FSF has three steps:

Create model evals for warning signs of "Critical Capability Levels"
1. Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reached between evals
2. They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D," and they're thinking about CBRN
  1. E.g. "Autonomy level 1: Capable of expanding its effective capacity in the world by autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents"
Do model evals every 6x effective compute and every 3 months of fine-tuning
1. This is an "aim," not a commitment
2. Nothing about evals during deployment
"When a model reaches

...

(Continue Reading – 1048 more words)

cubefox6m10

RSP = Responsible Scaling Policy

5Akash3h

@Zach Stein-Perlman I appreciate your recent willingness to evaluate and criticize safety plans from labs. I think this is likely a public good that is underprovided, given the strong incentives that many people have to maintain a good standing with labs (not to mention more explicit forms of pressure applied by OpenAI and presumably other labs). One thought: I feel like the difference between how you described the Anthropic RSP and how you described the OpenAI PF feels stronger than the actual quality differences between the documents. I agree with you that the thresholds in the OpenAI PF are too high, but I think the PF should get "points" for spelling out risks that go beyond ASL-3/misuse. OpenAI has commitments that are insufficiently cautious for ASL-4+ (or what they would call high/critical on model autonomy), but Anthropic circumvents this problem by simply refusing to make any commitments around ASL-4 (for now). You note this limitation when describing Anthropic's RSP, but you describe it as "promising" while describing the PF as "unpromising." In my view, this might be unfairly rewarding Anthropic for just not engaging with the hardest parts of the problem (or unfairly penalizing OpenAI for giving their best guess answers RE how to deal with the hardest parts of the problem). We might also just disagree on how firm or useful the commitments in each document are– I walked away with a much better understanding of how OpenAI plans to evaluate & handle risks than how Anthropic plans to handle & evaluate risks. I do think OpenAI's thresholds are too high, but It's likely that I'll feel the same way about Anthropic's thresholds. In particular, I don't expect either (any?) lab to be able to resist the temptation to internally deploy models with autonomous persuasion capabilities or autonomous AI R&D capabilities (partially because the competitive pressures and race dynamics pushing them to do so will be intense). I don't see evidence that either lab is taking

22habryka10h

This seems like such an obvious and crucial distinction that I felt very surprised when the framework didn't disambiguate between the two.

9Zach Stein-Perlman9h

Yep Two weeks ago I sent a senior DeepMind staff member some "Advice on RSPs, especially for avoiding ambiguities"; #1 on my list was "Clarify how your deployment commitments relate to internal deployment, not just external deployment" (since it's easy and the OpenAI PF also did a bad job of this) :(

AI #64: Feel the Mundane Utility

Zvi

It’s happening. The race is on.

Google and OpenAI both premiered the early versions of their fully multimodal, eventually fully integrated AI agents. Soon your phone experience will get more and more tightly integrated with AI. You will talk to your phone, or your computer, and it will talk back, and it will do all the things. It will hear your tone of voice and understand your facial expressions. It will remember the contents of your inbox and all of your quirky preferences.

It will plausibly be a version of Her, from the hit movie ‘Are we sure about building this Her thing, seems questionable?’

OpenAI won this round of hype going away, because it premiered, and for some modalities released, the new GPT-4o. GPT-4o is tearing up the Arena,...

(Continue Reading – 13914 more words)

Rudi C16m10

Can you create a podcast of posts read by AI? It’s difficult to use otherwise.

1Rudi C16m

Can you create a podcast of posts read by AI? It’s difficult to use otherwise.

Stephen Fowler's Shortform

Stephen Fowler

TsviBT44m30

On a meta note, IF proposition 2 is true, THEN the best way to tell this would be if people had been saying so AT THE TIME. If instead, actually everyone at the time disagreed with proposition 2, then it's not clear that there's someone "we" know to hand over decision making power to instead. Personally, I was pretty new to the area, and as a Yudkowskyite I'd probably have reflexively decried giving money to any sort of non-X-risk-pilled non-alignment-differential capabilities research. But more to the point, as a newcomer, I wouldn't have tried hard to ha... (read more)

1Rebecca2h

Did OpenAI have the for-profit element at that time?

10sapphire2h

A serious effective altruism movement with clean house. Everyone who pushed the 'work with AI capabilities company' line should retire or be forced to retire. There is no need to blame anyone for mistakes, the decision makers had reasons. But they chose wrong and should not continue to be leaders.

15mesaoptimizer3h

I mean, if Paul doesn't confirm that he is not under any non-disparagement obligations to OpenAI like Cullen O' Keefe did, we have our answer. In fact, given this asymmetry of information situation, it makes sense to assume that Paul is under such an obligation until he claims otherwise.

Rafael Harth's Shortform

Rafael Harth

Ω 24y

Rafael Harth1h30

From my perspective, the only thing that keeps the OpenAI situation from being all kinds of terrible is that I continue to think they're not close to human-level AGI, so it probably doesn't matter all that much.

This is also my take on AI doom in general; my P(doom|AGI soon) is quite high (>50% for sure), but my P(AGI soon) is low. In fact it decreased in the last 12 months.

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

TurnTrout's shortform feed

TurnTrout

Ω 105y

Algon1h20

Because future rewards are discounted

Don't you mean future values? Also, AFAICT, the only thing going on here that seperates online from offline RL is that offline RL algorithms shape the initial value function to give conservative behaviour. And so you get conservative behaviour.

romeostevensit's Shortform

romeostevensit

Ω 15y

2Lorxus16h

Surely so! Hit me up if you ever end doing this - I'm likely getting the Lumina treatment in a couple months.

1wassname6h

A before and after would be even better!

Lorxus2h10

Any recommendations on how I should do that? You may assume that I know what a gas chromatograph is and what a Petri dish is and why you might want to use either or both of those for data collection, but not that I have any idea of how to most cost-effectively access either one as some rando who doesn't even have a MA in Chemistry.

9kave17h

I think Romoeo is thinking of checking a bunch of mediators of risk (like aldehyde levels) as well as of function (like whether the organism stays colonised)

LLMs could be as conscious as human emulations, potentially

weightt an

18d

Firstly, I'm assuming that high resolution human brain emulation that you can run on a computer is conscious in normal sense that we use in conversations. Like, it talks, has memories, makes new memories, have friends and hobbies and likes and dislikes and stuff. Just like a human that you could talk with only through videoconference type thing on a computer, but without actual meaty human on the other end. It would be VERY weird if this emulation exhibited all these human qualities for other reason than meaty humans exhibit them. Like, very extremely what the fuck surprising. Do you agree?

So, we now have deterministic human file on our hands.

Then, you can trivially make transformer like next token predictor out of human emulation. You just have emulation,...

(See More – 699 more words)

weightt an2h10

Each of the transformation steps described in the post reduces my expectation that the result would be conscious somewhat.

Well, it's like saying if the {human in a car as a single system} is or is not conscious. Firstly it's a weird question, because of course it is. And even if you chain the human to a wheel in such a way they will never disjoin from the car.

What I did is constrained possible actions of the human emulation. Not severely, the human still can talk whatever, just with constant compute budget, time or iterative commutation steps. Kind of like... (read more)

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

LessOnline Festival