habryka

Running Lightcone Infrastructure, which runs LessWrong. You can reach me at habryka@lesswrong.com

Sequences

A Moderate Update to your Artificial Priors
A Moderate Update to your Organic Priors
Concepts in formal epistemology

Wiki Contributions

Comments

habryka

The document doesn't specify whether "deployment" includes internal deployment. (This is important because maybe lots of risk comes from the lab using AIs internally to do AI development.)

This seems like such an obvious and crucial distinction that I felt very surprised when the framework didn't disambiguate between the two. 

habryka

Yeah, at the time I didn't know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).

Answer by habryka

We don't have a live count, but we have a one-time analysis from late 2023: https://www.lesswrong.com/posts/WYqixmisE6dQjHPT8/2022-and-all-time-posts-by-pingback-count 

My guess is that not much has changed since then, so I think that's basically the answer.

What do you mean by "cited"? Do you mean "articles referenced in other articles on LW" or "articles cited in academic journals" or some other definition?

I am quite interested in takes from various people in alignment on this agenda. I've engaged with both Davidad's and Bengio's stuff a bunch in the last few months, and I feel pretty confused (and skeptical) about a bunch of it, and would be interested in reading more of what other people have to say.

This indicates that our scaling lab mentors were more discerning of value alignment on average than non-scaling lab mentors, or had a higher base rate of low-value alignment scholars (probably both).

The second hypothesis here seems much more likely (and my guess is your mentors would agree). My guess is that after properly controlling for that, you would find a mild to moderate negative correlation here.

But also, more importantly, the set of scholars from which MATS is drawing is heavily skewed towards the kind of person who would work at scaling labs (especially since funding has been skewing heavily towards the kind of research that can happen at scaling labs).

habryka

implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research

Huh, not sure where you are picking this up. I am of course very concerned about whether researchers at scaling labs are capable of evaluating the positive impact of their choice to work at a scaling lab (their job does, after all, depend on them not believing that choice is harmful), but of course they are not unconcerned about their positive impact.

habryka

In Winter 2023-24, our most empirical research dominated cohort, mentors rated the median scholar's value alignment at 8/10 and 85% of scholars were rated 6/10 or above, where 5/10 was “Motivated in part, but would potentially switch focus entirely if it became too personally inconvenient.”

Wait, aren't many of those mentors themselves working at scaling labs, or working very closely with them? So this doesn't feel like a very comforting response to the concern of "I am worried these people want to work at scaling labs because it's a high-prestige and career-advancing thing to do", if the people whose judgment you are relying on for the evaluation have themselves chosen the exact path I am concerned about.

habryka

Cade Metz was the NYT journalist who doxxed Scott Alexander. IMO he has also displayed somewhat questionable journalistic competence and integrity, and seems quite prone to narrativizing things in a weirdly adversarial way (I don't think it's obvious how this applies to this article, but it seems useful to know when modeling the trustworthiness of the article).

Promoted to curated: Cancer vaccines are cool. I didn't quite realize how cool they were before this post, and this post is a quite accessible intro to them.
