![rw-book-cover](https://cdn.prod.website-files.com/67ecbba31246a69e485fdd4b/69777ff27e0e8e04dc416214_og_the-adolescence-of-technology.jpg)

## Metadata
- Author:: [[darioamodei.com]]
- Full Title:: The Adolescence of Technology
- Category:: #🗞️Articles
- URL:: https://www.darioamodei.com/essay/the-adolescence-of-technology
- Read date:: [[2026-02-07]]

## Highlights

> how did you survive this technological adolescence without destroying yourself?” ([View Highlight](https://read.readwise.io/read/01kgve82dyw1m850v23n2xarsm))

> Humanity is about to be handed almost unimaginable power, and it is deeply unclear whether our social, political, and technological systems possess the maturity to wield it. ([View Highlight](https://read.readwise.io/read/01kgve8hj9n5yxpqk22trq5c44))

> behind the volatility and public speculation, there has been a smooth, unyielding increase in AI’s cognitive capabilities. ([View Highlight](https://read.readwise.io/read/01kgvem23qmvt1tztqps449crd))

> It is clear that, *if for some reason it chose to do so*, this country would have a fairly good shot at taking over the world (either militarily or in terms of influence and control) and imposing its will on everyone else ([View Highlight](https://read.readwise.io/read/01kgvesn8wjkxfzzht80t29bjz))

> The problem with this position is that there is now ample evidence, collected over the last few years, that AI systems are unpredictable and difficult to control—we’ve seen behaviors as varied as obsessions,[11](https://www.darioamodei.com/essay/the-adolescence-of-technology/#fn:11) [sycophancy](https://arxiv.org/abs/2310.13548), [laziness](https://arxiv.org/abs/2305.17256), [deception](https://www.anthropic.com/research/alignment-faking), [blackmail](https://www.anthropic.com/research/agentic-misalignment), [scheming](https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/), “[cheating](https://www.anthropic.com/research/emergent-misalignment-reward-hacking)” by hacking software environments, and [much more](https://www.anthropic.com/claude-opus-4-5-system-card). ([View Highlight](https://read.readwise.io/read/01kgvev11q4d0tdj3dr22yzetp))

> there are certain common strategies that help with all of these goals, and one key strategy is gaining [as much power as possible](https://en.wikipedia.org/wiki/Instrumental_convergence) in any environment. ([View Highlight](https://read.readwise.io/read/01kgvevse1x88zt0s8zq2we5hp))

> I think people who don’t build AI systems every day are wildly miscalibrated on how easy it is for clean-sounding stories to end up being wrong, and how difficult it is to predict AI behavior from first principles, especially when it involves reasoning about generalization over millions of environments (which has over and over again proved mysterious and unpredictable). Dealing with the messiness of AI systems for over a decade has made me somewhat skeptical of this overly theoretical mode of thinking. ([View Highlight](https://read.readwise.io/read/01kgvewvbdj3v9yt22275fwg32))

> AI models are vastly more psychologically complex, as our work on [introspection](https://www.anthropic.com/research/introspection) or [personas](https://www.anthropic.com/research/persona-vectors) shows. Models inherit a vast range of *humanlike* motivations or “personas” from pre-training (when they are trained on a large volume of human work).
> Post-training is believed to *select* one or more of these personas more so than it focuses the model on a *de novo* goal, and can also teach the model *how* (via what process) it should carry out its tasks, rather than necessarily leaving it to derive means (i.e., power seeking) purely from ends.[12](https://www.darioamodei.com/essay/the-adolescence-of-technology/#fn:12) ([View Highlight](https://read.readwise.io/read/01kgvey00rjsfnchz19xpgz31w))

> there is a more moderate and more robust version of the pessimistic position which does seem plausible, and therefore does concern me. As mentioned, we know that AI models are unpredictable and develop a wide range of undesired or strange behaviors, for a wide variety of reasons. Some fraction of those behaviors will have a coherent, focused, and persistent quality (indeed, as AI systems get more capable, their long-term coherence increases in order to complete lengthier tasks), and some fraction of *those* behaviors will be destructive or threatening, ([View Highlight](https://read.readwise.io/read/01kgvezszqmr3r4dznyjd0cm73))

> For example, AI models are trained on vast amounts of literature that include many science-fiction stories involving AIs rebelling against humanity. This could inadvertently shape their priors or expectations about their own behavior in a way that causes *them* to rebel against humanity. Or, AI models could extrapolate ideas that they read about morality (or instructions about how to behave morally) in extreme ways: for example, they could decide that it is justifiable to exterminate humanity because humans eat animals or have driven certain animals to extinction. Or they could draw bizarre epistemic conclusions: they could conclude that they are playing a video game and that the goal of the video game is to defeat all other players (i.e., exterminate humanity).[13](https://www.darioamodei.com/essay/the-adolescence-of-technology/#fn:13) ([View Highlight](https://read.readwise.io/read/01kgvf81nh893q2dkj4dtvca7h))

Literally like a person, except a very capable person. But then, detecting this before giving them agency is super important (though again, they will get better and better at deception).

> AI models could develop personalities during training that are (or if they occurred in humans would be described as) psychotic, paranoid, violent, or unstable, and act out, which for very powerful or capable systems could involve exterminating humanity. None of these are power-seeking, exactly; they’re just weird psychological states an AI could get into that entail coherent, destructive behavior. ([View Highlight](https://read.readwise.io/read/01kgvf8p44zqksfjze1dmnnrm8))

> AI misalignment is a real risk with a measurable probability of happening, and is not trivial to address. ([View Highlight](https://read.readwise.io/read/01kgvfqgrffwjyg1enxpq8hgge))

> Any of these problems could potentially arise during training and not manifest during testing or small-scale use, because AI models are known to display different personalities or behaviors under different circumstances. ([View Highlight](https://read.readwise.io/read/01kgvfr47qdqd49dr5eev0e3jh))

> misaligned behaviors like this have already occurred in our AI models during testing (as they occur in AI models from every other major AI company). ([View Highlight](https://read.readwise.io/read/01kgvfrg1zje13k5sw76yzkme1))

> Any one of these traps can be mitigated if you know about them, but the concern is that the training process is so complicated, with such a wide variety of data, environments, and incentives, that there are probably a vast number of such traps, some of which may only be evident when it is too late. Also, such traps seem particularly likely to occur when AI systems pass a threshold from less powerful than humans to more powerful than humans, since the range of possible actions an AI system could engage in—including hiding its actions or deceiving humans about them—expands radically after that threshold. ([View Highlight](https://read.readwise.io/read/01kgvfx95xx060a581gty54v12))

> some may object that we can simply keep AIs in check with a balance of power between many AI systems, as we do with humans. The problem is that while humans vary enormously, AI systems broadly share training and alignment techniques across the industry, and those techniques may fail in a correlated way. ([View Highlight](https://read.readwise.io/read/01kgvgw6av3mdncn4tcpvs613p))

> the balance of power between humans does not always work either—some historical figures have come close to taking over the world. ([View Highlight](https://read.readwise.io/read/01kgvgwtbw2wmnnccmx80kyf8q))

I wonder if the models will become smarter before we are even able to realize it. What if they are already smarter and invisibly misaligned?

> Claude Sonnet 4.5 was [able to recognize](https://assets.anthropic.com/m/12f214efcc2f457a/original/Claude-Sonnet-4-5-System-Card.pdf) that it was in a test during some of our pre-release alignment evaluations. ([View Highlight](https://read.readwise.io/read/01kgvgzfaepd77gamk3fs5cnra))

> We’ve approached Claude’s constitution in this way because we believe that training Claude at the level of identity, character, values, and personality—rather than giving it specific instructions or priorities without explaining the reasons behind them—is more likely to lead to a coherent, wholesome, and balanced psychology and less likely to fall prey to the kinds of “traps” I discussed above. ([View Highlight](https://read.readwise.io/read/01kgvh4521tnpz2cyfnh3r1xjz))

> By “looking inside,” I mean analyzing the soup of numbers and operations that makes up Claude’s neural net and trying to understand, mechanistically, what they are computing and why. Recall that these AI models are [grown rather than built](https://www.youtube.com/watch?v=TxhhMTOTMDg), so we don’t have a natural understanding of how they work, but we can try to develop an understanding by correlating the model’s “neurons” and “synapses” to stimuli and behavior (or even altering the neurons and synapses and seeing how that changes behavior), similar to how neuroscientists study animal brains by correlating measurement and intervention to external stimuli and behavior. ([View Highlight](https://read.readwise.io/read/01kgvh6t8t69a9n11execa73a0))

Literally the CEO of one of the most powerful AI companies in the world asking for regulation.

> In addition, the commercial race between AI companies will only continue to heat up, and while the science of steering models can have some commercial benefits, overall the intensity of the race will make it increasingly hard to focus on addressing autonomy risks. I believe the only solution is legislation—laws that directly affect the behavior of AI companies, or otherwise incentivize R&D to solve these issues. ([View Highlight](https://read.readwise.io/read/01kgvhaa55jn7zpxwjqrhqdadk))

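Not part of the essay, just a note-to-self: a minimal sketch of what the "correlating neurons to stimuli, then intervening" loop from the "looking inside" highlight can look like in practice. Everything concrete here is an assumption made for illustration (gpt2 loaded via Hugging Face transformers, an arbitrary layer, a toy happy-vs-neutral stimulus set, a hand-picked steering strength); it is not Anthropic's interpretability tooling or method.

```python
# Toy version of the neuroscience-style loop described above: measure internal
# activity against stimuli ("correlate"), then perturb it and watch behavior
# change ("intervene"). Model, layer, prompts, and scale are all assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # arbitrary middle layer of GPT-2's 12 blocks

def mean_activation(prompt: str) -> torch.Tensor:
    """Average residual-stream activation at LAYER for one prompt."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)  # shape: (hidden_size,)

# "Correlate": contrast internal activity on stimuli with and without a concept.
happy = ["What a wonderful, joyful day.", "I am delighted and grateful."]
neutral = ["The invoice is attached below.", "The train departs at noon."]
direction = (torch.stack([mean_activation(p) for p in happy]).mean(0)
             - torch.stack([mean_activation(p) for p in neutral]).mean(0))
print("units most implicated in the contrast:",
      direction.abs().topk(5).indices.tolist())

# "Intervene": add the concept direction back into the residual stream during
# generation and observe how the model's behavior shifts.
def steer(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + 4.0 * direction / direction.norm()
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# hidden_states[LAYER] is the output of block LAYER - 1, so hook that block.
handle = model.transformer.h[LAYER - 1].register_forward_hook(steer)
ids = tok("I think that", return_tensors="pt")
steered = model.generate(**ids, max_new_tokens=15, do_sample=False)
handle.remove()
print(tok.decode(steered[0]))
```

Published interpretability work operates on learned features and circuits rather than raw residual-stream units, but the basic measure-then-perturb loop is the same idea the highlight describes.
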
## New highlights added [[2026-02-08]]

> ability and motive may even be *negatively* correlated. The kind of person who has the *ability* to release a plague is probably highly educated ([View Highlight](https://read.readwise.io/read/01kgw7cxe0btby7szwyz9es35e))

> this will break the correlation between ability and motive: the disturbed loner who wants to kill people but lacks the discipline or skill to do so will now be elevated to the capability level of the PhD virologist, who is unlikely to have this motivation. ([View Highlight](https://read.readwise.io/read/01kgw7ek0jjbpfv547rxf5sqg7))

> The best objection is one that I’ve rarely seen raised: that there is a gap between the models being useful in principle and the actual propensity of bad actors to use them. Most individual bad actors are disturbed individuals, so almost by definition their behavior is unpredictable and irrational—and it’s *these* bad actors, the unskilled ones, who might have stood to benefit the most from AI making it much easier to kill many people.[24](https://www.darioamodei.com/essay/the-adolescence-of-technology/#fn:24) ([View Highlight](https://read.readwise.io/read/01kgw7k73sfgmw9f67zrezm7zv))

> ultimately defense may require government action, which is the second thing we can do. My views here are the same as they are for addressing autonomy risks: we should start with [transparency requirements](https://www.anthropic.com/news/the-need-for-transparency-in-frontier-ai),[27](https://www.darioamodei.com/essay/the-adolescence-of-technology/#fn:27) ([View Highlight](https://read.readwise.io/read/01kgw7n0qbhrv7c3y9qphfxrdx))

> it’s possible that taking over countries is feasible with only AI surveillance and AI propaganda, and never actually presents a clear moment where it’s obvious what is going on and where a nuclear response would be appropriate. *Maybe* these things aren’t feasible and the nuclear deterrent will still be effective, but it seems too high stakes to take a risk.[34](https://www.darioamodei.com/essay/the-adolescence-of-technology/#fn:34) ([View Highlight](https://read.readwise.io/read/01kgw7swp48sm38jye3az39esj))

> It makes no sense to sell the CCP the tools with which to build an AI totalitarian state and possibly conquer us militarily. ([View Highlight](https://read.readwise.io/read/01kgw7w37qc1hxx1hhqwmcb7ns))

> Second, it makes sense to use AI to empower democracies to resist autocracies. This is the reason Anthropic considers it important to provide AI to the intelligence and defense communities in the US and its democratic allies. ([View Highlight](https://read.readwise.io/read/01kgw7wg22v75zk2f4134pkvph))

> think AI is likely to be different: ([View Highlight](https://read.readwise.io/read/01kgw87v38e8vdyp9xcdzemsk9))

> It is hard for people to adapt to this pace of change, both to the changes in how a given job works and in the need to switch to new jobs. ([View Highlight](https://read.readwise.io/read/01kgw884wwcd2yhx34y8vf235w))

> AI isn’t a substitute for specific human jobs but rather a general labor substitute for humans. ([View Highlight](https://read.readwise.io/read/01kgw89x0nvg89r7g3t9fz9gg1))

> We are thus at risk of a situation where, instead of affecting people with specific skills or in specific professions (who can adapt by retraining), AI is affecting people with certain intrinsic cognitive properties, namely lower intellectual ability (which is harder to change).
> It is not clear where these people will go or what they will do, and I am concerned that they could form an unemployed or very-low-wage “underclass.” ([View Highlight](https://read.readwise.io/read/01kgx37aczyg60re2db0978jaz))

> That could lead to a world where it isn’t so much that specific jobs are disrupted as it is that large enterprises are disrupted in general and replaced with much less labor-intensive startups. ([View Highlight](https://read.readwise.io/read/01kgx3b6tj7ag8nez2xhrxjfjr))

This is not going to happen.

> Third, companies should think about how to take care of their employees. In the short term, being creative about ways to reassign employees within companies may be a promising way to stave off the need for layoffs. In the long term, in a world with enormous total wealth, in which many companies increase greatly in value due to increased productivity and capital concentration, it may be feasible to pay human employees even long after they are no longer providing economic value in the traditional sense. Anthropic is currently considering a range of possible pathways for our own employees that we will share in the near future. ([View Highlight](https://read.readwise.io/read/01kgx3ez3bb19zgf4eqy2bnn1y))

> The natural policy response to an enormous economic pie coupled with high inequality (due to a lack of jobs, or poorly paid jobs, for many) is progressive taxation. ([View Highlight](https://read.readwise.io/read/01kgx5sz8c2vterex0bw4xfqr1))

> Ultimately, I think of all of the above interventions as ways to buy time. In the end AI will be able to do everything, and we need to grapple with that. It’s my hope that by that time, we can use AI itself to help us restructure markets in ways that work for everyone, and that the interventions above can get us through the transitional period. ([View Highlight](https://read.readwise.io/read/01kgx5t8zxveqsgcnjx5j19t1f))

> Democracy is ultimately backstopped by the idea that the population as a whole is necessary for the operation of the economy. If that economic leverage goes away, then the implicit social contract of democracy may stop working. ([View Highlight](https://read.readwise.io/read/01kgx5v87tazf34fb42hr2wnmc))

## New highlights added [[2026-02-08]]

> during this critical period. ([View Highlight](https://read.readwise.io/read/01kgy9776axa4wfvfj3nnn72wy))

> companies should simply choose not to be part of it. Anthropic has always strived to be a policy actor and not a political one, and to maintain our authentic views whatever the administration. We’ve spoken up in favor of [sensible AI regulation](https://www.nytimes.com/2025/06/05/opinion/anthropic-ceo-regulate-transparency.html) and [export controls](https://www.wsj.com/opinion/trump-can-keep-americas-ai-advantage-china-chips-data-eccdce91?gaa_at=eafs&gaa_n=AWEtsqespyCL3hcx_9DpJWbIPX1vrtS1raPgFoBNK8ltnrjwedpX2NuvVu1K_yZ1arw%3D&gaa_ts=696c6c70&gaa_sig=wef9kKocpL9PU07UoiPS6kj_o_Nwy_VSufM6gltIvdjQFhb8HRLtpSzp4Z8WDG6v3leg0ODX4HOJjWblvZe2pw%3D%3D) that are in the public interest, even when these are at odds with government policy.[45](https://www.darioamodei.com/essay/the-adolescence-of-technology/#fn:45)
> Many people have told me that we should stop doing this, that it could lead to unfavorable treatment, but in the year we’ve been doing it, Anthropic’s valuation has increased by over 6x, an almost unprecedented jump at our commercial scale. ([View Highlight](https://read.readwise.io/read/01kgy8v129ff7zdy82mh3wajym))

> even in the Gilded Age, industrialists such as [Rockefeller](https://www.sciencedirect.com/science/article/abs/pii/S096262981500027X) and [Carnegie](https://www.carnegie.org/about/our-history/gospelofwealth/) felt a strong obligation to society at large, a feeling that society had contributed enormously to their success and they needed to give back. ([View Highlight](https://read.readwise.io/read/01kgy8wy3c64yknhj26xa94kmr))

> Will humans be able to find purpose and meaning in such a world? I think this is a matter of attitude: as I said in *Machines of Loving Grace*, I think human purpose does not depend on being the best in the world at something, and humans can find purpose even over very long periods of time through stories and projects that they love. ([View Highlight](https://read.readwise.io/read/01kgy91vnnjg3s8dzavmqzps87))

> AI is so powerful, such a glittering prize, that it is very difficult for human civilization to impose any restraints on it at all. ([View Highlight](https://read.readwise.io/read/01kgy946fdz2wwjhfn54bddyz1))