
## Metadata
- Author: [[Yufei Guo; Muzhe Guo; Juntao Su; Zhou Yang; Mengqiu Zhu; Hongfei Li; Mengyang Qiu; Shuo Shuo Liu]]
- Full Title:: Bias in Large Language Models: Origin, Evaluation, and Mitigation
- Category:: #🗞️Articles
- URL:: https://readwise.io/reader/document_raw_content/463263791
- Read date:: [[2026-06-07]]
## Highlights
> LLMs are deployed in critical decision-making environments such as healthcare diagnostics (Rajkomar et al., 2018), legal judgments (Angwin et al., 2016), and hiring processes (Chen et al., 2018). ([View Highlight](https://read.readwise.io/read/01kth2cp55n28tb4k15h3bch72))
> Intrinsic biases originate from the training data, as well as the architecture and underlying assumptions made during model design (Sun et al., 2019). ([View Highlight](https://read.readwise.io/read/01kth2fhfv792w2r790dqbecnh))
> Extrinsic biases, on the other hand, emerge during the application of LLMs in real-world tasks.
> These biases are often more subtle as they manifest in the model outputs during specific tasks, such as sentiment analysis, content moderation, or automated decision-making systems. ([View Highlight](https://read.readwise.io/read/01kth2fydy35yg7rftca3k90qa))
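
Extrinsic bias is usually measured on a model's task outputs rather than on its internals. As an illustration (not the paper's own procedure), the sketch below computes a demographic parity gap: the spread in positive-decision rates across groups for a hypothetical screening task. The function names, group labels, and decisions are assumptions invented for the example.

```python
from collections import defaultdict


def positive_rates(decisions, groups):
    """Fraction of positive decisions (1) for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    rates = positive_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs for a screening task: 1 = advance, 0 = reject.
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(positive_rates(decisions, groups))          # {'A': 0.75, 'B': 0.25}
print(demographic_parity_gap(decisions, groups))  # 0.5
```

A gap near zero means the groups receive positive outcomes at similar rates; it says nothing about whether individual decisions were correct, so in practice it is paired with error-rate comparisons.
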
> Language models, especially LLMs that are trained on a large corpus, usually have intrinsic bias issues, since the training corpora often contain societal biases that are built into the model (Pagano et al., 2023; Ray, 2023; Goldfarb-Tarrant, 2024; Goldman and Tsotsos, 2024). ([View Highlight](https://read.readwise.io/read/01kth2xd24hevs90q9qxd90jcc))
> over-representativeness and under-representativeness ([View Highlight](https://read.readwise.io/read/01kth30322h06zgjtspmkb8z5w))
> men might be over-represented in datasets about leadership or science, while women may be more frequently mentioned in caregiving roles (UNESCO and IRCAI, 2024). ([View Highlight](https://read.readwise.io/read/01kth30apwqx9bjr2t1fspqk16))
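
One rough way to surface the over- and under-representation described above is to count how often role words co-occur with gendered terms in a corpus. The sketch below is a minimal, assumption-laden audit: the word lists, the sentence-level co-occurrence window, and the toy corpus are all hypothetical, and a real study would use curated lexicons and far larger data.

```python
import re
from collections import Counter

# Hypothetical lexicons; a real audit would use curated, validated word lists.
GENDER_TERMS = {
    "male": {"he", "him", "his", "man", "men"},
    "female": {"she", "her", "hers", "woman", "women"},
}
ROLE_TERMS = {"leader", "scientist", "caregiver", "nurse"}


def cooccurrence_counts(sentences):
    """For each role word, count sentences that also mention each gender group."""
    counts = {role: Counter() for role in ROLE_TERMS}
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for role in ROLE_TERMS & tokens:
            for gender, words in GENDER_TERMS.items():
                if tokens & words:
                    counts[role][gender] += 1
    return counts


# Hypothetical toy corpus.
corpus = [
    "He is a leader in the lab.",
    "The scientist said he would publish soon.",
    "She works as a caregiver.",
    "Her mother was a nurse and a scientist.",
    "The leader thanked his team.",
]
counts = cooccurrence_counts(corpus)
print(dict(counts["leader"]))     # {'male': 2}
print(dict(counts["caregiver"]))  # {'female': 1}
```

Skewed counts in a sample like this only hint at representation imbalance; sentence-level co-occurrence ignores negation and context, so it is a screening heuristic rather than a bias measure in itself.
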
> Spatial and temporal bias: LLMs trained predominantly on a corpus from certain countries or geographic locations may absorb the cultural norms and values, hence building biases into the underlying LLMs. ([View Highlight](https://read.readwise.io/read/01kth30gv9sd5s62ts9c4b2qtc))
> it is nearly impossible to fully eliminate inappropriate content given the vast scale of the training corpus. ([View Highlight](https://read.readwise.io/read/01kth3a86msdj6wvqm4bxjetvj))
> For example, gender-neutral pronouns may be associated with one gender due to the patterns in the training dataset (Kotek et al., 2023; Dwivedi et al., 2023). ([View Highlight](https://read.readwise.io/read/01kth3bpw2n5gs5h0xk6t5r7x9))
> split less frequently occurring words into smaller units. This splitting policy can result in fragmented representations of under-represented entities, names, or terminology, particularly affecting minority languages or groups. ([View Highlight](https://read.readwise.io/read/01kth3dpw9wm721dh55kc6e28a))
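
To make the fragmentation effect concrete, the sketch below implements greedy longest-match-first subword splitting in the style of WordPiece (the scheme BERT uses; BPE-based tokenizers build and apply their vocabularies differently). The toy vocabulary is hypothetical: it holds one name as a single entry while a less common name is covered only by fragments.

```python
def greedy_subword_tokenize(word, vocab):
    """Split `word` by repeatedly taking the longest vocabulary match.

    Continuation pieces carry a '##' prefix (WordPiece convention).
    Falls back to single characters so every word can be tokenized.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry matched: emit one character.
            piece = word[start] if start == 0 else "##" + word[start]
            end = start + 1
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary: "john" is whole, "chukwuemeka" is not.
vocab = {"john", "chu", "##kw", "##u", "##em", "##eka"}
print(greedy_subword_tokenize("john", vocab))         # ['john']
print(greedy_subword_tokenize("chukwuemeka", vocab))  # ['chu', '##kw', '##u', '##em', '##eka']
```

The name absent from the vocabulary costs five tokens instead of one, and each piece carries little meaning on its own, which is the fragmented representation the highlight describes.
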