Toxicity testing evaluates the safety of drugs and chemicals before they reach humans. It spans a range of methods, from rapid cell-based screens to long-term animal studies, all designed to identify harmful effects and establish safe exposure levels. This topic covers the major test types, underlying principles, predictive modeling, alternatives to animal testing, and the regulatory framework that ties it all together.

Types of toxicity tests

The type of toxicity test you choose depends on what you need to learn about a substance. Some tests look at immediate harm from a single dose; others track damage that builds over months or years. Regulatory agencies specify which tests are required based on the substance's intended use.

In vitro vs in vivo

In vitro tests are conducted outside a living organism, typically using cell cultures or isolated tissues.
- Examples: cell viability assays (MTT assay), organ-specific models (liver microsomes for metabolic toxicity)
In vivo tests are performed in whole, living organisms, usually rodents or other animals.
- Examples: acute toxicity studies ($LD_{50}$), repeated-dose toxicity studies (e.g., 90-day)
In vitro tests work well for initial screening because they're fast and inexpensive. In vivo studies are needed for a fuller picture because they capture systemic effects like immune responses and organ-to-organ interactions that cell cultures can't replicate.

Acute vs chronic toxicity

Acute toxicity measures harmful effects from a single exposure, or from multiple exposures within about 24 hours, with animals typically observed for up to 14 days afterward. The classic endpoint is the $LD_{50}$, the dose that kills 50% of test animals, or the $LC_{50}$ (the concentration lethal to 50%) for inhaled substances.
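To make the $LD_{50}$ idea concrete, here is a minimal sketch that reads it off dose-mortality data by interpolating on a log-dose scale between the two groups that bracket 50% mortality. The function name and the data are hypothetical, and real studies rely on formal statistical methods (such as probit analysis) rather than this simple interpolation.

```python
import math

def estimate_ld50(doses, fraction_dead):
    """Interpolate the LD50 on a log10(dose) scale.

    doses: ascending doses (e.g., mg/kg)
    fraction_dead: mortality fraction observed at each dose
    Returns None if the data do not bracket 50% mortality.
    """
    points = list(zip(doses, fraction_dead))
    for (d_lo, m_lo), (d_hi, m_hi) in zip(points, points[1:]):
        if m_lo <= 0.5 <= m_hi and m_hi > m_lo:
            # How far 50% lies between the two bracketing mortality values
            t = (0.5 - m_lo) / (m_hi - m_lo)
            log_ld50 = math.log10(d_lo) + t * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ld50
    return None

# Made-up illustrative data, not from any real study
doses = [50, 100, 200, 400]            # mg/kg
fraction_dead = [0.1, 0.3, 0.7, 0.9]
ld50 = estimate_ld50(doses, fraction_dead)
print(f"Estimated LD50: {ld50:.0f} mg/kg")
```

Interpolating on log dose reflects the convention of plotting dose-response data on a logarithmic dose axis.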

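Chronic toxicity tracks adverse effects from repeated exposure over an extended period, typically a large fraction of the organism's lifespan (e.g., 6 months to 2 years in rodents). Studies of about 90 days, roughly 10% of a rodent's lifespan, are usually classed as subchronic. These longer studies reveal long-term consequences like cumulative organ damage, carcinogenicity, and reproductive harm.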

Together, acute and chronic data help establish safe exposure levels for humans and flag potential health risks at different timescales.

Genotoxicity and carcinogenicity

Genotoxicity tests assess whether a substance damages DNA or causes mutations.
- The Ames test (bacterial reverse mutation assay) is the most widely used first-line screen. It exposes specially engineered Salmonella strains to the test compound and checks whether mutations are induced that allow bacterial growth on histidine-free media.
- The in vitro micronucleus test detects chromosomal damage in mammalian cells by looking for small, extra nuclei that form when chromosomes break or fail to segregate properly.
Carcinogenicity studies evaluate a substance's potential to cause cancer, typically through 2-year rodent bioassays that monitor tumor incidence and type.
Positive genotoxicity results often trigger the need for full carcinogenicity studies, since DNA damage is a key early step in cancer development.

Reproductive and developmental toxicity

Reproductive toxicity tests examine effects on fertility, sexual function, and offspring development. A two-generation reproductive toxicity study, for example, exposes parent animals and then evaluates fertility and health outcomes in both the first and second generations.
Developmental toxicity tests focus on birth defects and adverse effects during embryo/fetal development. They target critical windows like implantation, organogenesis, and fetal growth.
These tests are especially important for any substance that pregnant women or people of reproductive age might be exposed to.

Immunotoxicity and allergenicity

Immunotoxicity tests evaluate whether a substance suppresses or inappropriately stimulates the immune system. Endpoints include changes in immune organ weights (thymus, spleen), histopathology of lymphoid tissues, and functional assays of immune cell activity.
Allergenicity tests determine whether a substance can trigger allergic reactions, particularly skin sensitization.
- The local lymph node assay (LLNA) measures lymphocyte proliferation in draining lymph nodes after dermal exposure.
- The guinea pig maximization test (GPMT) is an older but still-used method that looks for skin sensitization responses.
Both categories matter for any substance involving direct human contact.

Principles of toxicity testing

These principles form the scientific foundation for designing studies, interpreting results, and translating animal data into human risk estimates.

Dose-response relationships

The dose-response relationship is the backbone of toxicology. As the dose of a substance increases, the severity or frequency of adverse effects generally increases too. Plotting dose against response produces a dose-response curve, which is used to identify two critical reference points:

- NOAEL (No-Observed-Adverse-Effect Level): the highest dose at which no significant adverse effect is seen
- LOAEL (Lowest-Observed-Adverse-Effect Level): the lowest dose at which an adverse effect is first detected

These values feed directly into calculating safe human exposure limits.
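As an illustration, the two reference points can be picked out of a set of dose groups once each group has been judged adverse or not. This is a minimal sketch: the function name and the data are hypothetical, and the adverse/non-adverse call for each group is assumed to come from the study's own statistical and biological evaluation.

```python
def find_noael_loael(dose_groups):
    """Return (NOAEL, LOAEL) from a list of (dose, has_adverse_effect) pairs.

    Either value is None when the tested doses do not define it.
    """
    ordered = sorted(dose_groups)
    # LOAEL: lowest tested dose showing an adverse effect
    loael = next((dose for dose, adverse in ordered if adverse), None)
    # NOAEL: highest tested dose without an adverse effect, below the LOAEL
    clean = [dose for dose, adverse in ordered
             if not adverse and (loael is None or dose < loael)]
    noael = max(clean) if clean else None
    return noael, loael

# Made-up example: doses in mg/kg/day
study = [(10, False), (30, False), (100, True), (300, True)]
noael, loael = find_noael_loael(study)
print(noael, loael)  # 30 100
```

Note that both values are limited to doses actually tested, so they depend on how the dose levels were spaced.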

Threshold and non-threshold effects

Threshold effects only appear above a certain dose. Below that dose, the body can detoxify or repair the damage without any observable harm. Most non-carcinogenic toxic effects (e.g., liver enzyme elevation, kidney damage) are considered to have a threshold.
Non-threshold effects are assumed to carry some risk at any dose, no matter how small. Genotoxic carcinogens are the classic example: because even a single DNA mutation could theoretically initiate cancer, regulators often assume there is no completely safe dose.

This distinction has major consequences for regulation. Threshold toxicants get reference doses derived by dividing the NOAEL by safety factors. Non-threshold toxicants are regulated using models that estimate risk at very low doses.

Species differences and extrapolation

Animal data don't translate directly to humans. Differences in anatomy, physiology, and especially metabolic enzyme expression can make a substance more or less toxic in one species compared to another.

To account for this uncertainty, regulators apply uncertainty factors (also called safety factors) when extrapolating from animals to humans. A common default is a 10-fold factor for interspecies differences and another 10-fold factor for human variability, giving a combined 100-fold safety margin.
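A worked example shows how the default factors are applied. The NOAEL of 30 mg/kg/day is a made-up value, and the symbols $UF_{A}$ (animal-to-human) and $UF_{H}$ (human variability) are notation assumed here for the two 10-fold factors:

```latex
\[
\text{Reference dose}
  = \frac{\text{NOAEL}}{UF_{A} \times UF_{H}}
  = \frac{30\ \text{mg/kg/day}}{10 \times 10}
  = 0.3\ \text{mg/kg/day}
\]
```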

Toxicokinetics and toxicodynamics

These two concepts parallel pharmacokinetics and pharmacodynamics but focus on toxic substances:

- Toxicokinetics describes what the body does to the toxicant: absorption, distribution, metabolism, and excretion (ADME). It determines how much of the active toxic species reaches the target tissue and for how long.
- Toxicodynamics describes what the toxicant does to the body: the biochemical and physiological effects that produce harm.

Understanding both is essential for interpreting why a substance is toxic at certain doses, in certain species, or via certain routes of exposure.

Mechanisms of toxicity

Knowing how a substance causes harm at the molecular and cellular level deepens your understanding of dose-response curves and species differences. Common mechanisms include:

- Oxidative stress: excess reactive oxygen species overwhelm cellular antioxidant defenses
- DNA damage: direct alkylation, intercalation, or strand breaks
- Enzyme inhibition: blocking critical metabolic or signaling enzymes
- Receptor activation/blockade: inappropriate stimulation or suppression of receptor-mediated pathways

Mechanistic data also support the development of alternative (non-animal) testing methods, because if you know the molecular target, you can design an in vitro assay to measure it directly.

In vitro toxicity assays

In vitro assays let you screen large numbers of compounds quickly and cheaply. They're the first line of defense in identifying potentially toxic substances before committing to expensive and time-consuming animal studies.

Cell viability and cytotoxicity

Cell viability assays measure the proportion of living cells after exposure:
- MTT assay: measures mitochondrial metabolic activity (living cells convert MTT to a purple formazan product)
- Neutral red uptake assay: measures lysosomal integrity
- LDH release assay: measures lactate dehydrogenase leaking from damaged cell membranes
Cytotoxicity assays can also distinguish between necrosis (uncontrolled cell death from acute injury) and apoptosis (programmed cell death), which matters because the mechanism of cell death can indicate different types of toxicity.
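For absorbance-based readouts like the MTT assay, viability is usually expressed relative to untreated control wells. The sketch below assumes hypothetical blank-corrected plate readings (MTT formazan is typically read at around 570 nm); the function name and all values are illustrative only.

```python
def percent_viability(a_treated, a_control, a_blank):
    """Viability as a percentage of the untreated control, after blank subtraction."""
    return 100.0 * (a_treated - a_blank) / (a_control - a_blank)

# Made-up absorbance readings
a_blank = 0.05      # medium only, no cells
a_control = 1.05    # untreated cells
treated = {1: 0.95, 10: 0.65, 100: 0.25}   # concentration (uM) -> absorbance

viability = {conc: percent_viability(a, a_control, a_blank)
             for conc, a in treated.items()}
for conc, pct in viability.items():
    print(f"{conc:>4} uM: {pct:.0f}% viable")
```

Because MTT reports metabolic activity, a drop in signal indicates fewer metabolically active cells but does not by itself say whether they died by necrosis or apoptosis.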

Organ-specific toxicity models

Organ-specific models use specialized cell lines or primary cells to mimic the toxicity profile of a particular organ:

- Hepatotoxicity: primary hepatocytes or hepatic cell lines (the liver is the most common target organ because it's the primary site of drug metabolism)
- Neurotoxicity: neuronal cell cultures
- Cardiotoxicity: cardiomyocytes (often derived from iPSCs), used to detect effects like QT prolongation

These models can incorporate metabolic competence, meaning the cells can actually metabolize the test compound the way the organ would in vivo, giving more realistic results.

High-throughput screening approaches

High-throughput screening (HTS) uses automated systems to test thousands of compounds across multiple assays in a short time. Key features:

- Miniaturized formats (e.g., 384-well or 1536-well plates)
- Robotic liquid handling for precise, reproducible dosing
- Automated readouts (fluorescence, luminescence, absorbance)

HTS is especially valuable early in drug development for flagging toxic compounds before significant resources are invested. It also generates large datasets that feed into structure-activity relationship (SAR) analysis.

Advantages and limitations of in vitro tests

Advantages:
- Rapid, cost-effective screening of many compounds
- Reduced animal use (supports the 3Rs)
- Ability to isolate specific mechanisms and target organ effects

Limitations:
- No complex organ-organ interactions or systemic responses
- Missing systemic factors like immune regulation and hormonal signaling
- Metabolism and bioavailability may differ from in vivo conditions

The bottom line: in vitro tests are powerful for screening and mechanistic studies, but they can't fully replace in vivo data for comprehensive safety assessment.

In vivo toxicity studies

In vivo studies remain the gold standard for comprehensive toxicity evaluation because they capture the full complexity of a living system, including metabolism, immune responses, and multi-organ interactions.

Animal models for toxicity testing

- Rodents (mice, rats): the most commonly used models due to small size, short lifespan, well-characterized genetics, and relatively low cost
- Rabbits: preferred for dermal and ocular irritation/corrosion testing
- Dogs and non-human primates: reserved for advanced studies, particularly for pharmaceuticals where rodent data may not adequately predict human responses

The choice of species depends on the toxicity endpoints being studied, the route of exposure, and regulatory requirements. Regulators typically require data from at least two species (one rodent, one non-rodent) for pharmaceutical development.

Dosing and exposure routes

Apart from single-dose acute tests, studies involve repeated dosing over defined periods (28 days, 90 days, or up to 2 years for carcinogenicity). The exposure route should match how humans would encounter the substance:

- Oral (gavage or mixed into diet): mimics ingestion
- Dermal: assesses toxicity from skin contact
- Inhalation: evaluates airborne exposure
- Intravenous or intraperitoneal injection: used when systemic exposure needs to be controlled precisely

Most studies test at least three dose levels plus a control group, allowing construction of a dose-response curve.

Clinical observations and pathology

Throughout the study, animals are monitored for clinical signs of toxicity: changes in behavior, food consumption, body weight, and physical appearance.

At study termination, detailed pathological examinations are performed:

- Gross necropsy: visual inspection of all major organs and tissues for abnormalities (masses, discoloration, size changes)
- Histopathology: microscopic examination of tissue sections to identify cellular changes, inflammation, necrosis, or neoplastic lesions

These examinations identify target organs of toxicity and characterize the nature and severity of the damage.

Biomarkers of toxicity

Biomarkers are measurable indicators that signal biological or pathological changes. In toxicity studies, commonly monitored biomarkers include:

- Clinical chemistry: liver enzymes (ALT, AST) for hepatotoxicity, creatinine and BUN for nephrotoxicity
- Hematology: complete blood counts, coagulation parameters
- Urinalysis: protein, glucose, or blood in urine (indicators of kidney damage)
- Organ weights: absolute and organ-to-body-weight ratios (e.g., an enlarged liver may indicate adaptive or toxic responses)

Changes in biomarkers often appear before overt clinical signs, making them valuable for early detection of toxicity.
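Two of the simplest calculations behind these readouts are the organ-to-body-weight ratio and the fold change of a biomarker relative to controls. The sketch below uses made-up numbers, and both function names are hypothetical.

```python
def relative_organ_weight(organ_g, body_g):
    """Organ weight as a percentage of body weight."""
    return 100.0 * organ_g / body_g

def fold_change(treated_mean, control_mean):
    """Biomarker level in the treated group relative to the control group."""
    return treated_mean / control_mean

# Made-up values for one dose group
liver_pct = relative_organ_weight(organ_g=12.0, body_g=300.0)   # 4.0 (% of body weight)
alt_fold = fold_change(treated_mean=120.0, control_mean=40.0)   # 3.0 (times control ALT)
print(liver_pct, alt_fold)
```

Expressing organ weight relative to body weight helps separate a true organ effect from a change that simply tracks overall weight loss or gain.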

Regulatory requirements for in vivo studies

Regulatory agencies require in vivo toxicity data before approving drugs, pesticides, and industrial chemicals. Standardized protocols come from organizations like:

- ICH (International Council for Harmonisation): guidelines for pharmaceutical safety testing (e.g., ICH S1 for carcinogenicity, ICH S5 for reproductive toxicity)
- OECD (Organisation for Economic Co-operation and Development): test guidelines for chemicals (e.g., OECD TG 407 for 28-day repeated dose oral toxicity)

These guidelines ensure studies are scientifically robust, reproducible across laboratories, and acceptable to regulatory agencies worldwide. Compliance is non-negotiable for product approval.

Toxicity prediction and modeling

Computational and modeling approaches estimate a substance's toxicity based on its chemical structure or physicochemical properties. They help prioritize which compounds need actual testing and can reduce the number of animals used.

Structure-activity relationships (SAR)

SAR analysis links specific structural features or functional groups to toxic effects. For example, aromatic amines are associated with mutagenicity, and certain Michael acceptors are flagged for reactivity-based toxicity. By identifying these "structural alerts," you can predict whether a new compound is likely to be toxic and design safer alternatives by modifying the problematic features.
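The structural-alert idea can be sketched as a simple lookup. This example assumes each compound has already been annotated with functional-group labels (a hypothetical simplification); real tools match substructures directly on the chemical structure using a cheminformatics toolkit. The alert table holds only the two examples named above.

```python
# Hypothetical alert table: functional-group label -> associated concern
STRUCTURAL_ALERTS = {
    "aromatic_amine": "mutagenicity",
    "michael_acceptor": "reactivity-based toxicity",
}

def flag_alerts(functional_groups):
    """Return {group: concern} for every structural alert present in a compound."""
    return {group: STRUCTURAL_ALERTS[group]
            for group in functional_groups
            if group in STRUCTURAL_ALERTS}

# Made-up compound, described by pre-annotated functional groups
compound = {"aromatic_amine", "carboxylic_acid"}
alerts = flag_alerts(compound)
print(alerts)  # {'aromatic_amine': 'mutagenicity'}
```

An alert is a flag for follow-up, not proof of toxicity; the rest of the molecule can reduce or abolish the effect.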

Quantitative structure-activity relationships (QSAR)

QSAR takes SAR a step further by building mathematical models that quantitatively relate molecular descriptors (e.g., log P, molecular weight, hydrogen bond donors) to toxicity endpoints. The process:

1. Assemble a training set of compounds with known toxicity data
2. Calculate molecular descriptors for each compound
3. Build a statistical model (regression, classification) relating descriptors to toxicity
4. Validate the model with an independent test set
5. Use the validated model to predict toxicity of untested compounds

QSAR models are widely used in regulatory settings (e.g., under the REACH regulation in the EU) to fill data gaps when experimental data are unavailable.
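The QSAR workflow can be sketched end to end with a single descriptor and ordinary least squares. Everything here is an assumption for illustration: the data are synthetic, log P is the only descriptor, and the toxicity values are on an arbitrary scale. Real QSAR models use many descriptors, larger datasets, and more rigorous validation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line on the data (x, y)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Steps 1-2: training set with a descriptor (log P) and known toxicity (synthetic)
train_logp = [1.0, 2.0, 3.0, 4.0]
train_tox = [2.1, 2.9, 4.1, 4.9]

# Step 3: build the model
slope, intercept = fit_line(train_logp, train_tox)

# Step 4: validate on compounds the model has not seen (synthetic)
test_logp = [1.5, 3.5]
test_tox = [2.6, 4.4]
r2_test = r_squared(test_logp, test_tox, slope, intercept)

# Step 5: predict an untested compound with log P = 2.5
predicted = slope * 2.5 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"test R^2={r2_test:.3f}, prediction={predicted:.2f}")
```

Keeping the test set separate from the training set is what makes the validation step meaningful: a model can fit its own training data well and still predict new compounds poorly.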

Physiologically based pharmacokinetic (PBPK) modeling

PBPK models simulate ADME processes using a mathematical representation of the body as interconnected tissue compartments. Each compartment has physiological parameters (organ volume, blood flow rate) and chemical-specific parameters (partition coefficients, metabolic rates).

These models are particularly useful for:

- Predicting tissue-level concentrations of a toxicant over time
- Extrapolating doses across species (e.g., rat to human)
- Estimating internal exposure from different routes and dosing regimens
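A heavily simplified sketch of the compartment idea: blood and liver linked by blood flow, with metabolism in the liver, stepped forward in time with Euler integration. All parameter values are hypothetical and chosen only for illustration. Real PBPK models include many more tissues and use measured physiological and chemical-specific parameters.

```python
def simulate_pbpk(dose_mg, hours, dt=0.001):
    """Two-compartment (blood + liver) flow-limited model after an IV bolus."""
    # Hypothetical parameters, for illustration only
    v_blood, v_liver = 0.02, 0.01   # compartment volumes (L)
    q_liver = 1.0                   # liver blood flow (L/h)
    p_liver = 2.0                   # liver:blood partition coefficient
    cl_int = 0.5                    # hepatic metabolic clearance (L/h)

    a_blood, a_liver, a_metabolized = dose_mg, 0.0, 0.0   # amounts (mg)
    history = []
    for step in range(int(round(hours / dt))):
        c_blood = a_blood / v_blood
        c_liver = a_liver / v_liver
        history.append((step * dt, c_blood, c_liver))
        c_liver_out = c_liver / p_liver               # concentration in blood leaving the liver
        uptake = q_liver * (c_blood - c_liver_out)    # net blood -> liver transfer (mg/h)
        metabolism = cl_int * c_liver_out             # loss to metabolism (mg/h)
        a_blood -= uptake * dt
        a_liver += (uptake - metabolism) * dt
        a_metabolized += metabolism * dt
    return history, (a_blood, a_liver, a_metabolized)

history, final = simulate_pbpk(dose_mg=1.0, hours=2.0)
print(f"Blood concentration at start: {history[0][1]:.1f} mg/L")
print(f"Blood concentration at end:   {history[-1][1]:.3f} mg/L")
print(f"Mass balance check (mg):      {sum(final):.6f}")
```

The mass balance check is a useful sanity test for any compartment model: the amounts in blood, in liver, and metabolized should always add up to the dose. Swapping in human volumes, flows, and clearance is, in principle, how such a model supports cross-species extrapolation.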

In silico toxicity prediction tools

In silico tools are computer-based methods for estimating toxicity without any wet-lab work:

- Expert systems: rule-based platforms encoding known toxicity-structure relationships (e.g., Derek Nexus, Toxtree)
- Read-across: predicting a compound's toxicity from data on structurally similar compounds with known profiles
- Machine learning models: algorithms trained on large toxicity databases to predict endpoints like mutagenicity, hepatotoxicity, or hERG channel inhibition

These tools are the fastest and cheapest option for initial hazard screening, though their predictions always carry uncertainty and typically need experimental confirmation.

Alternatives to animal testing

The push to replace, reduce, and refine animal use (the 3Rs) has driven development of increasingly sophisticated non-animal methods. These alternatives aim to provide human-relevant data while improving efficiency and animal welfare.

3D cell culture and organoids

Traditional 2D cell cultures grow in flat monolayers, which poorly represent real tissue architecture. 3D cell culture systems address this:

- Spheroids: self-assembled cell aggregates that develop gradients of oxygen and nutrients similar to tumors or tissue cores
- Hydrogels and scaffolds: provide extracellular matrix-like support for more realistic cell-cell and cell-matrix interactions
- Organoids: self-organizing 3D structures derived from stem cells that recapitulate key features of the parent organ (e.g., liver organoids with bile duct-like structures, brain organoids with layered cortical architecture)

These systems produce more physiologically relevant toxicity data than 2D cultures.

Stem cell-derived models

Human stem cells (embryonic stem cells or induced pluripotent stem cells, iPSCs) can be differentiated into virtually any cell type. This offers several advantages for toxicity testing:

- Human-relevant data without species extrapolation
- The ability to generate patient-specific cells, capturing individual genetic variability in toxicant response
- Applications in organ-specific toxicity (e.g., iPSC-derived cardiomyocytes for cardiotoxicity), developmental toxicity, and genetic susceptibility studies

Organ-on-a-chip systems

Organ-on-a-chip devices are microfluidic platforms containing miniaturized 3D cell cultures that simulate the dynamic environment of a living organ. They can incorporate:

- Continuous fluid flow mimicking blood circulation
- Mechanical forces like breathing motions (lung-on-a-chip) or peristalsis (gut-on-a-chip)
- Multiple organ compartments connected in series ("body-on-a-chip") to model systemic toxicity and organ-organ crosstalk

These systems bridge the gap between simple cell cultures and whole-animal studies, offering a more realistic platform with significantly reduced animal use.

Read-across and weight-of-evidence approaches

Read-across predicts a substance's toxicity using data from structurally similar compounds, based on the principle that similar structures produce similar biological effects. It's widely used under REACH (EU chemical regulation) to fill data gaps.
Weight-of-evidence integrates data from multiple sources (in vitro assays, in silico predictions, existing animal data, epidemiological studies) to build a comprehensive toxicity profile. The strength of the conclusion depends on the quality, consistency, and relevance of all available evidence.

Both approaches reduce the need for new animal studies and support integrated testing strategies.
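The core of read-across, finding the most similar compound with known data, can be sketched with Tanimoto similarity over sets of structural features. The feature labels, analog names, toxicity calls, and the 0.5 similarity cutoff are all hypothetical. In practice, similarity is computed on chemical fingerprints, and analog selection also requires expert justification.

```python
def tanimoto(a, b):
    """Shared features divided by total distinct features of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def read_across(target, analogs, min_similarity=0.5):
    """Borrow the known value of the most similar analog.

    analogs: {name: (feature_set, known_value)}
    Returns (name, similarity, value), or None if no analog is similar enough.
    """
    scored = [(tanimoto(target, feats), name, value)
              for name, (feats, value) in analogs.items()]
    if not scored:
        return None
    similarity, name, value = max(scored)
    if similarity < min_similarity:
        return None
    return name, similarity, value

# Made-up feature sets and toxicity calls
target = {"phenyl", "ester", "chloro", "methyl"}
analogs = {
    "analog_A": ({"phenyl", "ester", "chloro"}, "sensitizer"),
    "analog_B": ({"phenyl", "nitro"}, "non-sensitizer"),
}
prediction = read_across(target, analogs)
print(prediction)  # ('analog_A', 0.75, 'sensitizer')
```

Returning None when no analog clears the cutoff reflects the principle stated above: the prediction is only as good as the structural similarity that supports it.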

Regulatory aspects of toxicity testing

Regulatory frameworks ensure that toxicity testing is conducted consistently and that the resulting data are sufficient to protect public health and the environment.

Guidelines and standards for toxicity testing

The OECD and ICH are the two most important international bodies setting toxicity testing standards. OECD test guidelines cover chemicals broadly, while ICH guidelines focus specifically on pharmaceuticals. Adherence to these standardized protocols ensures that data generated in one country are accepted by regulators in another, which is critical for global drug development and chemical trade.

Safety assessment for drug development

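Toxicity testing is woven into every stage of drug development. Before a new drug enters human clinical trials, and as it advances toward approval, it must complete a battery of non-clinical safety studies, with the longest studies (such as carcinogenicity bioassays) generally completed later in development: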

- Genotoxicity screening (Ames test, micronucleus test)
- Acute and repeat-dose toxicity in two species
- Safety pharmacology (cardiovascular, respiratory, CNS effects)
- Reproductive and developmental toxicity (if relevant population will be exposed)
- Carcinogenicity studies (for drugs intended for long-term use)

Regulatory agencies like the FDA (US) and EMA (EU) review this data package before granting approval. Clinical safety monitoring continues after approval through adverse event reporting and post-marketing surveillance.

Toxicity testing for environmental chemicals

Environmental chemicals (pesticides, industrial chemicals, consumer products) undergo their own regulatory toxicity assessments. The EPA (US) and ECHA (EU, under the REACH regulation) require manufacturers to submit toxicity data covering relevant endpoints and exposure scenarios. OECD test guidelines provide the standardized methods, and the resulting data inform risk management decisions like exposure limits, labeling requirements, and use restrictions.

Ethical considerations in toxicity testing

Animal testing raises significant ethical concerns. The 3Rs principle guides ethical practice:

- Replacement: use non-animal methods whenever scientifically valid alternatives exist
- Reduction: minimize the number of animals used per study through better experimental design and statistical methods
- Refinement: modify procedures to minimize pain, suffering, and distress

Institutional animal care and use committees (IACUCs), animal ethics committees, or equivalent animal welfare bodies review and approve all animal study protocols before they begin. Regulatory agencies increasingly accept and encourage validated alternative methods, and the long-term goal across the field is to move toward animal-free testing wherever possible.
