05.30.2025

EPA Chemical and Products Database (CPDat) in a Spreadsheet

The EPA Chemical and Products Database (CPDat) contains information on the chemical ingredients of thousands of consumer and industrial products. We've imported the full CPDat database into Row Zero, an enterprise-grade spreadsheet for big data, to make it easy to analyze CPDat data. View the CPDat spreadsheet here or continue reading to learn more about the CPDat dataset.


Dataset Summary

The dataset includes chemical ingredient data on more than 300,000 commercial products, covering more than 12,000 different chemical ingredients. Each product is classified by category, family, and product type. Each chemical is identified by a CAS number, a unique identifier assigned by the Chemical Abstracts Service (CAS), and the data includes each chemical's functional use (e.g., solvent) and its composition within each product. The source for the dataset is the Chemical and Products Database (CPDat) from the Environmental Protection Agency (EPA). The CPDat database is a collection of information sourced from publicly available documents and EPA assessments.
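Because CAS numbers are the key used to identify chemicals throughout the dataset, it can be useful to sanity-check them before joining or filtering. A CAS number has the form of two to seven digits, two digits, and a single check digit, and the check digit is the weighted sum of the other digits (weights 1, 2, 3, ... counted from the right) modulo 10. The following is a minimal sketch of that check; the function name `is_valid_cas` is illustrative and is not part of CPDat or Row Zero.

```python
import re

# Format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` has the CAS format and a correct check digit."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)  # every digit except the check digit
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas("7732-18-5"))  # water
print(is_valid_cas("7732-18-4"))  # same digits with a wrong check digit
```

A check like this only catches malformed or mistyped identifiers. It does not confirm that a number is actually assigned to a substance.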

The dataset includes the following sheets:

  • Summary Data - Contains pivot tables that summarize the raw data from the product_composition_data sheet to highlight data by product, category, and company.
  • product_composition_data - Contains the list of products and their chemical ingredients with one row for each chemical and product combination. Products have been curated to Product Use Categories (PUCs) and chemical composition information is curated to weight fractions, curated names, and curated ingredients' chemical identifier information.
  • functional_use_data - Contains the list of all chemicals in CPDat that have a functional use reported in their associated data document and includes the raw and curated chemical identifiers and functional use.
  • list_presence_data - Contains the list of all chemicals in the CPDat database that were collected from a list presence document, with their raw and curated chemical identifiers.
  • puc_vocabulary - Contains information about the Product Use Category (PUC) vocabulary of CPDat.
  • fc_vocabulary - Contains information about the Function Category (FC) vocabulary of CPDat.
  • lpk_vocabulary - Contains information about the List Presence Keyword (LPK) vocabulary of CPDat.
  • CPDat v4 Data Dictionary - Data dictionary for the CPDat database with descriptions for each field.
  • CPDat v4 File Information - Lists the files included in the CPDat download along with a description of each file.

Highlights from the data

The Summary Data sheet includes pivot tables that highlight interesting views of the CPDat dataset.

Number of Chemicals in Each Product

A pivot table of the count of chemicals by product reveals that beauty products tend to have the most chemicals per product, with hair coloring, hair styling, and face cream products dominating the list. Several beauty products contain more than 100 different chemicals.

To view the full list of chemicals by product, go to the product_composition_data sheet and search for the specific product.
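If you export the product_composition_data sheet and want to reproduce this pivot outside a spreadsheet, the same count can be sketched in a few lines of Python. The rows and the column names `product_name` and `chemical_name` below are made up for illustration; check the CPDat data dictionary for the actual field names.

```python
from collections import defaultdict

# Illustrative rows: one row per product and chemical combination.
sample_rows = [
    {"product_name": "Hair Color A", "chemical_name": "Water"},
    {"product_name": "Hair Color A", "chemical_name": "Glycerol"},
    {"product_name": "Hair Color A", "chemical_name": "2-Phenoxyethanol"},
    {"product_name": "Hair Color A", "chemical_name": "Water"},  # repeated row
    {"product_name": "Paint B", "chemical_name": "Titanium dioxide"},
    {"product_name": "Paint B", "chemical_name": "Xylenes"},
    {"product_name": "Cleaner C", "chemical_name": "Water"},
]


def chemicals_per_product(rows):
    """Count distinct chemicals per product, largest count first."""
    chemicals = defaultdict(set)
    for row in rows:
        chemicals[row["product_name"]].add(row["chemical_name"])
    counts = {product: len(names) for product, names in chemicals.items()}
    # Sort by descending count, then by product name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for product, count in chemicals_per_product(sample_rows):
    print(product, count)
```

Using a set per product means a chemical listed twice for the same product is only counted once.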

Number of Chemicals Used by Companies

This pivot table shows the number of distinct chemicals used by each company and the average number of chemicals per product by company.

To view the full list of chemicals by a company, go to the product_composition_data sheet and filter the Organization column.
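The two measures in this pivot table can be approximated as follows. This is a sketch over made-up rows: `Organization` is the column named above, while `product_name` and `chemical_name` are assumed names used only for illustration.

```python
from collections import defaultdict

# Illustrative rows; company and product names are invented.
sample_rows = [
    {"Organization": "Acme", "product_name": "Shampoo", "chemical_name": "Water"},
    {"Organization": "Acme", "product_name": "Shampoo", "chemical_name": "Glycerol"},
    {"Organization": "Acme", "product_name": "Conditioner", "chemical_name": "Water"},
    {"Organization": "Beta", "product_name": "Paint", "chemical_name": "Titanium dioxide"},
    {"Organization": "Beta", "product_name": "Paint", "chemical_name": "Xylenes"},
    {"Organization": "Beta", "product_name": "Paint", "chemical_name": "Toluene"},
]


def company_summary(rows):
    """Distinct chemicals per company and average chemicals per product."""
    chemicals_by_org = defaultdict(set)
    chemicals_by_product = defaultdict(set)  # keyed by (organization, product)
    for row in rows:
        org = row["Organization"]
        chemicals_by_org[org].add(row["chemical_name"])
        chemicals_by_product[(org, row["product_name"])].add(row["chemical_name"])

    # Collect the per-product chemical counts for each company.
    product_counts = defaultdict(list)
    for (org, _product), chemicals in chemicals_by_product.items():
        product_counts[org].append(len(chemicals))

    return {
        org: {
            "distinct_chemicals": len(chemicals_by_org[org]),
            "avg_chemicals_per_product": sum(counts) / len(counts),
        }
        for org, counts in product_counts.items()
    }


for org, stats in sorted(company_summary(sample_rows).items()):
    print(org, stats)
```

Note that the distinct count is taken across all of a company's products, so it is not simply the sum of the per-product counts.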

Products with the Most Chemicals

Summarizing by product type makes it clear that beauty products have the most chemicals on average.

Most Commonly Used Chemicals in Products

Water is the most commonly used chemical across all products in the dataset. After water, the 5 most common chemicals in products are:

  1. Titanium dioxide
  2. Glycerol
  3. 2-Phenoxyethanol
  4. Xylenes
  5. Toluene

You can also use the additional pivot tables and filters to look up common chemicals by product type and/or company.
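A ranking like this, with an optional filter, could be sketched as below. "Most common" is assumed here to mean the number of distinct products that contain a chemical; the column names `product_type`, `product_name`, and `chemical_name` and the sample rows are hypothetical.

```python
from collections import Counter

# Illustrative rows only; not taken from the CPDat data.
sample_rows = [
    {"product_type": "lip gloss", "product_name": "Gloss 1", "chemical_name": "Glycerol"},
    {"product_type": "lip gloss", "product_name": "Gloss 1", "chemical_name": "Titanium dioxide"},
    {"product_type": "lip gloss", "product_name": "Gloss 2", "chemical_name": "Glycerol"},
    {"product_type": "paint", "product_name": "Paint 1", "chemical_name": "Titanium dioxide"},
    {"product_type": "paint", "product_name": "Paint 1", "chemical_name": "Xylenes"},
]


def most_common_chemicals(rows, product_type=None, top_n=5):
    """Rank chemicals by the number of distinct products that contain them."""
    pairs = set()
    for row in rows:
        if product_type is not None and row["product_type"] != product_type:
            continue
        # A set of (product, chemical) pairs ignores repeated rows.
        pairs.add((row["product_name"], row["chemical_name"]))
    counts = Counter(chemical for _product, chemical in pairs)
    # Sort by descending count, then by name so ties have a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


print(most_common_chemicals(sample_rows))
print(most_common_chemicals(sample_rows, product_type="lip gloss"))
```

Filtering by company would follow the same pattern with a different column in the `if` condition.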

Use Cases for this Dataset

Row Zero is a powerful spreadsheet built for big data, so you can easily open the CPDat database in Row Zero to explore the dataset, look up products, and analyze the data. Here are a few common use cases:

  1. Look up chemical ingredients in everyday products to better understand chemical exposure and identify potential risks.
  2. Identify products that contain chemicals that may cause allergic reactions or pose risks.
  3. Look up CAS numbers (Chemical Abstracts Service numbers) and the functional use of specific chemicals.
  4. Evaluate the impact on supply chains if there is a shortage of a chemical or a disruption to supply. You can see which companies and products would be affected.
  5. Support research involving chemical risk assessment, exposure modeling, and chemical informatics.
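As one example, the supply chain use case amounts to a reverse lookup: start from a chemical and list the companies and products that depend on it. The sketch below assumes hypothetical column names `cas` and `product_name` alongside the `Organization` column, and uses invented rows.

```python
from collections import defaultdict

# Illustrative rows; companies and products are invented.
sample_rows = [
    {"Organization": "Acme", "product_name": "Paint Thinner", "cas": "108-88-3"},
    {"Organization": "Beta", "product_name": "Spray Paint", "cas": "108-88-3"},
    {"Organization": "Beta", "product_name": "Adhesive", "cas": "108-88-3"},
    {"Organization": "Beta", "product_name": "Adhesive", "cas": "7732-18-5"},
    {"Organization": "Gamma", "product_name": "Hand Soap", "cas": "56-81-5"},
]


def affected_by(rows, cas_number):
    """Map each company to the sorted list of its products containing the chemical."""
    affected = defaultdict(set)
    for row in rows:
        if row["cas"] == cas_number:
            affected[row["Organization"]].add(row["product_name"])
    return {org: sorted(products) for org, products in sorted(affected.items())}


# Which companies and products would a toluene (108-88-3) shortage touch?
print(affected_by(sample_rows, "108-88-3"))
```

Matching on the CAS number rather than the chemical name avoids missing rows where the same substance is listed under a synonym.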

Data Sources

The source for the dataset is the Chemical and Products Database (CPDat) from the U.S. Environmental Protection Agency (EPA). The CPDat database consolidates information from publicly available documents and EPA assessments. Data referenced here has been updated as of May 2025 with the most recent data available. You can download the raw dataset at the link above.


Frequently Asked Questions

What is the EPA CPDat database?

The EPA Chemical and Products Database (CPDat) consolidates information on the use, exposure potential, and function of thousands of chemicals found in consumer and industrial products.

What is a CAS number?

A CAS number is a unique identifier assigned to each chemical substance by the Chemical Abstracts Service (CAS). You may also see it referenced as a CASRN, which stands for Chemical Abstracts Service Registry Number. The CPDat database lists chemicals with their CAS numbers.

How many chemicals and products are in the CPDat database?

There are more than 12,000 unique chemicals used in more than 300,000 products in the CPDat database.

Can I open the CPDat database in Excel?

No, you will not be able to open the full CPDat dataset in Excel because the file is too large. Excel has a limit of 1,048,576 rows per worksheet, and the CPDat database contains more than 3 million rows of data. You'll need to open it in a more powerful spreadsheet like Row Zero.

Can I open the CPDat database in Google Sheets?

No, you will not be able to open the full CPDat dataset in Google Sheets. Google Sheets has a 100MB import limit and a 10 million cell limit, and the CPDat files exceed those limits. You'll need to open it in a more powerful spreadsheet like Row Zero.
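The arithmetic behind these limits is simple to check for any file. The sketch below uses the limits quoted above and treats 3,000,000 rows as a lower bound for CPDat; the column counts in the example are placeholders, not the actual width of any CPDat file.

```python
import math

EXCEL_MAX_ROWS = 1_048_576     # Excel row limit quoted above
SHEETS_MAX_CELLS = 10_000_000  # Google Sheets cell limit quoted above


def fits_in_excel(row_count: int) -> bool:
    """True if the rows fit in a single Excel worksheet."""
    return row_count <= EXCEL_MAX_ROWS


def fits_in_google_sheets(row_count: int, column_count: int) -> bool:
    """True if rows x columns stays within the Google Sheets cell limit."""
    return row_count * column_count <= SHEETS_MAX_CELLS


def excel_chunks_needed(row_count: int) -> int:
    """Minimum number of Excel-sized pieces needed to hold the rows."""
    return math.ceil(row_count / EXCEL_MAX_ROWS)


cpdat_rows = 3_000_000  # lower bound: "more than 3 million rows"
print(fits_in_excel(cpdat_rows))             # False
print(excel_chunks_needed(cpdat_rows))       # 3
print(fits_in_google_sheets(cpdat_rows, 4))  # False: 12 million cells
```

Even at the lower bound, the data would have to be split into at least three pieces for Excel, and any sheet with four or more columns would exceed the Google Sheets cell limit.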
