
Pandas DataFrame drop_duplicates() Method



Remove duplicate rows from the DataFrame:

import pandas as pd

data = {
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
}

df = pd.DataFrame(data)

# Keep only the first occurrence of each fully repeated row
newdf = df.drop_duplicates()

print(newdf)

Definition and Usage

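The drop_duplicates() method removes duplicate rows from a DataFrame. By default, a row counts as a duplicate when its values in every column match those of an earlier row, and the first occurrence is kept.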
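A minimal sketch of the default behavior, reusing the sample data from the example at the top of the page: DataFrame.duplicated() marks every row that repeats an earlier row, and keeping only the unmarked rows gives the same result as drop_duplicates(). This is an illustration of the result, not a claim about how pandas implements the method internally; the names mask and manual are illustrative.

```python
import pandas as pd

# Same sample data as the first example on this page
df = pd.DataFrame({
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
})

# duplicated() returns a boolean Series: True for each row that repeats an earlier row
mask = df.duplicated()
print(mask.tolist())  # [False, False, False, True]

# Keeping only the rows that are not marked gives the same rows as drop_duplicates()
manual = df[~mask]
print(manual.equals(df.drop_duplicates()))  # True
```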

Use the subset parameter to consider only specific columns when looking for duplicates.
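A short sketch of subset with the same sample data: passing a single column label compares only that column, and passing a list compares only the listed columns. The variable names by_name and by_qualified are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
})

# A single column label: rows are duplicates if their "name" values match
by_name = df.drop_duplicates(subset="name")
print(by_name["name"].tolist())  # ['Sally', 'Mary', 'John']

# A list of labels (here just one): only "qualified" is compared,
# so the first True row and the first False row are kept
by_qualified = df.drop_duplicates(subset=["qualified"])
print(by_qualified["name"].tolist())  # ['Sally', 'Mary']
```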

Syntax

dataframe.drop_duplicates(subset, keep, inplace, ignore_index)

Parameters

The parameters are keyword arguments.

Parameter      Value                    Description
subset         column label(s)          Optional. A column label, or a list of column labels, identifying the columns to compare when looking for duplicates. If not specified, all columns are used.
keep           'first', 'last', False   Optional, default 'first'. Specifies which occurrence of each duplicate to keep: 'first' keeps the first occurrence, 'last' keeps the last one, and False drops ALL duplicates.
inplace        True, False              Optional, default False. If True, the rows are removed from the current DataFrame. If False, a copy with the rows removed is returned.
ignore_index   True, False              Optional, default False. If True, the resulting rows are relabeled 0, 1, 2, etc. If False, the original index labels are kept.
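The following sketch compares the keep options and ignore_index on the same sample data, where rows 1 and 3 are identical. It prints the index labels of the rows that survive so the difference between the options is visible; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
})

# keep="first" (the default): the later copy, row 3, is dropped
keep_first = df.drop_duplicates(keep="first")
print(keep_first.index.tolist())  # [0, 1, 2]

# keep="last": the later copy is kept and row 1 is dropped
keep_last = df.drop_duplicates(keep="last")
print(keep_last.index.tolist())  # [0, 2, 3]

# keep=False: every copy of a duplicated row is dropped
drop_all = df.drop_duplicates(keep=False)
print(drop_all.index.tolist())  # [0, 2]

# ignore_index=True: the remaining rows are relabeled 0, 1, ...
renumbered = df.drop_duplicates(keep=False, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1]
print(renumbered["name"].tolist())  # ['Sally', 'John']
```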

Return Value

A DataFrame with the duplicate rows removed, or None if the inplace parameter is set to True.
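A minimal sketch of the inplace case on the same sample data: the method modifies df directly and returns None, so assigning its result to a variable does not give you a DataFrame. The name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
})

# With inplace=True, df itself loses its duplicate row and nothing is returned
result = df.drop_duplicates(inplace=True)
print(result)  # None
print(len(df))  # 3
print(df.index.tolist())  # [0, 1, 2]
```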
