Effortlessly Sort Dataframe in Python

pandas Sort: Your Guide to Sorting Data in Python

Learning pandas sort methods is a crucial skill for any data analyst or data scientist using Python. Pandas provides powerful tools for sorting and manipulating data efficiently. In this tutorial, we will explore the various methods available in pandas for sorting data in a DataFrame.

Getting Started With Pandas Sort Methods

To get started with pandas sort methods, make sure you have pandas installed. You can install it using pip:

pip install pandas

Next, let’s import the pandas library and create a DataFrame to work with:

import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['John', 'Alice', 'Bob', 'Emily'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
}
df = pd.DataFrame(data)

Preparing the Dataset

Before we dive into sorting, let’s take a quick look at our dataset. To display the DataFrame, we can simply print it out:

print(df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

Our DataFrame consists of three columns: “Name”, “Age”, and “Salary”. We will use these columns to demonstrate the sorting methods in pandas.

Getting Familiar With .sort_values()

The .sort_values() method allows us to sort a DataFrame by the values of one or more columns. By default, it sorts the DataFrame in ascending order. Let’s see how it works with our dataset:

# Sort the DataFrame by the "Age" column
sorted_df = df.sort_values('Age')
print(sorted_df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The DataFrame is now sorted based on the values in the “Age” column, with the youngest person first.
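Because the sample data is already in ascending order of age, the sort above leaves the rows where they were. A minimal sketch with a shuffled copy makes the effect easier to see; the name `shuffled` and the row order chosen here are illustrative assumptions, not part of the original dataset.

```python
import pandas as pd

# Same people as the sample data, but in a deliberately mixed-up order
shuffled = pd.DataFrame({
    'Name': ['Bob', 'Emily', 'John', 'Alice'],
    'Age': [35, 40, 25, 30],
    'Salary': [70000, 80000, 50000, 60000],
})

# sort_values() returns a new DataFrame ordered by "Age";
# each row keeps its original index label
by_age = shuffled.sort_values('Age')
print(by_age)
```

The index labels in the result are no longer in order, which shows that rows are moved as whole units and keep their labels.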

Getting Familiar With .sort_index()

The .sort_index() method allows us to sort a DataFrame by its index. By default, it sorts the DataFrame in ascending order. Let’s see how it works with our dataset:

# Sort the DataFrame by the index
sorted_df = df.sort_index()
print(sorted_df)

Output:

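    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The DataFrame is now sorted based on the index, in ascending order. Because the default index (0 to 3) is already in ascending order, the rows stay where they were.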
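The effect of .sort_index() is easiest to see when the index holds something other than the default 0, 1, 2, ... labels. As a sketch, assuming we move the "Name" column into the index first (the variable name `by_name` is only illustrative), the rows end up in alphabetical order of the names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Emily'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
})

# Use the "Name" column as the index, then sort the rows by those labels
by_name = df.set_index('Name').sort_index()
print(by_name)
```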

Sorting Your DataFrame on a Single Column

Sorting a DataFrame on a single column is a common operation in data analysis. Let’s explore the different aspects of sorting on a single column.

Sorting by a Column in Ascending Order

To sort a DataFrame by a single column in ascending order, we can use the .sort_values() method:

# Sort the DataFrame by the "Salary" column in ascending order
sorted_df = df.sort_values('Salary')
print(sorted_df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The DataFrame is now sorted based on the values in the “Salary” column, with the lowest salary first.

Changing the Sort Order

To sort a DataFrame in descending order, we can set the ascending parameter to False:

# Sort the DataFrame by the "Salary" column in descending order
sorted_df = df.sort_values('Salary', ascending=False)
print(sorted_df)

Output:

    Name  Age  Salary
3  Emily   40   80000
2    Bob   35   70000
1  Alice   30   60000
0   John   25   50000

The DataFrame is now sorted based on the values in the “Salary” column, with the highest salary first.

Choosing a Sorting Algorithm

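By default, pandas uses the quicksort algorithm to sort a DataFrame. However, you can choose a different algorithm by specifying the kind parameter, which accepts 'quicksort', 'mergesort', 'heapsort', or 'stable'. pandas applies this option only when sorting on a single column or label. For example, to use the mergesort algorithm, which is stable (rows with equal values keep their original relative order), we can do the following: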

# Sort the DataFrame using the mergesort algorithm
sorted_df = df.sort_values('Age', kind='mergesort')
print(sorted_df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The DataFrame is sorted based on the values in the “Age” column using the mergesort algorithm.
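The choice of algorithm only becomes visible when the sort column contains ties. The following sketch uses hypothetical data (the `people` DataFrame is not part of the tutorial's dataset) to show that mergesort, being a stable algorithm, keeps tied rows in their original relative order:

```python
import pandas as pd

# Hypothetical data with tied ages so that stability matters
people = pd.DataFrame({
    'Name': ['Dana', 'Carl', 'Bea', 'Ann'],
    'Age': [30, 25, 30, 25],
})

# Among rows with the same age, a stable sort preserves the input order:
# Carl stays ahead of Ann, and Dana stays ahead of Bea
stable_sorted = people.sort_values('Age', kind='mergesort')
print(stable_sorted)
```

Quicksort makes no such guarantee, so a stable kind is a reasonable choice whenever the order of tied rows matters.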

Sorting Your DataFrame on Multiple Columns

In some cases, you may need to sort a DataFrame based on multiple columns. Let’s explore the different scenarios when sorting on multiple columns.

Sorting by Multiple Columns in Ascending Order

To sort a DataFrame by multiple columns in ascending order, we can pass a list of column names to the .sort_values() method:

# Sort the DataFrame by the "Age" and "Salary" columns in ascending order
sorted_df = df.sort_values(['Age', 'Salary'])
print(sorted_df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The DataFrame is sorted based on the values in the “Age” column first, and then by the values in the “Salary” column.
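The second column only matters when the first one contains ties, which the sample data does not. As a sketch, assuming a hypothetical `staff` DataFrame in which two pairs of people share an age, the tie-breaking role of "Salary" becomes visible:

```python
import pandas as pd

# Hypothetical data: two pairs of people share an age
staff = pd.DataFrame({
    'Name': ['Ann', 'Bea', 'Carl', 'Dana'],
    'Age': [30, 25, 30, 25],
    'Salary': [65000, 52000, 60000, 48000],
})

# Rows are ordered by "Age" first; "Salary" decides the order within each age
by_age_then_salary = staff.sort_values(['Age', 'Salary'])
print(by_age_then_salary)
```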

Changing the Column Sort Order

By default, pandas sorts each column in ascending order. However, you can change the sort order for individual columns by specifying the ascending parameter as a list:

# Sort the DataFrame by the "Age" and "Salary" columns, with "Age" in descending order
sorted_df = df.sort_values(['Age', 'Salary'], ascending=[False, True])
print(sorted_df)

Output:

    Name  Age  Salary
3  Emily   40   80000
2    Bob   35   70000
1  Alice   30   60000
0   John   25   50000

The DataFrame is sorted based on the values in the “Age” column in descending order, and then by the values in the “Salary” column in ascending order.
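With the sample data every age is unique, so the ascending order requested for "Salary" never comes into play. A small sketch with tied ages shows both directions at work; the `staff` DataFrame below is a hypothetical example, not the tutorial's dataset.

```python
import pandas as pd

# Hypothetical data: two pairs of people share an age
staff = pd.DataFrame({
    'Name': ['Ann', 'Bea', 'Carl', 'Dana'],
    'Age': [30, 25, 30, 25],
    'Salary': [65000, 52000, 60000, 48000],
})

# Oldest first; within the same age, lowest salary first
mixed = staff.sort_values(['Age', 'Salary'], ascending=[False, True])
print(mixed)
```

The list passed to ascending must have the same length as the list of columns, with each Boolean applying to the column in the same position.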

Sorting by Multiple Columns in Descending Order

To sort a DataFrame by multiple columns in descending order, we can set the ascending parameter to False for all columns:

# Sort the DataFrame by the "Age" and "Salary" columns in descending order
sorted_df = df.sort_values(['Age', 'Salary'], ascending=[False, False])
print(sorted_df)

Output:

    Name  Age  Salary
3  Emily   40   80000
2    Bob   35   70000
1  Alice   30   60000
0   John   25   50000

The DataFrame is sorted based on the values in the “Age” column first, and then by the values in the “Salary” column, both in descending order.

Sorting by Multiple Columns With Different Sort Orders

In some cases, you may want to sort a DataFrame by multiple columns with different sort orders. The ascending parameter accepts a single Boolean or a list of Booleans, not a dictionary, so one way to keep each column paired with its direction is to store the pairs in a dictionary and unpack it into the two arguments:

# Sort the DataFrame by the "Age" and "Salary" columns with different sort orders
sort_order = {'Age': False, 'Salary': True}  # column -> ascending?
sorted_df = df.sort_values(list(sort_order), ascending=list(sort_order.values()))
print(sorted_df)

Output:

    Name  Age  Salary
3  Emily   40   80000
2    Bob   35   70000
1  Alice   30   60000
0   John   25   50000

The DataFrame is sorted based on the values in the “Age” column in descending order, and then by the values in the “Salary” column in ascending order.

Sorting Your DataFrame on Its Index

In addition to sorting by column values, pandas also allows you to sort a DataFrame based on its index. Let’s explore how to sort a DataFrame on its index.

Sorting by Index in Ascending Order

To sort a DataFrame by its index in ascending order, we can use the .sort_index() method:

# Sort the DataFrame by the index in ascending order
sorted_df = df.sort_index()
print(sorted_df)

Output:

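    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The DataFrame is sorted based on the index, in ascending order. The default index (0 to 3) is already ascending, so the row order does not change.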

Sorting by Index in Descending Order

To sort a DataFrame by its index in descending order, we can set the ascending parameter to False:

# Sort the DataFrame by the index in descending order
sorted_df = df.sort_index(ascending=False)
print(sorted_df)

Output:

    Name  Age  Salary
3  Emily   40   80000
2    Bob   35   70000
1  Alice   30   60000
0   John   25   50000

The DataFrame is sorted based on the index, in descending order, so the row labeled 3 comes first and the row labeled 0 comes last.

Exploring Advanced Index-Sorting Concepts

Sorting a DataFrame by its index opens up possibilities for more advanced sorting techniques. For example, if the index contains dates, .sort_index() arranges the rows in chronological order. As with .sort_values(), you can also select one of the built-in sorting algorithms using the kind parameter.
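As a sketch of the date case, assume a hypothetical `readings` DataFrame whose index holds dates recorded out of order. Converting the labels with pd.to_datetime() makes pandas compare them as dates rather than as strings:

```python
import pandas as pd

# Hypothetical readings recorded out of chronological order
readings = pd.DataFrame(
    {'Value': [3, 1, 2]},
    index=pd.to_datetime(['2023-03-01', '2023-01-01', '2023-02-01']),
)

# Sorting the DatetimeIndex puts the rows in chronological order
chronological = readings.sort_index()
print(chronological)
```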

Sorting the Columns of Your DataFrame

Sometimes, you may need to sort the columns of a DataFrame instead of sorting the rows. Let’s explore how to sort the columns of a DataFrame.

Working With the DataFrame Axis

By default, pandas sorts a DataFrame by its rows. To sort the columns instead, you can specify the axis parameter:

# Sort the columns of the DataFrame in ascending order
sorted_df = df.sort_index(axis=1)
print(sorted_df)

Output:

   Age   Name  Salary
0   25   John   50000
1   30  Alice   60000
2   35    Bob   70000
3   40  Emily   80000

The columns of the DataFrame are sorted in alphabetical order.

Using Column Labels to Sort

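To arrange the columns of a DataFrame according to a specific list of labels, you can use the .reindex() method. It places the columns in whatever order the labels are given, so passing the sorted column labels produces an alphabetical layout: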

# Sort the columns of the DataFrame based on specific labels
sorted_df = df.reindex(sorted(df.columns), axis=1)
print(sorted_df)

Output:

   Age   Name  Salary
0   25   John   50000
1   30  Alice   60000
2   35    Bob   70000
3   40  Emily   80000

The columns of the DataFrame are sorted in alphabetical order.
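Since .reindex() follows the labels it is given, it can also produce an order that is not alphabetical. A minimal sketch, assuming a hypothetical preferred layout stored in `custom_order`:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Emily'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
})

# Hypothetical preferred layout: salary first, then name, then age
custom_order = ['Salary', 'Name', 'Age']
reordered = df.reindex(columns=custom_order)
print(reordered)
```

Note that a label in the list that does not exist in the DataFrame would produce a new column filled with missing values rather than an error, so the list is worth double-checking.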

Working With Missing Data When Sorting in Pandas

When sorting a DataFrame with missing data, pandas provides options for handling the missing values. Let’s explore how to work with missing data when sorting in pandas.

Understanding the na_position Parameter in .sort_values()

By default, pandas places missing values at the end when sorting a DataFrame with the .sort_values() method. To change this behavior and place missing values at the beginning, you can set the na_position parameter to 'first':

# Sort the DataFrame by the "Salary" column, with missing values placed at the beginning
sorted_df = df.sort_values('Salary', na_position='first')
print(sorted_df)

Output:

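    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The DataFrame is sorted based on the “Salary” column. Our sample data has no missing values, so the output matches an ordinary ascending sort; if any salaries were missing, those rows would be placed at the beginning.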
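The sample DataFrame contains no missing values, so a small variant helps show what na_position does. The sketch below assumes a hypothetical copy of the data, `with_gap`, in which Alice's salary is unknown:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data: Alice's salary is missing
with_gap = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Emily'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, np.nan, 70000, 80000],
})

# Default behavior (na_position='last'): the missing salary goes to the end
nan_last = with_gap.sort_values('Salary')

# na_position='first': the missing salary goes to the top
nan_first = with_gap.sort_values('Salary', na_position='first')
print(nan_first)
```

The position of missing values is independent of the sort direction: with ascending=False and the default na_position, the NaN row still appears last.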

Understanding the na_position Parameter in .sort_index()

The .sort_index() method accepts the same na_position parameter, which also defaults to 'last'. It controls where rows with missing index labels (NaN) are placed. According to the pandas documentation, it is not implemented for a MultiIndex.

Using Sort Methods to Modify Your DataFrame

By default, the sort methods in pandas return a new sorted DataFrame, leaving the original DataFrame unchanged. However, you can modify the original DataFrame by using the inplace parameter.

Using .sort_values() In Place

To sort a DataFrame in place using the .sort_values() method, you can set the inplace parameter to True:

# Sort the DataFrame by the "Age" column in place
df.sort_values('Age', inplace=True)
print(df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The original DataFrame is now sorted based on the values in the “Age” column.
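One detail worth knowing is that a call with inplace=True returns None, so its result should not be assigned back to the variable. The sketch below uses a hypothetical `scores` DataFrame to illustrate this:

```python
import pandas as pd

# Hypothetical data in a mixed-up order
scores = pd.DataFrame({
    'Name': ['Bob', 'John', 'Alice'],
    'Age': [35, 25, 30],
})

# With inplace=True the method modifies `scores` itself and returns None
result = scores.sort_values('Age', inplace=True)
print(result)
print(scores)
```

Writing `scores = scores.sort_values('Age', inplace=True)` would therefore replace the DataFrame with None. Reassigning the return value of a call without inplace, as in `scores = scores.sort_values('Age')`, is an equally valid way to get the same result.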

Using .sort_index() In Place

To sort a DataFrame in place using the .sort_index() method, you can set the inplace parameter to True:

# Sort the DataFrame by the index in place
df.sort_index(inplace=True)
print(df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   35   70000
3  Emily   40   80000

The original DataFrame is now sorted based on the index. Its labels were already in ascending order, so the row order is unchanged.

Conclusion

In this tutorial, we explored the different methods available in pandas for sorting data in a DataFrame. We learned how to use .sort_values() to sort the DataFrame based on one or more columns, and how to use .sort_index() to sort the DataFrame based on its index. We also discussed advanced sorting concepts, such as sorting by multiple columns and sorting the columns of the DataFrame. Finally, we looked at how to handle missing data when sorting in pandas. By mastering pandas sort methods, you can efficiently analyze and manipulate data in Python.