
Sort DataFrame by Column: Effortlessly Rearrange Data with Python


pandas Sort: Your Guide to Sorting Data in Python

Learning pandas sort methods is a great way to start with, or practice, basic data analysis in Python. Data analysis is commonly done with spreadsheets, SQL, or pandas. One of the great things about pandas is that it can handle large amounts of data and offers high-performance data manipulation.

In this tutorial, you’ll learn how to use .sort_values() and .sort_index(), which will enable you to sort data efficiently in a DataFrame.

By the end of this tutorial, you’ll know how to:

  • Sort a pandas DataFrame by the values of one or more columns
  • Use the ascending parameter to change the sort order
  • Sort a DataFrame by its index using .sort_index()
  • Organize missing data while sorting values
  • Sort a DataFrame in place by setting inplace to True

To follow along with this tutorial, you’ll need a basic understanding of pandas DataFrames and some familiarity with reading in data from files.

Getting Started With Pandas Sort Methods

As a quick reminder, a DataFrame is a two-dimensional labeled data structure in pandas, and it can be thought of as a table of data with rows and columns. Here’s an example of what a pandas DataFrame looks like:

Name     Age  Gender
John     25   Male
Sarah    30   Female
Michael  28   Male

To begin sorting a DataFrame, you must first prepare the dataset. Let’s start by importing pandas and creating a simple DataFrame:

import pandas as pd

data = {'Name': ['John', 'Sarah', 'Michael'],
        'Age': [25, 30, 28],
        'Gender': ['Male', 'Female', 'Male']}
df = pd.DataFrame(data)

In this example, we have a DataFrame with three columns: ‘Name’, ‘Age’, and ‘Gender’. We can now move on to learning about the different sorting methods in pandas.

Getting Familiar With .sort_values()

The .sort_values() method is used to sort the DataFrame by the values of one or more columns. By default, it sorts the DataFrame in ascending order. Let’s see an example:

df_sorted = df.sort_values(by='Age')
print(df_sorted)

Output:

Name Age Gender
0 John 25 Male
2 Michael 28 Male
1 Sarah 30 Female

In this example, we sorted the DataFrame by the ‘Age’ column in ascending order. The resulting DataFrame, df_sorted, is sorted based on the values in the ‘Age’ column.
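A quick way to confirm that .sort_values() returns a new DataFrame instead of changing df is to compare the index order of both objects. This minimal sketch rebuilds the same three-row DataFrame so that it runs on its own.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained
df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

df_sorted = df.sort_values(by='Age')

# The sorted copy is ordered by Age, and each row keeps its original label
print(df_sorted.index.tolist())  # [0, 2, 1]
# The original DataFrame is left untouched
print(df.index.tolist())         # [0, 1, 2]
```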

Getting Familiar With .sort_index()

The .sort_index() method is used to sort the DataFrame by its index. By default, it sorts the DataFrame in ascending order of the index. Here’s an example:

df_sorted = df.sort_index()
print(df_sorted)

Output:

Name Age Gender
0 John 25 Male
1 Sarah 30 Female
2 Michael 28 Male

In this example, we sorted the DataFrame by its index in ascending order. The resulting DataFrame, df_sorted, is sorted based on the index values.
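Here the rows were already in index order, so nothing visibly changed. .sort_index() is more useful after another operation has shuffled the rows. The sketch below, using the same example data, sorts by Age and then uses .sort_index() to restore the original row order.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

# Sorting by Age shuffles the index to 0, 2, 1
shuffled = df.sort_values(by='Age')

# Sorting on the index puts the rows back in their original order
restored = shuffled.sort_index()
print(restored.index.tolist())     # [0, 1, 2]
print(restored['Name'].tolist())   # ['John', 'Sarah', 'Michael']
```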

Sorting Your DataFrame on a Single Column

Now that you’re familiar with the .sort_values() and .sort_index() methods, let’s dive deeper into sorting a DataFrame on a single column.

Sorting by a Column in Ascending Order

To sort a DataFrame by a single column in ascending order, you can use the .sort_values() method. Let’s take a look at an example:

df_sorted = df.sort_values(by='Age')
print(df_sorted)

Output:

Name Age Gender
0 John 25 Male
2 Michael 28 Male
1 Sarah 30 Female

As before, the rows are ordered by the values in the ‘Age’ column. Notice that each row keeps its original index label, which is why the index now reads 0, 2, 1.

Changing the Sort Order

By default, the .sort_values() method sorts the DataFrame in ascending order. However, you can change the sort order to descending by setting the ascending parameter to False. Let’s see an example:

df_sorted = df.sort_values(by='Age', ascending=False)
print(df_sorted)

Output:

Name Age Gender
1 Sarah 30 Female
2 Michael 28 Male
0 John 25 Male

In this example, we sorted the DataFrame by the ‘Age’ column in descending order by setting the ascending parameter to False. The resulting DataFrame, df_sorted, is sorted based on the values in the ‘Age’ column in descending order.

Choosing a Sorting Algorithm

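By default, .sort_values() uses the quicksort algorithm, which is not stable: rows with equal values aren’t guaranteed to keep their original relative order. You can choose a different algorithm with the kind parameter, which accepts 'quicksort', 'mergesort', 'heapsort', and 'stable'. Of these, only 'mergesort' and 'stable' are stable. Note that kind only applies when you sort on a single column. Let’s see an example: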

df_sorted = df.sort_values(by='Age', kind='mergesort')
print(df_sorted)

Output:

Name Age Gender
0 John 25 Male
2 Michael 28 Male
1 Sarah 30 Female

In this example, we sorted the DataFrame by the ‘Age’ column using the mergesort algorithm. Because no two rows share the same age, the result matches the default quicksort. The choice of algorithm only becomes visible when there are ties.
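Stability matters only when values tie. In the sketch below, two rows share the value 'Male' in the Gender column. With kind='stable' (or 'mergesort'), pandas keeps John ahead of Michael because that is their original order; the default quicksort makes no such guarantee.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

# John and Michael tie on Gender; a stable sort keeps their original order
df_stable = df.sort_values(by='Gender', kind='stable')
print(df_stable['Name'].tolist())  # ['Sarah', 'John', 'Michael']
```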

Sorting Your DataFrame on Multiple Columns

In addition to sorting on a single column, you can also sort a DataFrame on multiple columns. This can be useful when you want to prioritize the sorting based on multiple criteria.

Sorting by Multiple Columns in Ascending Order

To sort a DataFrame by multiple columns in ascending order, you can pass a list of column names to the .sort_values() method. Let’s take a look at an example:

df_sorted = df.sort_values(by=['Gender', 'Age'])
print(df_sorted)

Output:

Name Age Gender
1 Sarah 30 Female
0 John 25 Male
2 Michael 28 Male

In this example, we sorted the DataFrame first by the ‘Gender’ column and then by the ‘Age’ column in ascending order. The resulting DataFrame, df_sorted, is sorted based on the values in the ‘Gender’ column first, and then within each group, it’s sorted based on the values in the ‘Age’ column.
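One way to reason about a multi-column sort is that it produces the same order as sorting by the secondary column first and then by the primary column with a stable algorithm, since the stable second pass preserves the Age order inside each Gender group. This sketch checks that equivalence on the example data. It illustrates the ordering rule only, not how pandas implements it internally.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

# Direct multi-column sort: Gender first, then Age within each Gender
multi = df.sort_values(by=['Gender', 'Age'])

# Same ordering from two passes: Age first, then a stable sort on Gender
two_pass = df.sort_values(by='Age').sort_values(by='Gender', kind='stable')

print(multi.index.tolist())     # [1, 0, 2]
print(two_pass.index.tolist())  # [1, 0, 2]
```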

Changing the Column Sort Order

By default, each column is sorted in ascending order when sorting by multiple columns. However, you can change the sort order for each individual column by passing a list of boolean values to the ascending parameter. Let’s see an example:

df_sorted = df.sort_values(by=['Gender', 'Age'], ascending=[False, True])
print(df_sorted)

Output:

Name Age Gender
0 John 25 Male
2 Michael 28 Male
1 Sarah 30 Female

In this example, we sorted the DataFrame first by the ‘Gender’ column in descending order and then by the ‘Age’ column in ascending order. The resulting DataFrame, df_sorted, is sorted based on the values in the ‘Gender’ column in descending order first, and then within each group, it’s sorted based on the values in the ‘Age’ column in ascending order.

Sorting by Multiple Columns in Descending Order

To sort a DataFrame by multiple columns in descending order, you can pass a list of column names to the .sort_values() method and set the ascending parameter to False for all columns. Let’s take a look at an example:

df_sorted = df.sort_values(by=['Gender', 'Age'], ascending=[False, False])
print(df_sorted)

Output:

Name Age Gender
2 Michael 28 Male
0 John 25 Male
1 Sarah 30 Female

In this example, we sorted the DataFrame first by the ‘Gender’ column in descending order and then by the ‘Age’ column in descending order. The resulting DataFrame, df_sorted, is sorted based on the values in the ‘Gender’ column in descending order first, and then within each group, it’s sorted based on the values in the ‘Age’ column in descending order.
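When every column should be sorted in descending order, a list isn’t required: a single ascending=False applies to all the columns passed in by. A quick check on the example data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

# A single boolean applies to every column listed in by
desc_all = df.sort_values(by=['Gender', 'Age'], ascending=False)
# Equivalent explicit per-column form
desc_list = df.sort_values(by=['Gender', 'Age'], ascending=[False, False])

print(desc_all.index.tolist())   # [2, 0, 1]
print(desc_list.index.tolist())  # [2, 0, 1]
```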

Sorting by Multiple Columns With Different Sort Orders

You can also mix sort orders the other way around. This can be useful when you want one column to group the rows in ascending order while another ranks the rows within each group from highest to lowest. Let’s see an example:

df_sorted = df.sort_values(by=['Gender', 'Age'], ascending=[True, False])
print(df_sorted)

Output:

Name Age Gender
1 Sarah 30 Female
2 Michael 28 Male
0 John 25 Male

In this example, we sorted the DataFrame first by the ‘Gender’ column in ascending order and then by the ‘Age’ column in descending order. ‘Female’ comes before ‘Male’ alphabetically, and within the ‘Male’ group, Michael (28) comes before John (25).

Sorting Your DataFrame on Its Index

In addition to sorting a DataFrame by its columns, you can also sort it by its index. This can be useful when you want to arrange the rows based on the index values.

Sorting by Index in Ascending Order

To sort a DataFrame by its index in ascending order, you can use the .sort_index() method. Let’s see an example:

df_sorted = df.sort_index()
print(df_sorted)

Output:

Name Age Gender
0 John 25 Male
1 Sarah 30 Female
2 Michael 28 Male

In this example, we sorted the DataFrame by its index in ascending order. The resulting DataFrame, df_sorted, is sorted based on the index values.

Sorting by Index in Descending Order

To sort a DataFrame by its index in descending order, you can pass the ascending parameter to the .sort_index() method and set it to False. Let’s take a look at an example:

df_sorted = df.sort_index(ascending=False)
print(df_sorted)

Output:

Name Age Gender
2 Michael 28 Male
1 Sarah 30 Female
0 John 25 Male

In this example, we sorted the DataFrame by its index in descending order. The resulting DataFrame, df_sorted, is sorted based on the index values in descending order.
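With the default integer index, sorting by index only restores or reverses row order. The index becomes more interesting when it carries meaningful labels. In this sketch, the Name column is moved into the index with .set_index(), so .sort_index() orders the rows alphabetically by name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

# Use the names as index labels
by_name = df.set_index('Name')

asc = by_name.sort_index()
desc = by_name.sort_index(ascending=False)

print(asc.index.tolist())   # ['John', 'Michael', 'Sarah']
print(desc.index.tolist())  # ['Sarah', 'Michael', 'John']
```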

Exploring Advanced Index-Sorting Concepts

pandas supports more advanced index-sorting options, such as sorting by specific levels of a MultiIndex with the level parameter, controlling where missing index labels go with na_position, and transforming labels before sorting with the key parameter. To learn more about these options, refer to the pandas documentation for .sort_index().
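As a small illustration of sorting by MultiIndex levels, the sketch below builds a two-level index from the Gender and Name columns. Calling .sort_index() with no arguments sorts by all levels in order, while level='Name' sorts primarily by the Name level.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

# Two-level index: Gender on the outside, Name on the inside
indexed = df.set_index(['Gender', 'Name'])

# Sort by every level: Gender first, then Name within each Gender
all_levels = indexed.sort_index()
print(all_levels.index.tolist())
# [('Female', 'Sarah'), ('Male', 'John'), ('Male', 'Michael')]

# Sort by the Name level
name_level = indexed.sort_index(level='Name')
print(name_level.index.get_level_values('Name').tolist())
# ['John', 'Michael', 'Sarah']
```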

Sorting the Columns of Your DataFrame

So far, we’ve focused on sorting a DataFrame by its rows. However, you can also sort the columns of your DataFrame. This can be useful when you want to reorganize the columns based on specific criteria.

Working With the DataFrame Axis

In pandas, you can specify the axis along which you want to sort your DataFrame. By default, the axis parameter is set to 0, which means you’re sorting the rows. To sort the columns, you need to set the axis parameter to 1. Here’s how you can sort the columns of a DataFrame:

df_sorted = df.sort_index(axis=1)
print(df_sorted)

Output:

Age Gender Name
0 25 Male John
1 30 Female Sarah
2 28 Male Michael

In this example, we sorted the columns of the DataFrame based on their names in ascending order. The resulting DataFrame, df_sorted, is sorted based on the column names.
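The axis parameter also accepts the string 'columns' as an equivalent of 1, which can make the intent easier to read. Combined with ascending=False, it reverses the alphabetical column order, as this sketch on the example data shows.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

# axis='columns' is the same as axis=1
df_cols_desc = df.sort_index(axis='columns', ascending=False)
print(df_cols_desc.columns.tolist())  # ['Name', 'Gender', 'Age']
```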

Using Column Labels to Sort

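You can also reorder columns with .sort_values() by setting axis to 1. In that case, the by parameter takes a row (index) label rather than a column name, and the columns are rearranged according to the values in that row. Passing by='Name' with axis=1 raises a KeyError, because ‘Name’ is not a row label. The values in the chosen row also have to be comparable with one another, so with df, where each row mixes strings and integers, you first select columns that share a type. Let’s see an example:

df_sorted = df[['Name', 'Gender']].sort_values(by=1, axis=1)
print(df_sorted)

Output:

Gender Name
0 Male John
1 Female Sarah
2 Male Michael

In this example, the columns are ordered by the values in the row labeled 1 (‘Sarah’ and ‘Female’). Because ‘Female’ sorts before ‘Sarah’ alphabetically, the ‘Gender’ column now comes before the ‘Name’ column.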
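Because axis=1 makes .sort_values() read the by argument as a row label, the idea is easiest to see with a DataFrame whose rows hold comparable values. This sketch uses a small, made-up all-numeric DataFrame (the column names a, b, and c are hypothetical) and reorders its columns by the values in one row at a time.

```python
import pandas as pd

# Hypothetical all-numeric data, so every row's values can be compared
scores = pd.DataFrame({'b': [3, 1], 'a': [2, 5], 'c': [1, 4]})

# by=0 names the row labeled 0; columns are ordered by that row's values
by_row_0 = scores.sort_values(by=0, axis=1)
print(by_row_0.columns.tolist())  # ['c', 'a', 'b']

# by=1 uses the row labeled 1 instead
by_row_1 = scores.sort_values(by=1, axis=1)
print(by_row_1.columns.tolist())  # ['b', 'c', 'a']
```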

Working With Missing Data When Sorting in Pandas

When sorting a DataFrame, you may encounter missing data or NaN values in your dataset. pandas provides options to handle these missing values while sorting.

Understanding the na_position Parameter in .sort_values()

The .sort_values() method has a na_position parameter that allows you to control the placement of missing values. By default, missing values are placed last. Let’s see an example:

data = {'Name': ['John', 'Sarah', 'Michael', None],
'Age': [25, 30, None, 28],
'Gender': ['Male', 'Female', 'Male', None]}
df = pd.DataFrame(data)
df_sorted = df.sort_values(by='Age')
print(df_sorted)

Output:

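Name Age Gender
0 John 25.0 Male
3 None 28.0 None
1 Sarah 30.0 Female
2 Michael NaN Male

In this example, we have a DataFrame with missing values in the ‘Name’, ‘Age’, and ‘Gender’ columns. Because the DataFrame is sorted by ‘Age’, only the missing value in that column affects the order: Michael’s row, whose age is NaN, is placed last by default. Row 3 is missing a name and gender but has a valid age, so it sorts normally between John and Sarah.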
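The example above relies on the default, na_position='last'. Passing na_position='first' moves rows with a missing Age to the top instead. This self-contained sketch uses the same data under the hypothetical name df_missing, so it doesn’t overwrite df.

```python
import pandas as pd

df_missing = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael', None],
                           'Age': [25, 30, None, 28],
                           'Gender': ['Male', 'Female', 'Male', None]})

# Default: rows with NaN in Age go to the end
nan_last = df_missing.sort_values(by='Age')
# na_position='first' puts them at the start
nan_first = df_missing.sort_values(by='Age', na_position='first')

print(nan_last.index.tolist())   # [0, 3, 1, 2]
print(nan_first.index.tolist())  # [2, 0, 3, 1]
```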

Understanding the na_position Parameter in .sort_index()

The .sort_index() method also has a na_position parameter, but it only controls where missing labels in the index are placed, not missing values in the columns. By default, missing labels are placed last. Because the default integer index of this DataFrame has no missing labels, this example first moves the ‘Age’ column into the index with .set_index():

data = {'Name': ['John', 'Sarah', 'Michael', None],
        'Age': [25, 30, None, 28],
        'Gender': ['Male', 'Female', 'Male', None]}
df = pd.DataFrame(data)
df_sorted = df.set_index('Age').sort_index(na_position='first')
print(df_sorted)

Output:

Name Gender
Age
NaN Michael Male
25.0 John Male
28.0 None None
30.0 Sarah Female

In this example, the index holds the ages, including one missing value. By setting the na_position parameter to ‘first’, the row with the missing index label (Michael’s) is placed first, followed by the remaining rows in ascending order of age.
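The na_position parameter of .sort_index() acts only on missing labels in the index itself. In this sketch, a small DataFrame with made-up Score values is built with a deliberate NaN index label, so the parameter has something to act on.

```python
import pandas as pd

# Made-up data whose index contains a missing label
scores = pd.DataFrame({'Score': [10, 20, 30]}, index=[2.0, float('nan'), 1.0])

nan_last = scores.sort_index()                      # default: na_position='last'
nan_first = scores.sort_index(na_position='first')  # NaN label moves to the top

print(nan_last['Score'].tolist())   # [30, 10, 20]
print(nan_first['Score'].tolist())  # [20, 30, 10]
```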

Using Sort Methods to Modify Your DataFrame

So far, we’ve been creating new DataFrames when sorting our data. However, pandas also provides options to modify the existing DataFrame in place.

Using .sort_values() In Place

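To sort a DataFrame in place using the .sort_values() method, set the inplace parameter to True. This modifies the original DataFrame and returns None instead of a new DataFrame. Because df was reassigned to the DataFrame with missing values in the previous section, this example starts by recreating the original three-row DataFrame:

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'], 'Age': [25, 30, 28], 'Gender': ['Male', 'Female', 'Male']})
df.sort_values(by='Age', inplace=True)
print(df)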

Output:

Name Age Gender
0 John 25 Male
2 Michael 28 Male
1 Sarah 30 Female

In this example, we sorted the DataFrame by the ‘Age’ column in ascending order in place. The original DataFrame, df, is modified.
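A detail that often trips people up: with inplace=True, .sort_values() returns None, so assigning its result to a variable loses the DataFrame. This sketch shows the difference on the example data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Michael'],
                   'Age': [25, 30, 28],
                   'Gender': ['Male', 'Female', 'Male']})

# The method sorts df itself and hands back None
result = df.sort_values(by='Age', inplace=True)

print(result)               # None
print(df['Name'].tolist())  # ['John', 'Michael', 'Sarah']
```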

Using .sort_index() In Place

To sort a DataFrame by its index in place using the .sort_index() method, you can set the inplace parameter to True. This will modify the original DataFrame. Let’s see an example:

df.sort_index(inplace=True)
print(df)

Output:

Name Age Gender
0 John 25 Male
1 Sarah 30 Female
2 Michael 28 Male

In this example, we sorted the DataFrame by its index in ascending order in place. The original DataFrame, df, is modified.
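Reassigning the returned DataFrame gives the same result as inplace=True while keeping the code chainable. This sketch compares the two styles, starting each from a copy of the example data that has been shuffled by sorting on Age.

```python
import pandas as pd

data = {'Name': ['John', 'Sarah', 'Michael'],
        'Age': [25, 30, 28],
        'Gender': ['Male', 'Female', 'Male']}

# Style 1: modify the DataFrame in place
df_inplace = pd.DataFrame(data).sort_values(by='Age')
df_inplace.sort_index(inplace=True)

# Style 2: reassign the returned copy (chainable)
df_reassigned = pd.DataFrame(data).sort_values(by='Age').sort_index()

print(df_inplace.index.tolist())     # [0, 1, 2]
print(df_reassigned.index.tolist())  # [0, 1, 2]
```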

Conclusion

In this tutorial, you learned how to use .sort_values() and .sort_index() to sort data efficiently in a pandas DataFrame. You learned how to sort a DataFrame on a single column, multiple columns, and its index. Additionally, you learned how to change the sort order, choose a sorting algorithm, handle missing data while sorting, and modify the DataFrame in place.

Sorting data is a crucial step in data analysis, and pandas provides a powerful and flexible set of tools to accomplish this task. With the knowledge gained from this tutorial, you’ll be able to effectively organize and analyze your data using pandas.