Mastering df.merge for Effortless Data Manipulation

Combining Data in pandas With merge(), .join(), and concat()

pandas provides three main tools for combining DataFrame objects: the merge() function, the .join() method, and the concat() function. In this tutorial, you’ll learn how each one works and when to use it.

pandas merge(): Combining Data on Common Columns or Indices

The merge() function is used to combine data based on common columns or indices, similar to join operations in a database. It is the most flexible of the three functions.

How to Use merge()

To use merge(), you need two DataFrame objects that you want to merge. You specify the columns or indices on which to match rows, using on when the key column has the same name in both DataFrames. You can also set the type of join with the how parameter: inner (the default), outer, left, right, or cross.

Here is the basic syntax for merge():

merged_df = pd.merge(left_df, right_df, on="common_column")
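When the key columns have different names, or when one key lives in the index, merge() accepts left_on, right_on, left_index, and right_index instead of on. The sketch below is a minimal illustration; the frames employees and salaries and their column names are made up for this example and do not come from the tutorial's data.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Bob", "Jane", "Sam"]})
salaries = pd.DataFrame({"id": [1, 2, 4], "salary": [50, 60, 70]})

# Match employees.emp_id against salaries.id. Both key columns are kept.
by_columns = pd.merge(employees, salaries, left_on="emp_id", right_on="id")

# Match a column on the left against the index on the right.
by_index = pd.merge(
    employees,
    salaries.set_index("id"),
    left_on="emp_id",
    right_index=True,
)

print(by_columns)
print(by_index)
```

Both calls use the default inner join, so only the keys present in both frames (1 and 2) survive.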

Examples

Example 1:

Suppose you have two DataFrame objects, df1 and df2, with the following data:

df1:
ID Name
0 1 Bob
1 2 Jane
2 3 Sam
df2:
ID Age
0 1 25
1 2 30
2 4 35

You can merge these two DataFrames on the common column “ID” using the following code:

merged_df = pd.merge(df1, df2, on="ID")

The resulting merged DataFrame will be:

ID Name Age
0 1 Bob 25
1 2 Jane 30

Example 2:

Suppose you have two DataFrame objects, df3 and df4, with the following data:

df3:
ID Name
0 1 Bob
1 2 Jane
2 3 Sam
df4:
ID Age
0 1 25
1 2 30
2 4 35

You can merge these two DataFrames on the common column “ID” using an outer join:

merged_df = pd.merge(df3, df4, on="ID", how="outer")

The resulting merged DataFrame will be:

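ID Name Age
0 1 Bob 25.0
1 2 Jane 30.0
2 3 Sam NaN
3 4 NaN 35.0

In this example, the NaN values mark rows that have no match in the other DataFrame: ID 3 has no age in df4, and ID 4 has no name in df3. Because NaN is a floating-point value, pandas converts the Age column from integers to floats.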
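To see where each row of an outer join came from, merge() offers the indicator parameter, which adds a _merge column. The sketch below rebuilds the Example 2 data and also shows a left join for comparison; the variable names outer and left are chosen here for illustration.

```python
import pandas as pd

df3 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Bob", "Jane", "Sam"]})
df4 = pd.DataFrame({"ID": [1, 2, 4], "Age": [25, 30, 35]})

# indicator=True adds a "_merge" column with the values
# "both", "left_only", or "right_only".
outer = pd.merge(df3, df4, on="ID", how="outer", indicator=True)

# A left join keeps every row of df3 and drops IDs found only in df4.
left = pd.merge(df3, df4, on="ID", how="left")

print(outer)
print(left)
```

Filtering on the _merge column is a common way to find keys that are missing from one side.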

pandas .join(): Combining Data on a Column or Index

The .join() method combines DataFrame objects on their indices by default. It can also match one or more key columns of the calling DataFrame against the index of the other DataFrame. Think of it as a convenient shorthand for index-based merges, which merge() can also perform.

How to Use .join()

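To use .join(), you call it on the left DataFrame and pass the right DataFrame as an argument. With no on argument, rows are matched by index. With on, the named column of the left DataFrame is matched against the index of the right DataFrame. The default join type is left, so every row of the calling DataFrame is kept.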

Here is the basic syntax for .join():

joined_df = left_df.join(right_df, on="key_column")
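The relationship between .join() and merge() can be sketched as follows. The small frames here are invented for illustration, and the comparison assumes the two frames share no column names, so no suffixes are involved.

```python
import pandas as pd

# Hypothetical data: the right frame is indexed by the key values.
left_df = pd.DataFrame({"key_column": ["a", "b", "c"], "x": [1, 2, 3]})
right_df = pd.DataFrame({"y": [10, 20]}, index=["a", "b"])

# .join() matches left_df["key_column"] against right_df's index.
joined_df = left_df.join(right_df, on="key_column")

# The same result written as an explicit left merge.
same_as_merge = pd.merge(
    left_df,
    right_df,
    left_on="key_column",
    right_index=True,
    how="left",
)

print(joined_df)
print(joined_df.equals(same_as_merge))
```

The row with key "c" has no partner in right_df, so its y value is NaN in both results.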

Examples

Example 1:

Suppose you have two DataFrame objects, df5 and df6, with the following data:

df5:
ID Name
0 1 Bob
1 2 Jane
2 3 Sam
df6:
Age
0 25
1 30
2 35

You can join these two DataFrames on the index using the following code:

joined_df = df5.join(df6)

The resulting joined DataFrame will be:

ID Name Age
0 1 Bob 25
1 2 Jane 30
2 3 Sam 35
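The index join from Example 1 can be reproduced with the sketch below. It also illustrates one thing to watch for: if both DataFrames share a column name, .join() raises a ValueError unless you pass lsuffix or rsuffix. The frame df6_with_id and its values are made up for this demonstration.

```python
import pandas as pd

df5 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Bob", "Jane", "Sam"]})
df6 = pd.DataFrame({"Age": [25, 30, 35]})

# Both frames use the default index 0, 1, 2, so rows line up by position.
joined_df = df5.join(df6)

# Hypothetical variant: the right frame also has an "ID" column.
df6_with_id = df6.assign(ID=[10, 20, 30])

try:
    df5.join(df6_with_id)
    overlap_raised = False
except ValueError:
    # Overlapping column names need a suffix to tell them apart.
    overlap_raised = True

suffixed = df5.join(df6_with_id, lsuffix="_left", rsuffix="_right")

print(joined_df)
print(suffixed)
```

By contrast, merge() applies the default suffixes _x and _y automatically when non-key columns overlap.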

Example 2:

Suppose you have two DataFrame objects, df7 and df8, with the following data:

df7:
ID Name
0 1 Bob
1 2 Jane
2 3 Sam
df8:
ID Age
0 1 25
1 2 30
2 4 35

You can join these two DataFrames on the key column “ID” using the following code:

joined_df = df7.join(df8.set_index("ID"), on="ID")

The resulting joined DataFrame will be:

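ID Name Age
0 1 Bob 25.0
1 2 Jane 30.0
2 3 Sam NaN

The NaN value indicates missing data. Because .join() performs a left join by default, Sam’s row is kept even though df8 has no ID 3, and the df8 row for ID 4 is dropped. The Age column becomes floating-point because it now holds NaN.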
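If you would rather drop unmatched rows than fill them with NaN, .join() accepts the same how parameter as merge(). The sketch below rebuilds the Example 2 data and compares the default left join with an inner join; the variable names are chosen for illustration.

```python
import pandas as pd

df7 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Bob", "Jane", "Sam"]})
df8 = pd.DataFrame({"ID": [1, 2, 4], "Age": [25, 30, 35]})

# Move df8's key into its index so that on="ID" has something to match.
ages_by_id = df8.set_index("ID")

# Default: left join, every row of df7 is kept.
left_joined = df7.join(ages_by_id, on="ID")

# Inner join: only IDs present in both frames are kept.
inner_joined = df7.join(ages_by_id, on="ID", how="inner")

print(left_joined)
print(inner_joined)
```

With the inner join no NaN is introduced, so the Age column keeps its integer type.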

pandas concat(): Combining Data Across Rows or Columns

The concat() function stacks DataFrame objects along an axis, either appending rows or placing columns side by side. Unlike merge(), it doesn’t match rows on key columns. Instead, it aligns the inputs on the labels of the other axis and fills any gaps with NaN.

How to Use concat()

To use concat(), you specify the DataFrame objects that you want to concatenate. You can specify the axis, which determines whether you are concatenating along rows (axis=0) or columns (axis=1).

Here is the basic syntax for concat():

concatenated_df = pd.concat([df1, df2], axis=0)

Examples

Example 1:

Suppose you have two DataFrame objects, df9 and df10, with the following data:

df9:
A B
0 1 2
1 3 4
df10:
A B
0 5 6
1 7 8

You can concatenate these two DataFrames along rows using the following code:

concatenated_df = pd.concat([df9, df10], axis=0)

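The resulting concatenated DataFrame keeps each input’s original index labels, so 0 and 1 appear twice (pass ignore_index=True to renumber the rows):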

A B
0 1 2
1 3 4
0 5 6
1 7 8
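Row-wise concatenation behaves differently depending on whether the column names match. The sketch below uses small made-up frames (first, second, and other_cols are hypothetical names) to show three cases: stacking frames with the same columns, renumbering the index, and stacking frames whose columns differ.

```python
import pandas as pd

first = pd.DataFrame({"A": [1, 3], "B": [2, 4]})
second = pd.DataFrame({"A": [5, 7], "B": [6, 8]})

# Same columns: rows are appended and the original index labels repeat.
stacked = pd.concat([first, second], axis=0)

# ignore_index=True replaces the labels with a fresh 0..n-1 range.
renumbered = pd.concat([first, second], ignore_index=True)

# Different columns: the result has the union of all columns,
# and cells with no source value are filled with NaN.
other_cols = pd.DataFrame({"C": [5, 7], "D": [6, 8]})
mismatched = pd.concat([first, other_cols], ignore_index=True)

print(stacked)
print(renumbered)
print(mismatched)
```

If you expect clean stacking, check that the column names agree before concatenating; otherwise you get a wider frame padded with NaN.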

Example 2:

Suppose you have two DataFrame objects, df11 and df12, with the following data:

df11:
A B
0 1 2
1 3 4
df12:
C D
0 5 6
1 7 8

You can concatenate these two DataFrames along columns using the following code:

concatenated_df = pd.concat([df11, df12], axis=1)

The resulting concatenated DataFrame will be:

A B C D
0 1 2 5 6
1 3 4 7 8
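Column-wise concatenation lines rows up by index label, not by position. Example 2 works cleanly because both frames share the index 0, 1. The sketch below uses a hypothetical frame, shifted, whose index is 1, 2 to show what happens when the labels only partly overlap, and how the join parameter controls the result.

```python
import pandas as pd

df11 = pd.DataFrame({"A": [1, 3], "B": [2, 4]})

# Hypothetical frame with index labels 1 and 2 instead of 0 and 1.
shifted = pd.DataFrame({"C": [5, 7], "D": [6, 8]}, index=[1, 2])

# Default join="outer": keep every index label, fill gaps with NaN.
outer_cols = pd.concat([df11, shifted], axis=1)

# join="inner": keep only the labels present in every input.
inner_cols = pd.concat([df11, shifted], axis=1, join="inner")

print(outer_cols)
print(inner_cols)
```

If you want strictly positional pairing, one option is to call reset_index(drop=True) on each frame before concatenating.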

Conclusion

In this tutorial, you learned how to combine data in pandas using merge(), .join(), and concat(). Use merge() to match rows on key columns or indices with full control over the join type, .join() as a shorthand for index-based joins, and concat() to stack DataFrames along rows or columns.

Now that you have a good understanding of these functions, you can enhance your data analysis capabilities in pandas. Experiment with different types of joins, concatenate DataFrames in various ways, and explore the possibilities for combining and analyzing your data.