Explanation of the syntax `df['column'] = expression` in pandas

The syntax df['column'] = expression in pandas is used to create, modify, or assign values to a column in a pandas DataFrame (df). Let’s break it down, step by step, from basic to advanced levels.


Basic Level

1. Creating a New Column

  • When a column does not exist in the DataFrame, assigning values to df['column'] creates a new column.
  • Example:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3]})
    print(df)
    # Output:
    #    A
    # 0  1
    # 1  2
    # 2  3

    # Create a new column 'B' with all values set to 0
    df['B'] = 0
    print(df)
    # Output:
    #    A  B
    # 0  1  0
    # 1  2  0
    # 2  3  0
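The scalar in the example above is broadcast to every row. As a minimal sketch of the other common case, a list must supply exactly one value per row; the column name 'C' and the variable `length_error` are illustrative assumptions, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# A scalar is broadcast to all three rows.
df['B'] = 0

# A list needs one value per row; a shorter list raises ValueError
# and the column is not created.
try:
    df['C'] = [10, 20]  # only 2 values for 3 rows
    length_error = None
except ValueError as exc:
    length_error = exc

print(df.columns.tolist())
print(type(length_error).__name__)
```

If the lengths cannot be guaranteed up front, checking `len(values) == len(df)` before the assignment gives a clearer failure point.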

2. Modifying an Existing Column

  • If the column already exists, assigning values replaces its contents.
  • Example:

    df['B'] = [4, 5, 6]  # Replace values in column 'B'
    print(df)
    # Output:
    #    A  B
    # 0  1  4
    # 1  2  5
    # 2  3  6
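One detail worth knowing when replacing a column: a list is assigned by position, but a pandas Series is aligned on the index first. The sketch below uses a hypothetical Series named `replacement` to show that rows without a matching label end up as NaN.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# The Series carries its own index labels (2 and 0), in a different order
# and with no entry for label 1.
replacement = pd.Series([10, 20], index=[2, 0])

# Assignment matches on index labels, not on position.
df['B'] = replacement
print(df)
```

Here row 0 receives 20, row 2 receives 10, and row 1 is NaN, which also turns the column into a float column. Passing `replacement.to_numpy()` instead would switch back to positional assignment, which then requires matching lengths.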

Intermediate Level

3. Assigning Values Based on an Expression

  • You can assign values to a column based on a computation or transformation.
  • Example:

    df['C'] = df['A'] + df['B']  # Create column 'C' as the sum of 'A' and 'B'
    print(df)
    # Output:
    #    A  B  C
    # 0  1  4  5
    # 1  2  5  7
    # 2  3  6  9

4. Assigning Values Using a Condition

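  • You can derive a column's values from a condition. The example applies a conditional expression to each element with Series.apply; on large data, a boolean mask or np.where is usually faster.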
  • Example:

    df['D'] = df['A'].apply(lambda x: 'Even' if x % 2 == 0 else 'Odd')
    print(df)
    # Output:
    #    A  B  C     D
    # 0  1  4  5   Odd
    # 1  2  5  7  Even
    # 2  3  6  9   Odd
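The same Even/Odd column can also be built with a boolean mask and `.loc`, which avoids calling a Python function per element. This is a small sketch on a fresh DataFrame; the variable name `is_even` is an assumption made for readability.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Start with a default value for every row.
df['D'] = 'Odd'

# Build a boolean mask, then overwrite only the rows where it is True.
is_even = df['A'] % 2 == 0
df.loc[is_even, 'D'] = 'Even'

print(df)
```

Using `df.loc[mask, 'D'] = value` in one step, rather than `df[mask]['D'] = value`, keeps the write on the original DataFrame instead of on a temporary copy.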

5. Using Multiple Columns in Expressions

  • You can use multiple columns in a single expression for more complex computations.
  • Example:

    df['E'] = (df['A'] + df['B']) * df['C']
    print(df)
    # Output:
    #    A  B  C     D   E
    # 0  1  4  5   Odd  25
    # 1  2  5  7  Even  49
    # 2  3  6  9   Odd  81

Advanced Level

6. Vectorized Operations

  • Assigning values to a column can use vectorized operations for high performance.
  • Example:

    df['F'] = df['A'] ** 2 + df['B'] ** 2  # Fast vectorized calculation
    print(df)
    # Output:
    #    A  B  C     D   E   F
    # 0  1  4  5   Odd  25  17
    # 1  2  5  7  Even  49  29
    # 2  3  6  9   Odd  81  45

7. Assigning Values with Conditional Logic Using np.where

  • You can use numpy for conditional assignment.
  • Example:

    import numpy as np

    df['G'] = np.where(df['A'] > 2, 'High', 'Low')
    print(df)
    # Output:
    #    A  B  C     D   E   F     G
    # 0  1  4  5   Odd  25  17   Low
    # 1  2  5  7  Even  49  29   Low
    # 2  3  6  9   Odd  81  45  High
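np.where covers a two-way choice. When there are more than two outcomes, one option is `np.select`, which takes a list of conditions and a matching list of values. The thresholds and the 'Medium' label below are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Conditions are checked in order; the first one that is True wins.
conditions = [df['A'] < 2, df['A'] < 3]
labels = ['Low', 'Medium']

# Rows matching no condition receive the default.
df['G'] = np.select(conditions, labels, default='High')
print(df)
```

Because the conditions are evaluated in order, the second condition only needs to state its upper bound.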

8. Assigning Values Using External Functions

  • Assign column values based on a custom function applied to rows or columns.
  • Example:

    def custom_function(row):
        return row['A'] * row['B']

    df['H'] = df.apply(custom_function, axis=1)
    print(df)
    # Output:
    #    A  B  C     D   E   F     G   H
    # 0  1  4  5   Odd  25  17   Low   4
    # 1  2  5  7  Even  49  29   Low  10
    # 2  3  6  9   Odd  81  45  High  18

9. Chaining Operations

  • You can chain multiple operations to keep your code concise.
  • Example:

    df['I'] = df['A'].add(df['B']).mul(df['C'])
    print(df)
    # Output:
    #    A  B  C     D   E   F     G   H   I
    # 0  1  4  5   Odd  25  17   Low   4  25
    # 1  2  5  7  Even  49  29   Low  10  49
    # 2  3  6  9   Odd  81  45  High  18  81
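Besides chaining, the method forms such as `add` accept a `fill_value` argument that the `+` operator has no place for. The sketch below uses a small DataFrame with a missing value, which is an assumption added here to make the difference visible.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, None, 3.0], 'B': [4.0, 5.0, 6.0]})

# With the operator, a missing value on either side gives NaN.
df['plus'] = df['A'] + df['B']

# With the method, a missing value on one side is treated as fill_value.
df['add_filled'] = df['A'].add(df['B'], fill_value=0)

print(df)
```

In row 1 the operator version is NaN while the method version is 5.0; the other rows are identical.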

10. Assigning Multiple Columns at Once

  • Use assign() to create or modify multiple columns in a single call.
  • Example:

    df = df.assign(
        J=df['A'] + df['B'],
        K=lambda x: x['J'] * 2
    )
    print(df)
    # Output:
    #    A  B  C     D   E   F     G   H   I  J   K
    # 0  1  4  5   Odd  25  17   Low   4  25  5  10
    # 1  2  5  7  Even  49  29   Low  10  49  7  14
    # 2  3  6  9   Odd  81  45  High  18  81  9  18
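Unlike `df['column'] = expression`, `assign()` does not modify the DataFrame it is called on; it returns a new one. The sketch below keeps the result in a separate variable, `result` (a name chosen here for illustration), to make that visible.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# assign() returns a new DataFrame. The lambda receives the frame as built
# so far, so it can refer to the column 'J' created in the same call.
result = df.assign(
    J=df['A'] + df['B'],
    K=lambda x: x['J'] * 2,
)

print(df.columns.tolist())      # the original is unchanged
print(result.columns.tolist())  # the new columns exist only on the result
```

This is why the example above writes `df = df.assign(...)`: without rebinding the name, the new columns would be discarded.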

Expert Level

11. Dynamic Column Assignment

  • Dynamically create column names based on external inputs.
  • Example:

    columns_to_add = ['L', 'M']
    for col in columns_to_add:
        df[col] = df['A'] + df['B']
    print(df)
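A possible extension of the loop above is to drive both the column name and its formula from a dictionary, so each dynamic column can have its own computation. The names `new_columns`, 'total', and 'product' are hypothetical and exist only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Map each new column name to a function that computes it from the frame.
new_columns = {
    'total': lambda frame: frame['A'] + frame['B'],
    'product': lambda frame: frame['A'] * frame['B'],
}

for name, compute in new_columns.items():
    df[name] = compute(df)

print(df)
```

Since the key is an ordinary string, it could equally come from a config file or user input, as long as it is validated before being used as a column name.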

12. Using External Data for Assignment

  • Assign values to a column based on an external DataFrame or a dictionary.
  • Example:

    mapping = {1: 'Low', 2: 'Medium', 3: 'High'}
    df['N'] = df['A'].map(mapping)
    print(df)
    # Output (includes columns 'L' and 'M' added in the previous step):
    #    A  B  C     D   E   F     G   H   I  J   K  L  M       N
    # 0  1  4  5   Odd  25  17   Low   4  25  5  10  5  5     Low
    # 1  2  5  7  Even  49  29   Low  10  49  7  14  7  7  Medium
    # 2  3  6  9   Odd  81  45  High  18  81  9  18  9  9    High
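When a value has no entry in the dictionary, `map` yields a missing value rather than raising an error. The sketch below adds a fourth row (an assumption made to trigger that case) and fills the gap with a placeholder label, 'Unknown', which is likewise illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})
mapping = {1: 'Low', 2: 'Medium', 3: 'High'}

# 4 is not a key in the mapping, so its result is missing (NaN).
mapped = df['A'].map(mapping)

# Replace missing results with an explicit fallback before assigning.
df['N'] = mapped.fillna('Unknown')
print(df)
```

Checking `mapped.isna().sum()` before filling is a quick way to see how many values fell outside the mapping.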

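13. Performance Optimization

  • Prefer vectorized operations (arithmetic on whole columns, np.where, map) over Python loops when assigning values. apply is convenient, but with axis=1 it calls a Python function once per row and is usually much slower than an equivalent vectorized expression.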
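To make the comparison concrete, the sketch below computes the same column three ways on a 1,000-row DataFrame (the size and the column name 'H' are arbitrary choices for illustration) and checks that the results agree. It does not measure time; wrapping each variant in `timeit` on your own data is the reliable way to see the difference.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 1001), 'B': range(1001, 2001)})

# 1. Plain Python loop over pairs of values.
loop_values = [a * b for a, b in zip(df['A'], df['B'])]

# 2. Row-wise apply: one Python function call per row.
apply_values = df.apply(lambda row: row['A'] * row['B'], axis=1)

# 3. Vectorized: a single operation on whole columns.
df['H'] = df['A'] * df['B']

# All three approaches produce the same numbers.
same = df['H'].tolist() == loop_values == apply_values.tolist()
print(same)
```

The vectorized form is also the shortest to write, so it is usually the one to reach for first.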

Takeaway

The syntax df['column'] = expression is a core feature of pandas and is highly versatile. It allows you to:

  • Add, modify, and manipulate columns in a DataFrame.
  • Perform complex computations, including condition-based logic and multi-column transformations.
  • Chain operations and dynamically generate new columns.

This makes pandas a powerful library for data manipulation and analysis.
