In Spark 2.2 there are two ways to add a constant value as a column to a DataFrame:
- Using lit()
- Using typedLit()
typedLit() can also handle parameterized Scala types, e.g. List, Seq, and Map.
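A minimal sketch of the difference (assuming a spark-shell session, where spark and the $ syntax used below are already available):
import org.apache.spark.sql.functions.{lit, typedLit}
// lit() handles simple literals; typedLit() also handles Scala collections
spark.range(1)
  .select(lit("constant").as("lit_col"), typedLit(Seq(1, 2, 3)).as("typedLit_col"))
  .show()
The examples below walk through both functions in more detail.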
Create DataFrame
val empDF = spark.createDataFrame(Seq(
(7369, "SMITH", "CLERK", 7902, "17-Dec-80", 800, 20, 10),
(7499, "ALLEN", "SALESMAN", 7698, "20-Feb-81", 1600, 300, 30),
(7521, "WARD", "SALESMAN", 7698, "22-Feb-81", 1250, 500, 30),
(7566, "JONES", "MANAGER", 7839, "2-Apr-81", 2975, 0, 20),
(7654, "MARTIN", "SALESMAN", 7698, "28-Sep-81", 1250, 1400, 30),
(7698, "BLAKE", "MANAGER", 7839, "1-May-81", 2850, 0, 30),
(7782, "CLARK", "MANAGER", 7839, "9-Jun-81", 2450, 0, 10),
(7788, "SCOTT", "ANALYST", 7566, "19-Apr-87", 3000, 0, 20),
(7839, "KING", "PRESIDENT", 0, "17-Nov-81", 5000, 0, 10),
(7844, "TURNER", "SALESMAN", 7698, "8-Sep-81", 1500, 0, 30),
(7876, "ADAMS", "CLERK", 7788, "23-May-87", 1100, 0, 20)
)).toDF("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "deptno")
The lit() function creates a Column object out of a literal value. Let's use the DataFrame created above with the lit() function.
Example: 1 (select a subset of columns, without lit)
val emp = empDF.select($"empno", $"ename", $"sal", $"deptno")
emp.show()
Example: 2 (select with a lit column)
import org.apache.spark.sql.functions.lit
val emp_lit = empDF.select($"empno", $"ename", $"sal", $"deptno", lit("lit").as("lit_column"))
emp_lit.show()
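Because lit() returns a Column, the literal can also take part in expressions. A minimal sketch (the 10% bonus rate is just an illustrative value, not part of the original data set):
import org.apache.spark.sql.functions.lit
// multiply the sal column by a numeric literal to derive a bonus column
empDF.select($"empno", $"ename", $"sal", ($"sal" * lit(0.1)).as("bonus")).show()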
Example: 3 (lit with the withColumn function)
import org.apache.spark.sql.functions.{when, lit}
empDF.withColumn("lit_value", when($"sal" >= 2500, lit("SENIOR DEVELOPER")).otherwise("DEVELOPER")).show
Custom Transformation Example
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{when, lit}

// Curried form: works with DataFrame.transform
def withGreeting()(df: DataFrame): DataFrame = {
  df.withColumn("Lit_Value", when($"sal" > 2000, lit("SE")).otherwise("developer"))
}
empDF.transform(withGreeting()).show()

// Plain function form: called directly, passing the DataFrame explicitly
def withGreeting1(df: DataFrame): DataFrame = {
  df.withColumn("Lit_Value", when($"sal" > 2000, lit("SE")).otherwise("developer"))
}
withGreeting1(empDF).show
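The curried form pays off when several transformations are chained with transform. A minimal sketch (withBonus is a hypothetical second transformation added here for illustration, not part of the original example):
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// hypothetical second transformation, parameterized by a bonus rate
def withBonus(rate: Double)(df: DataFrame): DataFrame = {
  df.withColumn("bonus", $"sal" * lit(rate))
}

// transformations compose into a readable pipeline
empDF.transform(withGreeting()).transform(withBonus(0.1)).show()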
empDF.withColumn("Lit_Value",when ($"sal"> 2000, lit("SE")).otherwise("developer"))
}
empDF.transform(withGreeting()).show()
def withGreeting1(df: DataFrame): DataFrame = {
empDF.withColumn("Lit_Value",when ($"sal"> 2000, lit("SE")).otherwise("developer"))
}
withGreeting1(empDF).show
Example: 4 (lit with when and otherwise)
The lit() function is especially useful when making boolean comparisons. The when() and otherwise() functions are used for control flow in Spark SQL, similar to if and else in other programming languages.
empDF.withColumn("Job_CD", when($"job" === lit("CLERK"), lit("CLK"))
.when($"job" === lit("SALESMAN"), lit("SAM"))
.when($"job" === lit("MANAGER"), lit("MGR"))
.when($"job" === lit("ANALYST"), lit("ANT"))
.when($"job" === lit("PRESIDENT"), lit("PRT"))
.otherwise("OTHERS")).show
.when($"job" === lit("SALESMAN"), lit("SAM"))
.when($"job" === lit("MANAGER"), lit("MGR"))
.when($"job" === lit("ANALYST"), lit("ANT"))
.when($"job" === lit("PRESIDENT"), lit("PRT"))
.otherwise("OTHERS")).show
typedLit
Unlike lit(), typedLit() can create literal columns from parameterized Scala types such as tuples, Seq, and Map.
import org.apache.spark.sql.functions.typedLit
val df = spark.createDataFrame(Seq((0, "a"), (1, "b"), (2, "c"))).toDF("id", "col1")
df.withColumn("newcol", typedLit(("sample", 10, .044))).show
Example: 1
val df1 = df.withColumn("typedLit_seq", typedLit(Seq(1, 2, 3)))
.withColumn("typedLit_map", typedLit(Map("a" -> 1, "b" -> 2)))
.withColumn("typedLit_struct", typedLit(("a", 2, 1.0)))
df1.printSchema()
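The resulting columns behave like ordinary array, map, and struct columns, so their elements can be read back with getItem and getField. A minimal sketch based on df1 above:
// read individual elements out of the literal collections
df1.select(
  $"typedLit_seq".getItem(0).as("first_elem"),
  $"typedLit_map".getItem("a").as("a_value"),
  $"typedLit_struct".getField("_1").as("struct_first")
).show()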
Example: 2
val df1 = df.withColumn("typedLit_seq", typedLit(Seq(1, 2, 3)))
.withColumn("typedLit_map", typedLit(Map("a" -> 1, "b" -> 2)))
.withColumn("typedLit_struct", typedLit(("a", 2, 1.0)))
df1.show
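For comparison (an assumption about Spark 2.2 behavior, worth verifying on your version): passing a Scala collection straight to lit() is expected to fail with an unsupported-literal-type error, which is exactly the gap typedLit() fills.
import org.apache.spark.sql.functions.{lit, typedLit}
// expected to throw "Unsupported literal type" in Spark 2.2 (assumption), so left commented out
// df.withColumn("bad", lit(Seq(1, 2, 3)))
// works, because typedLit carries the Scala type information
df.withColumn("good", typedLit(Seq(1, 2, 3))).show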