In Spark 2.2 there are two ways to add a constant value as a column to a DataFrame:
- Using lit()
- Using typedLit()
typedLit() can also handle parameterized Scala types, e.g. List, Seq, and Map.
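A minimal sketch of the difference (assuming a spark-shell session, where spark and the $ syntax used below are already available):
import org.apache.spark.sql.functions.{lit, typedLit}
// lit() handles simple literals; typedLit() also handles Scala collections
spark.range(1)
  .select(lit("constant").as("lit_col"), typedLit(Seq(1, 2, 3)).as("typedLit_col"))
  .show()
The examples below walk through both functions in more detail.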
Create DataFrame
val empDF = spark.createDataFrame(Seq(
(7369, "SMITH", "CLERK", 7902, "17-Dec-80", 800, 20, 10),
(7499, "ALLEN", "SALESMAN", 7698, "20-Feb-81", 1600, 300, 30),
(7521, "WARD", "SALESMAN", 7698, "22-Feb-81", 1250, 500, 30),
(7566, "JONES", "MANAGER", 7839, "2-Apr-81", 2975, 0, 20),
(7654, "MARTIN", "SALESMAN", 7698, "28-Sep-81", 1250, 1400, 30),
(7698, "BLAKE", "MANAGER", 7839, "1-May-81", 2850, 0, 30),
(7782, "CLARK", "MANAGER", 7839, "9-Jun-81", 2450, 0, 10),
(7788, "SCOTT", "ANALYST", 7566, "19-Apr-87", 3000, 0, 20),
(7839, "KING", "PRESIDENT", 0, "17-Nov-81", 5000, 0, 10),
(7844, "TURNER", "SALESMAN", 7698, "8-Sep-81", 1500, 0, 30),
(7876, "ADAMS", "CLERK", 7788, "23-May-87", 1100, 0, 20)
)).toDF("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "deptno")
The lit() function creates a Column object out of a literal value. Let's use the DataFrame created above with the lit() function.
Example: 1 (select a subset of columns, without lit)
val emp = empDF.select($"empno", $"ename", $"sal", $"deptno")
emp.show()
Example: 2 (select with a lit column)
import org.apache.spark.sql.functions.lit
val emp_lit = empDF.select($"empno", $"ename", $"sal", $"deptno", lit("lit").as("lit_column"))
emp_lit.show()
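Because lit() returns a Column, the literal can also take part in expressions. A minimal sketch (the 10% bonus rate is just an illustrative value, not part of the original data set):
import org.apache.spark.sql.functions.lit
// multiply the sal column by a numeric literal to derive a bonus column
empDF.select($"empno", $"ename", $"sal", ($"sal" * lit(0.1)).as("bonus")).show()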
Example: 3 (lit with the withColumn function)
import org.apache.spark.sql.functions.{when, lit}
empDF.withColumn("lit_value", when($"sal" >= 2500, lit("SENIOR DEVELOPER")).otherwise("DEVELOPER")).show
Custom Transformation Example
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{when, lit}

// Curried form: works with DataFrame.transform
def withGreeting()(df: DataFrame): DataFrame = {
  df.withColumn("Lit_Value", when($"sal" > 2000, lit("SE")).otherwise("developer"))
}
empDF.transform(withGreeting()).show()

// Plain function form: called directly, passing the DataFrame explicitly
def withGreeting1(df: DataFrame): DataFrame = {
  df.withColumn("Lit_Value", when($"sal" > 2000, lit("SE")).otherwise("developer"))
}
withGreeting1(empDF).show
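The curried form pays off when several transformations are chained with transform. A minimal sketch (withBonus is a hypothetical second transformation added here for illustration, not part of the original example):
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// hypothetical second transformation, parameterized by a bonus rate
def withBonus(rate: Double)(df: DataFrame): DataFrame = {
  df.withColumn("bonus", $"sal" * lit(rate))
}

// transformations compose into a readable pipeline
empDF.transform(withGreeting()).transform(withBonus(0.1)).show()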
empDF.withColumn("Lit_Value",when ($"sal"> 2000, lit("SE")).otherwise("developer"))
}
empDF.transform(withGreeting()).show()
def withGreeting1(df: DataFrame): DataFrame = {
empDF.withColumn("Lit_Value",when ($"sal"> 2000, lit("SE")).otherwise("developer"))
}
withGreeting1(empDF).show
Example: 4 (lit with when and otherwise)
The lit() function is especially useful when making boolean comparisons. The when() and otherwise() functions are used for control flow in Spark SQL, similar to if and else in other programming languages.
empDF.withColumn("Job_CD", when($"job" === lit("CLERK"), lit("CLK"))
.when($"job" === lit("SALESMAN"), lit("SAM"))
.when($"job" === lit("MANAGER"), lit("MGR"))
.when($"job" === lit("ANALYST"), lit("ANT"))
.when($"job" === lit("PRESIDENT"), lit("PRT"))
.otherwise("OTHERS")).show
.when($"job" === lit("SALESMAN"), lit("SAM"))
.when($"job" === lit("MANAGER"), lit("MGR"))
.when($"job" === lit("ANALYST"), lit("ANT"))
.when($"job" === lit("PRESIDENT"), lit("PRT"))
.otherwise("OTHERS")).show
typedLit
Unlike lit(), typedLit() can create literal columns from parameterized Scala types such as tuples, Seq, and Map.
import org.apache.spark.sql.functions.typedLit
val df = spark.createDataFrame(Seq((0, "a"), (1, "b"), (2, "c"))).toDF("id", "col1")
df.withColumn("newcol", typedLit(("sample", 10, .044))).show
Example: 1
val df1 = df.withColumn("typedLit_seq", typedLit(Seq(1, 2, 3)))
.withColumn("typedLit_map", typedLit(Map("a" -> 1, "b" -> 2)))
.withColumn("typedLit_struct", typedLit(("a", 2, 1.0)))
df1.printSchema()
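The resulting columns behave like ordinary array, map, and struct columns, so their elements can be read back with getItem and getField. A minimal sketch based on df1 above:
// read individual elements out of the literal collections
df1.select(
  $"typedLit_seq".getItem(0).as("first_elem"),
  $"typedLit_map".getItem("a").as("a_value"),
  $"typedLit_struct".getField("_1").as("struct_first")
).show()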
Example: 2
val df1 = df.withColumn("typedLit_seq", typedLit(Seq(1, 2, 3)))
.withColumn("typedLit_map", typedLit(Map("a" -> 1, "b" -> 2)))
.withColumn("typedLit_struct", typedLit(("a", 2, 1.0)))
df1.show
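For comparison (an assumption about Spark 2.2 behavior, worth verifying on your version): passing a Scala collection straight to lit() is expected to fail with an unsupported-literal-type error, which is exactly the gap typedLit() fills.
import org.apache.spark.sql.functions.{lit, typedLit}
// expected to throw "Unsupported literal type" in Spark 2.2 (assumption), so left commented out
// df.withColumn("bad", lit(Seq(1, 2, 3)))
// works, because typedLit carries the Scala type information
df.withColumn("good", typedLit(Seq(1, 2, 3))).show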