Create DataFrame Using Refelection

In Reflection based approach, the Scala interface allows converting an RDD with case classes to a DataFrame automatically for Spark SQL.
  • The case class defines the schema of the table. 
  • The names of the arguments to the case class are read using reflection and become the names of the columns. 
  • Case classes can also be nested or contain complex types such as Seqs or Arrays. 
  • RDD can be implicitly converted to a DataFrame and then be registered as a table. Tables can be used in subsequent SQL statements.
#For implicit conversions from RDDs to DataFrames we need to impot the below package
import spark.implicits._

Case Class
  • Has the table schema, where the argument names to the case class are read using the reflection method.
  • Can be nested and used to contain complex types like a sequence of arrays.
Scala Interface implicitly converts the resultant RDD to a DataFrame and register it as a table. Use it in the subsequent SQL statements.

Method: 1
case class Person(name: String, age: Int)
val people = sc.textFile("/FileStore/tables/people.txt").map(_.split(",")).map(p => Person(p(0),p(1).trim.toInt)).toDF()
people.show


Method:2
case class Employee(Name:String, Age:Int, Designation:String, Salary:Int,ZipCode:Int)
val Empdf = Seq(
Employee("Anto",21,"Software Engineer",2000, 56798),
Employee("Jack",21,"Software Engineer",2000,93798),                    Employee("Mack",30,"Software Engineer",2000, 28798),
Employee("Jack",21,"Software Engineer",2000,93798),                    Employee("Mack",30,"Software Engineer",2000, 28798),
Employee("Bill",62,"CEO", 22000,45798),
Employee("Joseph",74,"VP",12000,98798),
Employee("Steven",45,"Development Lead",8000,98798),
Employee("George",21,"Sr.Software Engineer",4000,98798),
Employee("Matt",21,"Sr.Software Engineer",4000,98798))
val emp=spark.createDataFrame(EmployeesData)

case class Person(name: String, age: Int)
val people = Seq(Person("Jacek", 42), Person("Patryk", 19), Person("Maksym", 5)).toDF
val df = spark.createDataFrame(people)
df.show()


val emp = Seq((101, "Amy", Some(2)))
val employee = spark.createDataFrame(emp).toDF("employeeId","employeeName","managerId")

No comments:

Post a Comment