In the reflection-based approach, the Scala interface for Spark SQL automatically converts an RDD containing case classes to a DataFrame.
- The case class defines the schema of the table.
- The names of the arguments to the case class are read using reflection and become the names of the columns.
- Case classes can also be nested or contain complex types such as Seqs or Arrays.
- An RDD can be implicitly converted to a DataFrame and then registered as a table; tables can be used in subsequent SQL statements (see the sketch after Method 1 below).
For the implicit conversions from RDDs to DataFrames, import the following package:

import spark.implicits._
Case Class
- Defines the table schema; the argument names of the case class are read using reflection and become the column names.
- Can be nested or contain complex types such as Seqs or Arrays (see the sketch just below).
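To make the nesting point concrete, here is a minimal sketch; the Contact and Address case classes and their field values are hypothetical, not from the original post:

case class Address(city: String, zip: Int)
case class Contact(name: String, phones: Seq[String], address: Address)

// Nested case classes become struct columns; Seq fields become array columns
val contacts = Seq(
  Contact("Anto", Seq("555-0100", "555-0101"), Address("Bangalore", 560001)),
  Contact("Jack", Seq("555-0199"), Address("Chennai", 600001))).toDF()

contacts.printSchema()  // phones: array<string>, address: struct<city:string,zip:int>
contacts.show(false)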
Method 1
case class Person(name: String, age: Int)
// Split each line of the file on commas and map it to a Person; toDF() infers the schema by reflection on Person
val people = sc.textFile("/FileStore/tables/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
people.show()
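As noted above, the resulting DataFrame can be registered as a temporary view and queried with SQL; a minimal sketch (the view name and the query are illustrative):

// Register the DataFrame so it can be referenced from SQL
people.createOrReplaceTempView("people")
// The view can then be used in subsequent SQL statements
spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19").show()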
Method 2
case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)
val EmployeesData = Seq(
  Employee("Anto", 21, "Software Engineer", 2000, 56798),
  Employee("Anto", 21, "Software Engineer", 2000, 56798),
  Employee("Jack", 21, "Software Engineer", 2000, 93798),
  Employee("Mack", 30, "Software Engineer", 2000, 28798),
  Employee("Jack", 21, "Software Engineer", 2000, 93798),
  Employee("Mack", 30, "Software Engineer", 2000, 28798),
  Employee("Bill", 62, "CEO", 22000, 45798),
  Employee("Joseph", 74, "VP", 12000, 98798),
  Employee("Steven", 45, "Development Lead", 8000, 98798),
  Employee("George", 21, "Sr.Software Engineer", 4000, 98798),
  Employee("Matt", 21, "Sr.Software Engineer", 4000, 98798))
// createDataFrame infers the schema by reflecting on the Employee case class
val emp = spark.createDataFrame(EmployeesData)
emp.show()
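As with Method 1, this DataFrame can be exposed to SQL as well; a brief sketch (the view name employees and the aggregate query are illustrative, not from the original post):

emp.createOrReplaceTempView("employees")
// Count employees per designation via SQL on the registered view
spark.sql("SELECT Designation, count(*) AS cnt FROM employees GROUP BY Designation").show()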
Method 3
case class Person(name: String, age: Int)
// A Seq of case-class instances converts directly; the schema is again inferred by reflection
val people = Seq(Person("Jacek", 42), Person("Patryk", 19), Person("Maksym", 5))
val df = spark.createDataFrame(people)
df.show()
Method 4
// Tuples also work; toDF supplies the column names, and the Option field is inferred as a nullable column
val emp = Seq((101, "Amy", Some(2)))
val employee = spark.createDataFrame(emp).toDF("employeeId", "employeeName", "managerId")
employee.show()
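Printing the schema confirms the nullability inferred for each column (output shown for the employee DataFrame above):

employee.printSchema()
// root
//  |-- employeeId: integer (nullable = false)
//  |-- employeeName: string (nullable = true)
//  |-- managerId: integer (nullable = true)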