Find the sum of the lengths of the list values
val rdd = sc.parallelize(List("animal", "human", "bird", "rat"))
val rdd1=rdd.map(x =>(x.length))
rdd1.reduce(_+_)
Word count program
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    println("In main : " + args(0) + "," + args(1))
    //Warehouse directory (assumed location; the original snippet left it undefined)
    val warehouseLocation = new java.io.File("spark-warehouse").getAbsolutePath
    //Create Spark Session
    val spark = SparkSession.builder().appName("WordCountApp")
      .config("spark.some.config.option", "some-value")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport() //needs the spark-hive module on the classpath
      .getOrCreate()
    //The RDD API is reached through the session's SparkContext
    val sc = spark.sparkContext
    //Load Data from File
    val input = sc.textFile(args(0))
    //Split into words
    val words = input.flatMap(line => line.split(" "))
    //Assign unit to each word
    val units = words.map(word => (word, 1))
    //Reduce each key
    val counts = units.reduceByKey((x, y) => x + y)
    //Write output to Disk
    counts.saveAsTextFile(args(1))
    //Shutdown spark. System.exit(0) or sys.exit() can also be used
    spark.stop()
  }
}
Join two files read with textFile()
Read textFile 1
A B
1 2
3 4
val txt1=sc.textFile(path="/path/to/input/file1")
Read textFile 2
A C
1 5
3 6
val txt2=sc.textFile(path="/path/to/input/file2")
Join and print the result.
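// textFile() returns RDD[String], which has no join(); key each line by its first column first
val kv1 = txt1.map(_.split(" ")).map(a => (a(0), a(1)))
val kv2 = txt2.map(_.split(" ")).map(a => (a(0), a(1)))
kv1.join(kv2).map { case (k, (b, c)) => s"$k $b $c" }.foreach(println)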
A B C
1 2 5
3 4 6
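The join above is keyed on the first column. It is an inner join, so only keys present in both files appear, and the order of the printed rows is not guaranteed.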
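To try the join without input files, here is a minimal sketch that builds the two inputs in memory and runs Spark in local mode. The object name JoinSketch, the helpers keyByFirstColumn and joinLines, and the local[*] master are assumptions made for illustration; they are not part of the original example. The result is collected and sorted so the printed order is stable.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object JoinSketch {
  // Hypothetical helper: turn "key value" lines into (key, value) pairs.
  def keyByFirstColumn(lines: RDD[String]): RDD[(String, String)] =
    lines.map(_.split(" ")).map(cols => (cols(0), cols(1)))

  // Inner join on the first column, formatted back into space-separated rows.
  def joinLines(left: RDD[String], right: RDD[String]): RDD[String] =
    keyByFirstColumn(left)
      .join(keyByFirstColumn(right))
      .map { case (key, (b, c)) => s"$key $b $c" }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("JoinSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // In-memory stand-ins for the two text files shown above.
    val file1 = sc.parallelize(Seq("A B", "1 2", "3 4"))
    val file2 = sc.parallelize(Seq("A C", "1 5", "3 6"))

    // Collect to the driver and sort so the output order is deterministic.
    joinLines(file1, file2).collect().sorted.foreach(println)

    spark.stop()
  }
}
```

The header line is treated like any other row here: "A B" and "A C" share the key "A", so they join into "A B C". With real files that carry a header you may prefer to filter it out before joining.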