Learning Spark's MLlib Component, Part 17: colStats, Column-wise Summary Statistics


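More code: https://github.com/xubo245/SparkLearning
Learning Spark's MLlib component: basic concepts
1. Explanation
colStats: computes basic summary statistics column by column. Statistics.colStats takes an RDD of vectors and returns a summary holding, for each column, the max, min, mean, variance, number of nonzeros, and the L1 and L2 norms, plus the total row count.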

2. Code:

/**
  * @author xubo
  * ref: Spark MlLib机器学习实战
  * more code: https://github.com/xubo245/SparkLearning
  */
package org.apache.spark.mllib.learning.basic

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by xubo on 2016/5/23.
  */
object StatisticsColStatsLearning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName(this.getClass().getSimpleName().filter(!_.equals('$')))
    val sc = new SparkContext(conf)
    // val rdd = sc.textFile("file/data/mllib/input/basic/MatrixRow.txt") // read the file
    val rdd = sc.textFile("file/data/mllib/input/basic/stats.txt") // read the file
      .map(_.split(' ') // split each line on spaces
        .map(_.toDouble)) // convert each field to Double
      .map(line => Vectors.dense(line)) // one dense vector per input line
    val summary = Statistics.colStats(rdd) // column-wise summary statistics

    // rdd.foreach(each => print(each + " "))
    rdd.foreach(println)
    println("rdd.count:" + rdd.count())
    println()
    println(summary) // prints only the object reference, not the statistics
    println(summary.max) // maximum of each column
    println(summary.min) // minimum of each column
    println("count" + summary.count) // number of rows
    println(summary.numNonzeros) // number of nonzero entries in each column
    println("variance:" + summary.variance) // unbiased sample variance (divides by n - 1)
    println(summary.mean) // mean of each column
    println(summary.variance) // variance again; this is not the standard deviation (take its square root for that)
    println(summary.normL1) // L1 (Manhattan) norm: sum of absolute values
    println(summary.normL2) // L2 (Euclidean) norm: square root of the sum of squares

    // Row vector
    println("\n row Vector:")
    val vec = Vectors.dense(1, 2, 3, 4, 5)
    println(vec)
    println(vec.size) // number of elements
    println(vec.numActives) // number of stored entries (all of them for a dense vector)
    // println(vec.variance) // does not exist: Vector has no variance method

    sc.stop()
  }
}

3. Results:

[1.0]
[2.0]
[3.0]
[4.0]
[5.0]
rdd.count:5

org.apache.spark.mllib.stat.MultivariateOnlineSummarizer@7f9de19a
[5.0]
[1.0]
count5
[5.0]
variance:[2.5]
[3.0]
[2.5]
[15.0]
[7.416198487095663]

row Vector:
[1.0,2.0,3.0,4.0,5.0]
5
5
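
The numbers in the output above can be checked by hand without Spark. The sketch below, in plain Scala, recomputes the mean, variance, and norms for the single column 1, 2, 3, 4, 5. It assumes stats.txt holds those five values, one per line, which is what the printed RDD suggests. The object name ColStatsByHand and its helper functions are hypothetical, introduced only for illustration; they are not part of MLlib.

```scala
// Hypothetical helper, not part of MLlib: recomputes by hand what
// Statistics.colStats reports for a single column of values.
object ColStatsByHand {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Unbiased sample variance: divide by (n - 1), not n.
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / (xs.size - 1)
  }

  // L1 norm: sum of absolute values.
  def normL1(xs: Seq[Double]): Double = xs.map(x => math.abs(x)).sum

  // L2 norm: square root of the sum of squares.
  def normL2(xs: Seq[Double]): Double = math.sqrt(xs.map(x => x * x).sum)

  def main(args: Array[String]): Unit = {
    val column = Seq(1.0, 2.0, 3.0, 4.0, 5.0) // assumed contents of stats.txt
    println("mean: " + mean(column))
    println("variance: " + variance(column))
    println("stddev: " + math.sqrt(variance(column))) // not reported by colStats
    println("normL1: " + normL1(column))
    println("normL2: " + normL2(column))
  }
}
```

With this input, the squared deviations from the mean are 4, 1, 0, 1, 4, which sum to 10; dividing by n - 1 = 4 gives the variance 2.5 seen in the output. Dividing by n = 5 would give 2.0 instead, so the printed value indicates the sample variance. The L2 norm is the square root of 1 + 4 + 9 + 16 + 25 = 55.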

References
[1] http://spark.apache.org/docs/1.5.2/mllib-guide.html
[2] http://spark.apache.org/docs/1.5.2/programming-guide.html
[3] https://github.com/xubo245/SparkLearning

