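How do you get to grips with the essentials of MLlib in Spark? Many readers without prior experience find the library hard to approach, so this article summarizes its package layout, core concepts, supported algorithms, and a worked example.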

org.apache.spark.ml (http://spark.apache.org/docs/latest/ml-guide.html)
org.apache.spark.ml.attribute
org.apache.spark.ml.classification
org.apache.spark.ml.clustering
org.apache.spark.ml.evaluation
org.apache.spark.ml.feature
org.apache.spark.ml.param
org.apache.spark.ml.recommendation
org.apache.spark.ml.regression
org.apache.spark.ml.source.libsvm
org.apache.spark.ml.tree
org.apache.spark.ml.tuning
org.apache.spark.ml.util
org.apache.spark.mllib (http://spark.apache.org/docs/latest/mllib-guide.html)
org.apache.spark.mllib.classification
org.apache.spark.mllib.clustering
org.apache.spark.mllib.evaluation
org.apache.spark.mllib.feature
org.apache.spark.mllib.fpm
org.apache.spark.mllib.linalg
org.apache.spark.mllib.linalg.distributed
org.apache.spark.mllib.pmml
org.apache.spark.mllib.random
org.apache.spark.mllib.rdd
org.apache.spark.mllib.recommendation
org.apache.spark.mllib.regression
org.apache.spark.mllib.stat
org.apache.spark.mllib.stat.distributed
org.apache.spark.mllib.stat.test
org.apache.spark.mllib.tree
org.apache.spark.mllib.tree.configuration
org.apache.spark.mllib.tree.impurity
org.apache.spark.mllib.tree.loss
org.apache.spark.mllib.tree.model
org.apache.spark.mllib.util
ML concepts
DataFrame: Spark ML uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.
Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions.
Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.
Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.
Parameter: All Transformers and Estimators share a common API for specifying parameters.
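The concepts above can be sketched with a small text-classification Pipeline. This is a minimal illustration rather than a definitive recipe: it assumes a `sqlContext` is already in scope (as in spark-shell and in the article's own example), and the toy rows, column names, and variable names are invented for the sketch.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Assumption: `sqlContext` is predefined; the rows below are made-up toy data.
val docs = sqlContext.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Two Transformers (Tokenizer, HashingTF) followed by one Estimator (LogisticRegression).
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol("words").setOutputCol("features")
val pipelineLr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

// A Pipeline is itself an Estimator: fit() returns a PipelineModel, which is a Transformer.
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, pipelineLr))
val pipelineModel = pipeline.fit(docs)

// transform() returns a new DataFrame with the intermediate and prediction columns appended.
val scored = pipelineModel.transform(docs)
scored.select("id", "text", "prediction").show()
```

Every stage is configured through the same setter-style parameter API, which is what the Parameter entry refers to.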
ML classification and regression
Classification
  Logistic regression
  Decision tree classifier
  Random forest classifier
  Gradient-boosted tree classifier
  Multilayer perceptron classifier
  One-vs-Rest classifier (a.k.a. One-vs-All)
Regression
  Linear regression
  Decision tree regression
  Random forest regression
  Gradient-boosted tree regression
  Survival regression
Decision trees
Tree Ensembles
  Random Forests
  Gradient-Boosted Trees (GBTs)
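As one hedged illustration of the regression entries, the sketch below fits a linear regression through the DataFrame-based API. It assumes a predefined `sqlContext`; the toy rows (generated from label = 2*x1 + x2 + 1) and all variable names are invented for the example. Features are built with `VectorAssembler` so the sketch does not depend on which vector package a given Spark version expects.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

// Assumption: `sqlContext` is predefined; rows follow label = 2*x1 + x2 + 1.
val rawRegression = sqlContext.createDataFrame(Seq(
  (3.0, 1.0, 0.0),
  (5.0, 2.0, 0.0),
  (4.0, 1.0, 1.0),
  (8.0, 3.0, 1.0),
  (6.0, 2.0, 1.0)
)).toDF("label", "x1", "x2")

// VectorAssembler is a Transformer that packs numeric columns into one feature vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")
val regressionData = assembler.transform(rawRegression)

// LinearRegression is an Estimator; fit() produces a model that is a Transformer.
val linReg = new LinearRegression().setMaxIter(50).setRegParam(0.0)
val linRegModel = linReg.fit(regressionData)

// Compare the known labels with the fitted predictions.
val regressionOutput = linRegModel.transform(regressionData)
regressionOutput.select("label", "prediction").show()
```

The tree-based regressors in org.apache.spark.ml.regression (for example `DecisionTreeRegressor`) are driven through the same fit/transform calls.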
ML clustering
K-means
Latent Dirichlet allocation (LDA)
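A minimal sketch of K-means through the DataFrame-based API, assuming a predefined `sqlContext`. The six toy points (two well-separated groups) and the variable names are made up for illustration.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler

// Assumption: `sqlContext` is predefined; the points form two obvious groups.
val points = sqlContext.createDataFrame(Seq(
  (0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
  (9.0, 9.0), (9.1, 9.1), (9.2, 9.0)
)).toDF("x", "y")

// Pack the coordinate columns into the "features" vector column that KMeans reads by default.
val pointFeatures = new VectorAssembler()
  .setInputCols(Array("x", "y"))
  .setOutputCol("features")
  .transform(points)

// KMeans is an Estimator; the fitted KMeansModel is a Transformer.
val mlKMeans = new KMeans().setK(2).setSeed(1L)
val mlKMeansModel = mlKMeans.fit(pointFeatures)

// Inspect the learned centers and the cluster index assigned to each row.
mlKMeansModel.clusterCenters.foreach(println)
mlKMeansModel.transform(pointFeatures).select("x", "y", "prediction").show()
```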
MLlib data types
Local vector
Labeled point
Local matrix
Distributed matrix
  RowMatrix
  IndexedRowMatrix
  CoordinateMatrix
  BlockMatrix
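The data types listed above can be constructed as follows. This is a short sketch that assumes a SparkContext named `sc` is in scope (as in spark-shell); the values and variable names are illustrative only.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.regression.LabeledPoint

// Local vector: the same values (1.0, 0.0, 3.0) in dense and sparse form.
val denseVec: Vector = Vectors.dense(1.0, 0.0, 3.0)
val sparseVec: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Labeled point: a local vector paired with a Double label.
val labeled = LabeledPoint(1.0, denseVec)

// Local matrix: 3 rows x 2 columns, values given in column-major order,
// i.e. rows (1.0, 2.0), (3.0, 4.0), (5.0, 6.0).
val localMatrix: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

// Distributed matrix: a RowMatrix wraps an RDD of local vectors, one per row.
// Assumption: `sc` is a predefined SparkContext.
val rowRdd = sc.parallelize(Seq(denseVec, sparseVec, Vectors.dense(4.0, 5.0, 6.0)))
val rowMatrix = new RowMatrix(rowRdd)
println(s"RowMatrix dimensions: ${rowMatrix.numRows()} x ${rowMatrix.numCols()}")
```

IndexedRowMatrix, CoordinateMatrix, and BlockMatrix live in the same org.apache.spark.mllib.linalg.distributed package and are likewise backed by RDDs.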
MLlib classification and regression
Binary Classification: linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes
Multiclass Classification: logistic regression, decision trees, random forests, naive Bayes
Regression: linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
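To show the general shape of the RDD-based API, here is a hedged sketch using naive Bayes, one of the classifiers listed above. It assumes a predefined SparkContext `sc`; the four training points and the variable names are invented, and the features are non-negative counts as naive Bayes expects.

```scala
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Assumption: `sc` is predefined. Class 0.0 is heavy on the first feature,
// class 1.0 on the second and third.
val nbTraining = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(3.0, 0.0, 0.0)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, 0.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 3.0, 1.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 2.0, 2.0))
))

// Train on an RDD[LabeledPoint]; the second argument is the additive smoothing parameter.
val nbModel = NaiveBayes.train(nbTraining, 1.0)

// MLlib models predict directly on a local vector (or on an RDD of vectors).
val nbPrediction = nbModel.predict(Vectors.dense(0.0, 4.0, 1.0))
println(s"Predicted label: $nbPrediction")
```

Unlike the DataFrame-based `ml` API, training here takes an RDD of labeled points and the model returns a plain Double label.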
MLlib clustering
K-means
Gaussian mixture
Power iteration clustering (PIC, mostly used for image recognition)
Latent Dirichlet allocation (LDA, mostly used for topic classification)
Bisecting k-means
Streaming k-means
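A minimal sketch of the RDD-based K-means entry, assuming a predefined SparkContext `sc`. The toy points and variable names are illustrative, and the import is renamed only to keep it distinct from the DataFrame-based class of the same name.

```scala
import org.apache.spark.mllib.clustering.{KMeans => MLlibKMeans}
import org.apache.spark.mllib.linalg.Vectors

// Assumption: `sc` is predefined; the points form two well-separated groups.
val clusterData = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1), Vectors.dense(9.2, 9.0)
)).cache()

// Train with k = 2 clusters and at most 20 iterations; the result is a KMeansModel.
val mllibKMeansModel = MLlibKMeans.train(clusterData, 2, 20)

// Inspect the learned centers and assign a new point to its nearest center.
mllibKMeansModel.clusterCenters.foreach(println)
val assignedCluster = mllibKMeansModel.predict(Vectors.dense(0.05, 0.05))
println(s"Point near the origin falls in cluster $assignedCluster")
```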
MLlib Models
DecisionTreeModel
DistributedLDAModel
GaussianMixtureModel
GradientBoostedTreesModel
IsotonicRegressionModel
KMeansModel
LassoModel
LDAModel
LinearRegressionModel
LocalLDAModel
LogisticRegressionModel
MatrixFactorizationModel
NaiveBayesModel
PowerIterationClusteringModel
RandomForestModel
RidgeRegressionModel
StreamingKMeansModel
SVMModel
Word2VecModel
Example
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap
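// Spark 1.x style; on Spark 2.0+ the ml package expects org.apache.spark.ml.linalg.{Vector, Vectors} instead.
import org.apache.spark.mllib.linalg.{Vector, Vectors}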
import org.apache.spark.sql.Row
// Prepare training data as (label, features) tuples.
val training = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")
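// Create a LogisticRegression instance. This instance is an Estimator.
val lr = new LogisticRegression()
// Print the parameters, their documentation, and any default values.
println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")
// Set parameters using setter methods.
lr.setMaxIter(10).setRegParam(0.01)
// Learn a model using the parameters stored in lr.
val model1 = lr.fit(training)
// model1 is a Transformer produced by the Estimator; show the parameters used during fit().
println("Model 1 was fit using parameters: " + model1.parent.extractParamMap)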
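// Alternatively, specify parameters using a ParamMap.
val paramMap = ParamMap(lr.maxIter -> 20)
  .put(lr.maxIter, 30) // Overwrites the maxIter value set on the previous line.
  .put(lr.regParam -> 0.1, lr.threshold -> 0.55) // Sets several parameters at once.
// ParamMaps can be combined; this one renames the probability output column.
val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability")
val paramMapCombined = paramMap ++ paramMap2
// Learn a new model; values in paramMapCombined override those set earlier via lr.set*.
val model2 = lr.fit(training, paramMapCombined)
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)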
// Prepare test data.
val test = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
  (0.0, Vectors.dense(3.0, 2.0, -0.1)),
  (1.0, Vectors.dense(0.0, 2.2, -1.5))
)).toDF("label", "features")
// Make predictions on the test data using Transformer.transform().
// model2 writes probabilities to "myProbability" because probabilityCol was renamed above.
model2.transform(test)
  .select("features", "label", "myProbability", "prediction")
  .collect()
  .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
    println(s"($features, $label) -> prob=$prob, prediction=$prediction")
  }