2025-09-17 Update — From: SLTechnology News & Howtos (shulou) > Servers
Shulou (Shulou.com) 06/01 report:
This article explains, with examples, how to convert an RDD to a DataFrame in Spark SQL. The explanation is kept simple and clear; follow along to study the two conversion approaches and the pitfalls around them.
One. The first way to convert an RDD to a DataFrame: reflection
1. Official website: see the reflection-based example in the Spark SQL programming guide.
2. Explanation: with reflection, all of the schema information is defined in a case class.
3. Code:

```scala
package core

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Test")
      .master("local[2]")
      .getOrCreate()

    val mess = spark.sparkContext.textFile("file:///D:\\test\\person.txt")

    import spark.implicits._
    // assuming comma-separated lines in person.txt
    val result = mess.map(_.split(","))
      .map(x => Info(x(0).toInt, x(1), x(2).toInt))
      .toDF()

    // In 1.x you could map over the DataFrame directly, e.g. result.map(x => x(0)).show();
    // in 2.x, drop down to the underlying RDD first:
    result.rdd.map(x => x(0)).collect().foreach(println)
    result.rdd.map(x => x.getAs[Int]("id")).collect().foreach(println)
  }
}

case class Info(id: Int, name: String, age: Int)
```

4. Note: before version 2.2, the number of constructor parameters of the case class was limited; from 2.2 on there is no such limit.
Two. The second way to convert: specifying the schema programmatically
1. Official website: see the programmatic-schema example in the Spark SQL programming guide.
2. Explanation: the schema information is built up in code and applied to an RDD of Row objects.
3. Steps:
4. The steps, explained:
- Create an RDD of Rows from the original RDD (for example, one loaded with textFile).
- Create a StructType that matches the data structure of the Rows: one StructField per column.
- Associate the schema with the Row RDD through createDataFrame.
5. Source code interpretation: StructType.
6. Source code interpretation: a StructType can be understood as a collection of StructFields, one StructField per column.
7. The final code:

```scala
package core

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}

object TestRDD2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TestRDD2")
      .master("local[2]")
      .getOrCreate()

    val mess = spark.sparkContext.textFile("file:///D:\\test\\person.txt")

    // Each Row must carry the types declared in the schema below
    val result = mess.map(_.split(","))
      .map(x => Row(x(0).toInt, x(1), x(2).toInt))

    val schema = StructType(Array(
      StructField("id", IntegerType, true),
      StructField("name", StringType, true),
      StructField("age", IntegerType, true)
    ))

    val info = spark.createDataFrame(result, schema)
    info.show()
  }
}
```

8. Classic error:
9. The fix: the classic error occurs when the schema you define does not match the types actually inside the Row. For example, this Row holds only Strings:

```scala
val result = mess.map(_.split(","))
  .map(x => Row(x(0), x(1), x(2)))
```

while the schema declares id and age as IntegerType:

```scala
val structType = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)
))
```

Where the schema wants an Int but the Row holds a String, be careful to convert the type:

```scala
val result = mess.map(_.split(","))
  .map(x => Row(x(0).toInt, x(1), x(2).toInt))
```

Three. Frequently made errors
1. spark-shell pre-imports some implicit conversions that plain code does not. df.select('name).show or df.select($"name").show works in spark-shell, but in compiled code it fails unless you add import spark.implicits._ yourself.
2. show source code: show() defaults to numRows = 20 and truncate = true, meaning it displays at most 20 rows and cuts each string cell off after 20 characters. With truncate set to false, the full contents of every cell are displayed. So show(30, false) displays 30 rows without truncation, while show(5) displays 5 rows, still truncated.
3. The select method source code.
4. The select call variants: df.select("name").show(false) works without any imports; after import spark.implicits._ you can also write df.select('name).show(false) or df.select($"name").show(false). The String form goes through one select overload in the source, while the Symbol and $ forms go through the Column-based overload.
5. head source code: head defaults to returning the first row; pass a number to return that many rows.
6. first() returns the first row; under the hood it simply calls head.
7. sort source code: sort is ascending by default; to sort in descending order, use desc on the column.
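The select and sort behavior described above can be sketched as follows. This is a minimal sketch assuming a local SparkSession and hypothetical in-memory data (the article itself reads person.txt):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

object SelectSortSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SelectSortSketch")
      .master("local[2]")
      .getOrCreate()
    // Needed outside spark-shell for the 'name and $"name" forms
    import spark.implicits._

    val df = Seq((1, "zhangsan", 20), (2, "lisi", 30)).toDF("id", "name", "age")

    df.select("name").show(false)   // String overload, no implicits needed
    df.select('name).show(false)    // Symbol form, needs spark.implicits._
    df.select($"name").show(false)  // $ interpolator form, needs spark.implicits._

    df.sort($"age").show()          // ascending by default
    df.sort(desc("age")).show()     // descending

    spark.stop()
  }
}
```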
Four. Operating with SQL
1. Official website: register the DataFrame as a (local) temporary view, then query it with spark.sql.
2. Global temporary view: a global temp view is tied to the global_temp system database, so the rule is to prefix the view name with global_temp when querying it.
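The two kinds of views can be sketched like this, again assuming a local SparkSession and hypothetical in-memory data in place of person.txt:

```scala
import org.apache.spark.sql.SparkSession

object ViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ViewSketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "zhangsan", 20), (2, "lisi", 30)).toDF("id", "name", "age")

    // Local temporary view: visible only in this session
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 25").show()

    // Global temporary view: lives in the global_temp database,
    // so the query must use the global_temp prefix
    df.createGlobalTempView("people_g")
    spark.sql("SELECT name FROM global_temp.people_g").show()

    // A global temp view is visible across sessions of the same application
    spark.newSession().sql("SELECT name FROM global_temp.people_g").show()

    spark.stop()
  }
}
```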
Five. Miscellaneous
1. Error:
2. Cause and code: in 2.x a DataFrame's map no longer hands you the raw rows the way 1.x did, so result.map(x => x(0)).show() fails; drop to the underlying RDD first:

```scala
val spark = SparkSession.builder()
  .appName("Test")
  .master("local[2]")
  .getOrCreate()

val mess = spark.sparkContext.textFile("file:///D:\\test\\person.txt")

import spark.implicits._
val result = mess.map(_.split(","))
  .map(x => Info(x(0).toInt, x(1), x(2).toInt))
  .toDF()

// Correct in 2.x: go through the RDD.
// Two ways to pull a field out of each Row:
result.rdd.map(x => x(0)).collect().foreach(println)
result.rdd.map(x => x.getAs[Int]("id")).collect().foreach(println)
```

3. Pay attention to escape characters in delimiters: if the delimiter is "|", you must escape it when splitting ("\\|"), otherwise the data comes out wrong.

Thank you for reading. That covers "RDD and DataFrame conversion example usage in Spark SQL". After studying this article you should have a deeper understanding of both conversion approaches; the specifics are best verified in practice.
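As an addendum, the delimiter warning above can be demonstrated in plain Scala: String.split takes a regular expression, and an unescaped "|" is regex alternation, which matches the empty string at every position:

```scala
// Why "|" must be escaped when used as a delimiter.
object SplitDemo {
  def main(args: Array[String]): Unit = {
    val line = "1|zhangsan|20"

    // Wrong: the regex "|" matches the empty string,
    // so the line is split between every single character
    val bad = line.split("|")
    println(bad.mkString(","))   // 1,|,z,h,a,n,g,s,a,n,|,2,0

    // Right: escape the pipe so it is matched literally
    val good = line.split("\\|")
    println(good.mkString(","))  // 1,zhangsan,20
  }
}
```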