spark.read.json()
The spark.read.json() method parses JSON data and returns a PySpark DataFrame. It can read JSON from several sources, including file paths and RDDs of JSON strings, it infers the schema automatically, and it supports both flat and nested JSON structures (a nested example is sketched after the output at the end of this section).
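For instance, the reader can load JSON straight from a file path. A minimal sketch, assuming a hypothetical people.json file that contains a single JSON array (multiLine=True tells Spark the file is not line-delimited):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("JsonFileExample").getOrCreate()

# Hypothetical file; multiLine=True parses a whole-file JSON array
# instead of the default line-delimited (JSON Lines) format.
df_from_file = spark.read.json("people.json", multiLine=True)
df_from_file.show()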
Step 1: Import Libraries
from pyspark.sql import SparkSession      # entry point for DataFrame operations
from pyspark.sql.functions import col     # column expression helper, handy for nested fields
Step 2: Create a Spark Session
# getOrCreate() returns an existing session if one is already running.
spark = SparkSession.builder \
    .appName("JsonToDataFrame") \
    .getOrCreate()
Step 3: Define the JSON string
json_string = '''[{"name": "David", "age": 28}, {"name": "Mark", "age": 27}]'''
Step 4: Parallelize the JSON string into an RDD
# A single-element RDD whose only record is the JSON text.
rdd = spark.sparkContext.parallelize([json_string])
Step 5: Read the JSON data into a DataFrame (an RDD-free variant is sketched after the output)
# Each record is parsed; a top-level JSON array becomes one row per element,
# and the schema is inferred automatically.
df = spark.read.json(rdd)
Step 6: Display the DataFrame
df.show()
Output
+---+-----+
|age| name|
+---+-----+
| 28|David|
| 27| Mark|
+---+-----+
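The default JSON source is line-delimited: each record is expected to be a self-contained JSON object, so the same data can equally be supplied as two separate strings. A minimal sketch, reusing the spark session from Step 2:
json_lines = [
    '{"name": "David", "age": 28}',
    '{"name": "Mark", "age": 27}',
]
# One JSON object per RDD record; the resulting DataFrame matches the one above.
df_lines = spark.read.json(spark.sparkContext.parallelize(json_lines))
df_lines.show()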
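The RDD step can also be skipped. One alternative (a sketch, not the only way) is to parse the string with Python's built-in json module and build the DataFrame with spark.createDataFrame, reusing spark and json_string from the steps above:
import json
from pyspark.sql import Row

# Parse the JSON array in Python, then turn each dictionary into a Row.
records = [Row(**record) for record in json.loads(json_string)]
df_alt = spark.createDataFrame(records)
df_alt.show()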
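Finally, a sketch of the nested-structure support mentioned at the top. The payload below is made up for illustration; it reuses the spark session and the col import from Step 1 to pull a nested field with dot notation:
nested_json = '''[{"name": "David", "address": {"city": "Paris", "zip": "75001"}}]'''
df_nested = spark.read.json(spark.sparkContext.parallelize([nested_json]))
df_nested.printSchema()   # address is inferred as a struct with city and zip fields
df_nested.select("name", col("address.city").alias("city")).show()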