PySpark – Converting a JSON String to a DataFrame

spark.read.json()

The spark.read.json() function parses JSON data and returns it as a PySpark DataFrame. It accepts a file path, a list of paths, or an RDD of strings in which each element holds JSON text, so a JSON string already in memory can be converted by first wrapping it in an RDD. The function infers the schema from the data automatically and supports both flat and nested JSON structures.

Step 1: Import Libraries

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

Step 2: Creating a Spark Session

spark = SparkSession.builder \
    .appName("JsonToDataFrame") \
    .getOrCreate()

Step 3: Define the JSON string

json_string = '''[{"name": "David", "age": 28}, {"name": "Mark", "age": 27}]'''
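
Before handing the string to Spark, it can be useful to confirm that it is valid JSON. In its default permissive mode, Spark does not raise an error on malformed input and instead places the bad text in a _corrupt_record column, which is easy to miss. The check below is an optional sketch that uses only Python's standard json module; the records variable is introduced here for illustration.

```python
import json

json_string = '''[{"name": "David", "age": 28}, {"name": "Mark", "age": 27}]'''

# json.loads raises json.JSONDecodeError if the string is malformed.
records = json.loads(json_string)

print(type(records).__name__, len(records))  # list 2
print(sorted(records[0].keys()))             # ['age', 'name']
```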

Step 4: Create an RDD from the JSON string

rd = spark.sparkContext.parallelize([json_string])

Step 5: Read the JSON data into a DataFrame

df = spark.read.json(rd)

Step 6: Display the DataFrame

df.show()

Output

+---+-----+
|age| name|
+---+-----+
| 28|David|
| 27| Mark|
+---+-----+