What is JSON?

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. While it is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999, it lacks a number of commonly used syntactic features.

JSON data sets can be produced by:

Javascript applications using native methods to generate JSON
Non-Javascript applications using libraries (usually with extensions)
Ad hoc Javascript generators (often producing slightly wrong syntax)
Concatenation of JSON documents (which can be line-separated, or not)

Though JSON is a popular data serialization format, it does not have a formal specification and, as a result, there is significant variance among different implementations.

This variance makes importing many JSON-like datasets impossible if the JSON parser is strict in its language definition. To make importing JSON datasets as problem-free as possible, we follow the rule of "be liberal in what you accept". The goal is to accept the widest possible range of JSON and JSON-like inputs that permits unambiguous interpretation.

Basic JSON Syntax

Rules:

<NAME>:<VALUE> Data is in name/value pairs
A name/value pair consists of a field name (in double quotes), followed by a colon, followed by a value: "firstName":"John"
Data is separated by commas
Objects are enclosed in curly braces ({})
Objects can contain multiple name/values pairs: {"firstName":"John", "lastName":"Doe"}
Arrays are enclosed in square brackets ([])
An array can contain multiple objects:

"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]

JSON values can be:

A number (integer or floating point)
A string (in double quotes)
A Boolean (true or false)
An array (in square brackets)
An object (in curly braces)
Null

Examples

Sample JSON Files

Suppose you have JSON files named json_sample_data and json_sample_data2 that you would like to parse using Snowflake SQL. The json_sample_data2 file contains an array with 3 employee records (objects) and their associated dependent data for the employee's children, the children names and ages, cities where the employee has lived and the years that the employee has lived in those cities).

File name: json_sample_data

{  
   "root":[  
      {  
         "kind":"person",
         "fullName":"John Doe",
         "age":22,
         "gender":"Male",
         "phoneNumber":{  
            "areaCode":"206",
            "number":"1234567"
         },
         "children":[  
            {  
               "name":"Jane",
               "gender":"Female",
               "age":"6"
            },
            {  
               "name":"John",
               "gender":"Male",
               "age":"15"
            }
         ],
         "citiesLived":[  
            {  
               "place":"Seattle",
               "yearsLived":[  
                  "1995"
               ]
            },
            {  
               "place":"Stockholm",
               "yearsLived":[  
                  "2005"
               ]
            }
         ]
      },
      {  
         "kind":"person",
         "fullName":"Mike Jones",
         "age":35,
         "gender":"Male",
         "phoneNumber":{  
            "areaCode":"622",
            "number":"1567845"
         },
         "children":[  
            {  
               "name":"Earl",
               "gender":"Male",
               "age":"10"
            },
            {  
               "name":"Sam",
               "gender":"Male",
               "age":"6"
            },
            {  
               "name":"Kit",
               "gender":"Male",
               "age":"8"
            }
         ],
         "citiesLived":[  
            {  
               "place":"Los Angeles",
               "yearsLived":[  
                  "1989",
                  "1993",
                  "1998",
                  "2002"
               ]
            },
            {  
               "place":"Washington DC",
               "yearsLived":[  
                  "1990",
                  "1993",
                  "1998",
                  "2008"
               ]
            },
            {  
               "place":"Portland",
               "yearsLived":[  
                  "1993",
                  "1998",
                  "2003",
                  "2005"
               ]
            },
            {  
               "place":"Austin",
               "yearsLived":[  
                  "1973",
                  "1998",
                  "2001",
                  "2005"
               ]
            }
         ]
      },
      {  
         "kind":"person",
         "fullName":"Anna Karenina",
         "age":45,
         "gender":"Female",
         "phoneNumber":{  
            "areaCode":"425",
            "number":"1984783"
         },
         "citiesLived":[  
            {  
               "place":"Stockholm",
               "yearsLived":[  
                  "1992",
                  "1998",
                  "2000",
                  "2010"
               ]
            },
            {  
               "place":"Russia",
               "yearsLived":[  
                  "1998",
                  "2001",
                  "2005"
               ]
            },
            {  
               "place":"Austin",
               "yearsLived":[  
                  "1995",
                  "1999"
               ]
            }
         ]
      }
   ]
}

File name: json_sample_data2

{"root":[{"employees":[
   {"firstName":"John", "lastName":"Doe"},
   {"firstName":"Anna", "lastName":"Smith"},
   {"firstName":"Peter", "lastName":"Jones"}
]}]}

Uploading JSON Files to Your User Staging Area

The following command copies your local JSON sample data files to your user staging area:

PUT file:///Users/username/SVN/CUSTOMERREPO/examples/json/ @~/json/;

For more information about loading data files, see the Snowflake documentation:

https://docs.snowflake.net/manuals/user-guide/data-load.html

Create a named file format for your JSON files. The examples in this article reference this file format:

create or replace file format json type = 'json';

Parsing JSON Arrays Directly from a Staged JSON File

In the uploaded json_sample_data2 file, the first entry in the JSON object array contained in a JSON data file can be accessed like this:

SELECT 'The First Employee Record is '||
 S.$1:root[0].employees[0].firstName||
 ' '||S.$1:root[0].employees[0].lastName
FROM @~/json/json_sample_data2 (file_format => 'json') as S;

The returned output would be: The First Employee Record Is: John Doe.

Using the FLATTEN Function to Parse JSON Arrays

FLATTEN is a table function that converts a repeated field into a set of rows. Given one scalar value with many values for a repeated field, FLATTEN unrolls it into many records, one record for each value of the (formerly) repeated field; any non-repeated fields become duplicated to fill out each of the new records formed. FLATTEN removes one level of nesting.

Uploaded JSON data is stored in a single column (S.$1). The following query parses the array in the uploaded json_sample_data file. When the query applies the LATERAL FLATTEN function, we end up with three values of fullName:

select t.value 
from @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'root')) t;

Output

User-added image

select t.value:fullName 
from @~/json/json_sample_data.gz (file_format => 'json') as S, 
table(flatten(S.$1,'root')) t;

Output

Using the PARSE_JSON Function

This function parses text as a JSON document, producing a VARIANT value. If the input is NULL, the output will also be NULL. If the input string is 'null', it is interpreted as a JSON null value, meaning the result is not a SQL NULL, but a valid VARIANT value containing null (the difference is apparent when printing this VARIANT value).

SELECT t.value:children[0],
       t.value:children[0].age,
       parse_json(t.value:children[0].age)
FROM @~/json/json_sample_data.gz (file_format => 'json') as S, table(flatten(S.$1,'root')) t;

Output

User-added image

Using the GET_PATH Function

The GET_PATH function is used to extract a value from structured data using a path name. It is a variant of GET that takes VARIANT, OBJECT, or ARRAY value as the first argument and proceeds to extract the VARIANT value of the field or the element according to the path name provided as the second argument.

Path name notation is common JavaScript notation: a concatenation of field names (identifiers) preceded with dots and index operators [index]. The first field name does not have to have the leading dot specified. Index values in the index operators can be non-negative decimal numbers (for arrays) or single or double-quoted string literals (for objects).

GET_PATH is equivalent to a chain of GET functions. It returns NULL if the path name does correspond to any element.

SELECT GET_PATH(S.$1, 'root[0].fullName')
FROM @~/json/json_sample_data.gz (file_format => 'json') as S;

Output

SELECT t.value:fullName
FROM @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'root')) t;

Output

SELECT 
t.value[0]:children as First_person_children,  
t.value[1]:children as Second_person_children,
t.value[2]:children as Third_Person_children                           
FROM @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'')) t;

Output

select NVL(t.value[0]:children,'No Children') 
as FIRST_PERSON_CHILDREN, 
NVL(array_size(t.value[0]:children),0) 
as FIRST_PERSON_CHILDREN_Count,  
	  NVL(t.value[1]:children,'No CHildren') 
  as Second_person_children, 
  NVL(array_size(t.value[1]:children),0) 
  as Second_PERSON_CHILDREN_Count, 
	  NVL(t.value[2]:children, 'No Children') 
  as Third_Person_children,  
  NVL(array_size(t.value[2]:children),0) 
  as Third_PERSON_CHILDREN_Count                       
From @~/json/json_sample_data.gz (file_format => 'json') 
as S, table(flatten(S.$1,'')) t;

Output

SELECT t.value
FROM @~/json/json_sample_data.gz (file_format => 'json') as S, table(flatten(S.$1,'')) t;

Output

The entire content of the json_sample_data file is returned as the output of the SELECT statement (partial results shown):

+-----------------------------------+
| VALUE                             |
|-----------------------------------|
| [                                 |
|   {                               |
|     "age": 22,                    |
|     "children": [                 |
|       {                           |
|         "age": "6",               |
|         "gender": "Female",       |
|         "name": "Jane"            |
|       },                          |
|       {                           |
|         "age": "15",              |
|         "gender": "Male",         |
|         "name": "John"            |
|       }                           |
|     ],                            |
|     "citiesLived": [              |
|       {                           |
|         "place": "Seattle",       |
|         "yearsLived": [           |
|           "1995"                  |
|         ]                         |
|       },                          |
|       {                           |
|         "place": "Stockholm",     |
|         "yearsLived": [           |
|           "2005"                  |
|         ]                         |
|       }                           |
|     ],                            |
|     "fullName": "John Doe",       |
|     "gender": "Male",             |
|     "kind": "person",             |
|     "phoneNumber": {              |
|       "areaCode": "206",          |
|       "number": "1234567"         |
|     }                             |
|   },                              |
..
+-----------------------------------+

SELECT t.value
FROM @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'root')) t;

Output

The 3 data arrays of the three employees records and their dependent data is returned.

SELECT
array_size(t.value)
FROM @~/json/json_sample_data.gz (file_format => 'json') as S
, table(flatten(S.$1,'')) t;

Output

Number 3, the number of employee records along with all their dependent records (children names, ages, etc) in the JSON sample file this tutorial uses.

SELECT emps.value:fullName, citliv.value:place, citliv.value:yearsLived , yrliv.value 
FROM @~/json/json_sample_data.gz (file_format => 'json') as S,
table(flatten(S.$1,'root')) emps,
table(flatten(emps.value:citiesLived,'')) citliv ,
table(flatten(citliv.value:yearsLived,'')) yrliv;

Output

User-added image

SELECT emps.value:fullName, citliv.value:place, count(yrliv.value) as Years_Count 
FROM @~/json/json_sample_data.gz (file_format => 'json') as S,
table(flatten(S.$1,'root')) emps,
table(flatten(emps.value:citiesLived,'')) citliv,
table(flatten(citliv.value:yearsLived,'')) yrliv
group by emps.value:fullName, citliv.value:place
order by 1,2;

Output

User-added image

SanjayLakhanpal

JSON DATA PARSING IN SNOWFLAKE

What is JSON?

Basic JSON Syntax

Examples

Sample JSON Files

Uploading JSON Files to Your User Staging Area

Parsing JSON Arrays Directly from a Staged JSON File

Using the FLATTEN Function to Parse JSON Arrays

Output

Using the PARSE_JSON Function

Output

Using the GET_PATH Function

Output

Output

Output

Output

Output

Output

Output

Output

Output

Add a comment

Add ping

Search

Best of me

Categories

Pages

Subscribe