JSON and its Query

1. JSON
- 1.1. Format overview
- 1.2. Formatting JSON data (python -m json.tool)
2. JMESPath (JSON Query)

1. JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

References:
Introducing JSON: http://www.json.org/
The JavaScript Object Notation (JSON) Data Interchange Format: https://tools.ietf.org/html/rfc7159

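1.1. Format overview

The JSON format is simple. It is built mainly from two structures:
(1) An object: a collection of key/value pairs, written in braces {}. A key and its value are separated by a colon (:), and pairs are separated by commas.
(2) An array: an ordered list of values, written in square brackets []. Elements are separated by commas.

Below is an example of JSON that describes basic information about a person: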

{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": [],
  "spouse": null
}
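
As a small illustration of how these two structures map onto a host language, the sketch below parses a shortened copy of the record above with Python's standard json module. The variable names (text, person) are illustrative only and do not come from the original example.

```python
import json

# A shortened copy of the person record shown above.
text = '''
{
  "firstName": "John",
  "isAlive": true,
  "age": 25,
  "address": {"city": "New York", "state": "NY"},
  "phoneNumbers": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "office", "number": "646 555-4567"}
  ],
  "children": [],
  "spouse": null
}
'''

person = json.loads(text)

# JSON objects become dicts and JSON arrays become lists.
print(type(person).__name__)                  # dict
print(type(person["phoneNumbers"]).__name__)  # list

# Nested values are reached by chaining keys and indexes.
print(person["address"]["city"])              # New York
print(person["phoneNumbers"][0]["number"])    # 212 555-1234

# JSON true and null map to Python True and None.
print(person["isAlive"], person["spouse"])    # True None
```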

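1.2. Formatting JSON data (python -m json.tool)

Sometimes the JSON data you receive is squeezed onto a single line or poorly indented. How can it be formatted so that its content is easier to read?

Python's json.tool module handles this easily, for example: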

$ echo '{"a": "foo", "b": "bar", "c": "baz"}' | python -m 'json.tool'
{
    "a": "foo",
    "b": "bar",
    "c": "baz"
}

If the JSON data is stored in a file, python -m json.tool file.json prints its formatted content, for example:

$ cat file.json
{"a": "foo", "b": "bar", "c": "baz"}
$ python -m json.tool file.json
{
    "a": "foo",
    "b": "bar",
    "c": "baz"
}
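
The same formatting can be done inside a Python program. The sketch below uses the standard json module; the variable names raw and pretty are illustrative, and indent=4 is chosen to match the four-space indentation in the output above.

```python
import json

raw = '{"a": "foo", "b": "bar", "c": "baz"}'

# Parse the compact text, then serialize it again with indentation.
pretty = json.dumps(json.loads(raw), indent=4)
print(pretty)
```

Printing pretty gives the same four-line-indented layout as the command-line output shown above.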

Python can also extract the value of a particular key from JSON, for example:

$ echo '{"a": "foo", "b": "bar", "c": "baz"}' | python -c 'import sys, json; print(json.load(sys.stdin)["c"])'
baz

2. JMESPath (JSON Query)

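How do you find the information you want in JSON data? For example, suppose you want the person's home phone number ("212 555-1234") in the example from section 1.1. Tools such as sed, awk, and grep are poorly suited to this task; what is needed is a query language designed for JSON. (With JMESPath, introduced below, the expression phoneNumbers[?type=='home'].number | [0] returns "212 555-1234".)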
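
For comparison, the same lookup can be written by hand in Python. This is only a sketch: it assumes the record has already been parsed into a dict named person (a hypothetical name), and it shows the traversal that a query expression condenses into a single line.

```python
# A trimmed copy of the record from section 1.1, already parsed.
person = {
    "phoneNumbers": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "office", "number": "646 555-4567"},
        {"type": "mobile", "number": "123 456-7890"},
    ]
}

# Hand-written counterpart of: phoneNumbers[?type=='home'].number | [0]
home_numbers = [
    entry["number"]
    for entry in person["phoneNumbers"]
    if entry.get("type") == "home"
]

# Take the first match, or None when there is no home number.
home_number = home_numbers[0] if home_numbers else None
print(home_number)  # 212 555-1234
```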

JMESPath is a query language for JSON. It is named after its author, James Saryerwinnie. JMESPath implementations are available for many languages, including Python, PHP, JavaScript, Ruby, Lua, Go, and Java.

Note: JMESPath is not the only JSON query language; JsonPath is another. For more JSON query languages, see: http://stackoverflow.com/questions/777455/is-there-a-query-language-for-json

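2.1. A first JMESPath program

The following example uses jmespath, the Python implementation of JMESPath, to show basic usage. jmespath is not a built-in module and must be installed first.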

>>> import jmespath
>>> result = jmespath.search('b', {"a": "foo", "b": "bar", "c": "baz"})
>>> print(result)
bar

Reference: http://jmespath.org/tutorial.html

2.2. Basic query syntax

Table 1 lists common JMESPath query syntax with examples.

Table 1: JMESPath examples: Basic Expressions
Expression type	JSON data	Example expression	Result
Identifier	{"a": "foo", "b": "bar"}	a	"foo"
Subexpression	{"a": {"b": {"c": "value"}}}	a.b.c	"value"
Index Expressions	["a", "b", "c", "d", "e", "f"]	[1]	"b"
Index Expressions	["a", "b", "c", "d", "e", "f"]	[-1]	"f"
Slicing [start:stop]	[0, 1, 2, 3, 4, 5, 6, 7, 8]	[0:3]	[0, 1, 2]
Slicing [start:stop]	[0, 1, 2, 3, 4, 5, 6, 7, 8]	[0:-3]	[0, 1, 2, 3, 4, 5]
Slicing [start:stop:step]	[0, 1, 2, 3, 4, 5, 6, 7, 8]	[::2]	[0, 2, 4, 6, 8]
Slicing [start:stop:step]	[0, 1, 2, 3, 4, 5, 6, 7, 8]	[::-2]	[8, 6, 4, 2, 0]
Slicing [start:stop:step]	[0, 1, 2, 3, 4, 5, 6, 7, 8]	[0:8:3]	[0, 3, 6]
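
The index and slice rows in Table 1 follow the same start:stop:step conventions as Python list indexing and slicing, so they can be checked with plain Python. The snippet below is an analogy for illustration, not the JMESPath implementation; the variable names are illustrative.

```python
data = [0, 1, 2, 3, 4, 5, 6, 7, 8]

# Each line mirrors one slicing row of the table.
print(data[0:3])    # [0, 1, 2]
print(data[0:-3])   # [0, 1, 2, 3, 4, 5]
print(data[::2])    # [0, 2, 4, 6, 8]
print(data[::-2])   # [8, 6, 4, 2, 0]
print(data[0:8:3])  # [0, 3, 6]

# Index expressions behave like Python indexing, including negative indexes.
letters = ["a", "b", "c", "d", "e", "f"]
print(letters[1], letters[-1])  # b f

# A subexpression such as a.b.c corresponds to chained key lookups.
nested = {"a": {"b": {"c": "value"}}}
print(nested["a"]["b"]["c"])    # value
```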

2.3. Projections

Projection allows you to apply an expression to a collection of elements. There are five kinds of projections:
(1) List Projections
(2) Slice Projections
(3) Object Projections
(4) Flatten Projections
(5) Filter Projections

2.3.1. List and Slice Projections ([*])

A wildcard expression creates a list projection, which is a projection over a JSON array.

Below is an example of a list projection. Given the following JSON data, we want the first name of every entry in people.

{
  "people": [
    {"first": "James", "last": "d"},
    {"first": "Jacob", "last": "e"},
    {"first": "Jayden", "last": "f"},
    {"missing": "different"}
  ],
  "foo": {"bar": "baz"}
}

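The expression people[*].first returns all the first names in people. The last element has no first key, so it evaluates to null and is dropped from the result: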

people[*].first    ----> [ "James", "Jacob", "Jayden" ]

To get only the second and third first names, use a slice projection:

people[1:3].first   ----> [ "Jacob", "Jayden" ]
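
The behavior of these two projections can be sketched in plain Python. The helper project below is hypothetical (it is not part of the jmespath library); it only mimics the rule that a projection evaluates the right-hand expression on each element and drops null results.

```python
data = {
    "people": [
        {"first": "James", "last": "d"},
        {"first": "Jacob", "last": "e"},
        {"first": "Jayden", "last": "f"},
        {"missing": "different"},
    ],
    "foo": {"bar": "baz"},
}

def project(elements, key):
    """Look up key in each element and drop results that are None (null)."""
    results = (e.get(key) if isinstance(e, dict) else None for e in elements)
    return [r for r in results if r is not None]

# Counterpart of people[*].first
all_first = project(data["people"], "first")
print(all_first)    # ['James', 'Jacob', 'Jayden']

# Counterpart of people[1:3].first
slice_first = project(data["people"][1:3], "first")
print(slice_first)  # ['Jacob', 'Jayden']
```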

2.3.2. Object Projections (asterisk *)

The list projection described above applies only to JSON arrays. The object projection described in this section applies to JSON objects.

You can create an object projection using the * syntax. This will create a list of the values of the JSON object, and project the right hand side of the projection onto the list of values.

For example, given the following JSON data:

{
  "ops": {
    "functionA": {"numArgs": 2},
    "functionB": {"numArgs": 3},
    "functionC": {"variadic": true}
  }
}

Applying the expression ops.*.numArgs returns [ 2, 3 ]:

ops.*.numArgs      ----> [ 2, 3 ]

How does this work? Split the object projection into a left-hand side (LHS) and a right-hand side (RHS): the LHS is ops and the RHS is numArgs.
Evaluating the LHS yields the list of values of the ops object:

[{"numArgs": 2}, {"numArgs": 3}, {"variadic": true}]

Applying the RHS to each element of this array gives:

[ 2, 3, null ]

Null values are omitted from the final result, so the result is:

[ 2, 3 ]
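
The three steps above can be traced in plain Python. This is a sketch of the evaluation order described in the text, not the library's implementation; the variable names are illustrative.

```python
data = {
    "ops": {
        "functionA": {"numArgs": 2},
        "functionB": {"numArgs": 3},
        "functionC": {"variadic": True},
    }
}

# Step 1 (LHS): take the values of the ops object.
values = list(data["ops"].values())

# Step 2 (RHS): look up numArgs in each value; a missing key gives None.
projected = [v.get("numArgs") for v in values]
print(projected)  # [2, 3, None]

# Step 3: drop the None (null) entries.
num_args = [n for n in projected if n is not None]
print(num_args)   # [2, 3]
```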

2.3.3. Flatten Projections (empty brackets [])

A flatten projection is written as []. What is it for? Consider the following example.

Given the following JSON data:

{
  "reservations": [
    {
      "instances": [
        {"state": "running"},
        {"state": "stopped"}
      ]
    },
    {
      "instances": [
        {"state": "terminated"},
        {"state": "running"}
      ]
    }
  ]
}

We want a single list containing all the states, that is, ["running", "stopped", "terminated", "running"]. How can we get it?
Using the list projection described earlier, reservations[*].instances[*].state, returns [["running", "stopped"], ["terminated", "running"]], which is not the desired result.
Using the flatten projection reservations[].instances[].state returns ["running", "stopped", "terminated", "running"].
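
The difference between the two queries resembles the difference between a nested and a flat list comprehension in Python. The sketch below is an analogy under that assumption, with illustrative variable names.

```python
data = {
    "reservations": [
        {"instances": [{"state": "running"}, {"state": "stopped"}]},
        {"instances": [{"state": "terminated"}, {"state": "running"}]},
    ]
}

# Counterpart of reservations[*].instances[*].state: one list per reservation.
nested = [
    [inst["state"] for inst in res["instances"]]
    for res in data["reservations"]
]
print(nested)  # [['running', 'stopped'], ['terminated', 'running']]

# Counterpart of reservations[].instances[].state: a single flat list.
flat = [
    inst["state"]
    for res in data["reservations"]
    for inst in res["instances"]
]
print(flat)    # ['running', 'stopped', 'terminated', 'running']
```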

2.3.3.1. A flatten projection operates on only one level

A flatten projection flattens only one level of nesting; it is not recursive.
For example, given the JSON data:

[
  [0, 1],
  2,
  [3],
  4,
  [5, [6, 7]]
]

The results of flattening once with [] and twice with [][] are:

[]        ---->   [ 0, 1, 2, 3, 4, 5, [ 6, 7 ] ]
[][]      ---->   [ 0, 1, 2, 3, 4, 5, 6, 7 ]
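
A one-level flatten can be sketched in a few lines of Python. The function flatten_once is a hypothetical helper written for this illustration; it merges sublists into the parent list and leaves other elements unchanged, which matches the results shown above.

```python
def flatten_once(items):
    """Merge one level of nested lists into the parent list."""
    result = []
    for item in items:
        if isinstance(item, list):
            result.extend(item)   # splice the sublist's elements in place
        else:
            result.append(item)   # keep non-list elements unchanged
    return result

data = [[0, 1], 2, [3], 4, [5, [6, 7]]]

once = flatten_once(data)    # counterpart of []
twice = flatten_once(once)   # counterpart of [][]
print(once)   # [0, 1, 2, 3, 4, 5, [6, 7]]
print(twice)  # [0, 1, 2, 3, 4, 5, 6, 7]
```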

2.3.4. Filter Projections

A filter expression is defined for an array and has the general form LHS [? <expression> <comparator> <expression>] RHS.

For example, suppose we want the names of all machines whose state is running in the following JSON data.

{
  "machines": [
    {"name": "a", "state": "running"},
    {"name": "b", "state": "stopped"},
    {"name": "b", "state": "running"}
  ]
}

The expression machines[?state=='running'].name does this:

machines[?state=='running'].name      ----> [ "a", "b" ]
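
In Python terms, a filter projection behaves much like a list comprehension with a condition. The snippet below is a sketch of that analogy rather than of the library's internals.

```python
data = {
    "machines": [
        {"name": "a", "state": "running"},
        {"name": "b", "state": "stopped"},
        {"name": "b", "state": "running"},
    ]
}

# Counterpart of machines[?state=='running'].name:
# keep the elements that pass the test, then project name from each.
running_names = [
    m["name"] for m in data["machines"] if m.get("state") == "running"
]
print(running_names)  # ['a', 'b']
```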

2.4. Pipe Expressions

In the following example, how do we get the first name of the first person in people (that is, "James")?

{
  "people": [
    {"first": "James", "last": "d"},
    {"first": "Jacob", "last": "e"},
    {"first": "Jayden", "last": "f"},
    {"missing": "different"}
  ],
  "foo": {"bar": "baz"}
}

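Writing people[*].first[0] does not work, because [0] would be applied to each projected element rather than to the resulting list. A pipe stops the projection, so the expression to its right operates on the whole result: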

people[*].first | [0]      ----> "James"
people[*].first | [1]      ----> "Jacob"
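
The two stages of a pipe expression can be pictured as two separate Python statements: the left side builds a complete list, and the right side then indexes that list as a whole. This is an illustrative sketch with assumed variable names.

```python
data = {
    "people": [
        {"first": "James", "last": "d"},
        {"first": "Jacob", "last": "e"},
        {"first": "Jayden", "last": "f"},
        {"missing": "different"},
    ],
    "foo": {"bar": "baz"},
}

# Left of the pipe: people[*].first produces a complete list.
firsts = [p["first"] for p in data["people"] if "first" in p]

# Right of the pipe: [0] and [1] index into that list as a whole.
print(firsts[0])  # James
print(firsts[1])  # Jacob
```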

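2.5. MultiSelect (creating JSON)

The JMESPath expressions described so far select the parts of a JSON document you are interested in. The multiselect lists and multiselect hashes described below can create new JSON elements.

Below are examples of a multiselect list and a multiselect hash.

Given the JSON data: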

{
  "people": [
    {
      "name": "a",
      "state": {"name": "up"}
    },
    {
      "name": "b",
      "state": {"name": "down"}
    },
    {
      "name": "c",
      "state": {"name": "up"}
    }
  ]
}

Applying the multiselect list expression people[].[name, state.name] gives:

[
  [
    "a",
    "up"
  ],
  [
    "b",
    "down"
  ],
  [
    "c",
    "up"
  ]
]

Applying the multiselect hash expression people[].{myName: name, myState: state.name} gives:

[
  {
    "myName": "a",
    "myState": "up"
  },
  {
    "myName": "b",
    "myState": "down"
  },
  {
    "myName": "c",
    "myState": "up"
  }
]
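
Both forms build a new structure from each element of people. A rough Python counterpart, written as a sketch with illustrative variable names, constructs a list or a dict per element:

```python
data = {
    "people": [
        {"name": "a", "state": {"name": "up"}},
        {"name": "b", "state": {"name": "down"}},
        {"name": "c", "state": {"name": "up"}},
    ]
}

# Counterpart of people[].[name, state.name]: one list per person.
as_lists = [[p["name"], p["state"]["name"]] for p in data["people"]]
print(as_lists)   # [['a', 'up'], ['b', 'down'], ['c', 'up']]

# Counterpart of people[].{myName: name, myState: state.name}: one dict per person.
as_hashes = [
    {"myName": p["name"], "myState": p["state"]["name"]}
    for p in data["people"]
]
print(as_hashes[1])  # {'myName': 'b', 'myState': 'down'}
```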

Reference: http://jmespath.org/tutorial.html#multiselect

2.6. Functions

How do we get the number of people in the following JSON data?

{
  "people": [
    {
      "name": "b",
      "age": 30,
      "state": {"name": "up"}
    },
    {
      "name": "a",
      "age": 50,
      "state": {"name": "down"}
    },
    {
      "name": "c",
      "age": 40,
      "state": {"name": "up"}
    }
  ]
}

Use the length function:

length(people)       ----> 3
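
For readers who think in Python, length(people) plays the role of len() applied to the people list. The snippet below is only an analogy; the variable name count is illustrative.

```python
data = {
    "people": [
        {"name": "b", "age": 30, "state": {"name": "up"}},
        {"name": "a", "age": 50, "state": {"name": "down"}},
        {"name": "c", "age": 40, "state": {"name": "up"}},
    ]
}

# Counterpart of length(people): the number of elements in the array.
count = len(data["people"])
print(count)  # 3
```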

References:
http://jmespath.org/specification.html#functions
http://jmespath.org/specification.html#builtin-functions

2.6.1. Using functions in filter expressions

Consider an example: we want to find every element of myarray in the following JSON data that contains the string foo.

{
  "myarray": [
    "foo",
    "foobar",
    "barfoo",
    "bar",
    "baz",
    "barbaz",
    "barfoobaz"
  ]
}

The expression myarray[?contains(@, 'foo') == `true`] does this:

myarray[?contains(@, 'foo') == `true`]      ---->  [ "foo", "foobar", "barfoo", "barfoobaz" ]

The @ character in the example above refers to the current element being evaluated in myarray.
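
A plain Python sketch of the same filter makes the role of @ concrete: the loop variable s below stands in for the current element. This is an analogy for string elements only, not the library's implementation.

```python
data = {
    "myarray": [
        "foo", "foobar", "barfoo", "bar", "baz", "barbaz", "barfoobaz",
    ]
}

# Counterpart of myarray[?contains(@, 'foo') == `true`]:
# s plays the role of @, the element currently being tested.
matches = [s for s in data["myarray"] if "foo" in s]
print(matches)  # ['foo', 'foobar', 'barfoo', 'barfoobaz']
```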