A Concise TOML Tutorial

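TOML (Tom's Obvious, Minimal Language) is a language designed by Tom Preston-Werner. The simplest way to think of it is as a data description format, comparable to XML, YAML, JSON, INI and the like. Its main design goal is to be a small configuration file format that is friendly to humans (obvious semantics, easy to read). Given that premise, there is no point in arguing about which format is better or in hoping that one technology will cover every case; just pick the right tool for the right job. Complex IDEs such as Android Studio and Xcode typically use XML for their configuration files; well-known open source projects such as Docker and Drone use YAML; JSON is widely used in network communication; Cargo, InfluxDB, Traefik and others use TOML.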

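The latest TOML version is v0.5.0, which can be considered stable enough. However, the Go and Ruby implementations I need only support up to v0.4.0, so this article covers the v0.4.0 language specification and the use of the Go and Ruby implementations. The project wiki lists the status of the implementations.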

v0.4.0

Language specification

  • TOML is case sensitive
  • Only UTF-8 encoding is supported
  • Whitespace means only tab (0x09) and space (0x20)
  • Newline means LF (0x0A) or CRLF (0x0D0A)

Comments

A comment runs from a # to the end of the line. There is no multi-line comment syntax.

# I am a comment. Hear me roar. Roar.
key = "value" # Yeah, you can do this.

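Strings

Strings come in two kinds, basic and literal, delimited by double quotes (") and single quotes (') respectively. Escape sequences are processed in basic strings, while the value of a literal string is exactly what is written. Each kind has a single-line and a multi-line form. The multi-line form opens and closes with three consecutive delimiters, and a newline immediately after the opening delimiter is trimmed.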

str0 = """
Roses are red
Violets are blue"""
# The newline before 'Roses are red' is trimmed, which is very handy

Basic strings

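A long string can be split across several lines for readability by ending a line with a backslash (\). The backslash, together with all whitespace and newlines up to the next non-whitespace character, is dropped. Double quotes may appear unescaped inside a multi-line basic string, unless a run of them would end the string prematurely.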

# On a Unix system, the above multi-line string will most likely be the same as:
str1 = "Roses are red\nViolets are blue"

# On a Windows system, it will most likely be equivalent to:
str2 = "Roses are red\r\nViolets are blue"

# The following strings are byte-for-byte equivalent:
str3 = "The quick brown fox jumps over the lazy dog."

str4 = """
The quick brown \


  fox jumps over \
    the lazy dog."""

str5 = """\
       The quick brown \
       fox jumps over \
       the lazy dog.\
       """

Literal strings

Control characters other than tab are not allowed in a literal string. To include arbitrary binary data, Base64 encoding is recommended.

# What you see is what you get.
winpath  = 'C:\Users\nodejs\templates'
winpath2 = '\\ServerX\admin$\system32\'
quoted   = 'Tom "Dubs" Preston-Werner'
regex    = '<\i\c*\s*>'
regex2 = '''I [dw]on't need \d{2} apples'''
lines  = '''
The first newline is
trimmed in raw strings.
   All other whitespace
   is preserved.
'''

Integers

int1 = +99
int2 = 42
int3 = 0
int4 = -17

# Large numbers may use underscores between digits to improve readability
int5 = 1_000
int6 = 5_349_221
int7 = 1_2_3_4_5     # VALID but discouraged

# hexadecimal with prefix `0x` (added in TOML v0.5.0, not available in v0.4.0)
hex1 = 0xDEADBEEF
hex2 = 0xdeadbeef
hex3 = 0xdead_beef

# octal with prefix `0o` (added in TOML v0.5.0)
oct1 = 0o01234567
oct2 = 0o755 # useful for Unix file permissions

# binary with prefix `0b` (added in TOML v0.5.0)
bin1 = 0b11010110

Floats

Floats follow the IEEE 754 standard.

# fractional
flt1 = +1.0
flt2 = 3.1415
flt3 = -0.01

# exponent
flt4 = 5e+22
flt5 = 1e6
flt6 = -2E-2

# both
flt7 = 6.626e-34

# use underscore to enhance readability
flt8 = 224_617.445_991_228

# infinity (special float values were added in TOML v0.5.0)
sf1 = inf  # positive infinity
sf2 = +inf # positive infinity
sf3 = -inf # negative infinity

# not a number (added in TOML v0.5.0)
sf4 = nan  # actual sNaN/qNaN encoding is implementation specific
sf5 = +nan # same as `nan`
sf6 = -nan # valid, actual encoding is implementation specific

Booleans

bool1 = true
bool2 = false

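Dates and times

Date-times follow RFC 3339. The precision of fractional seconds (the .999999 part of odt3 below) is implementation specific, but at least millisecond precision is expected. Any precision beyond what the implementation supports is truncated, not rounded.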

odt1 = 1979-05-27T07:32:00Z
odt2 = 1979-05-27T00:32:00-07:00
odt3 = 1979-05-27T00:32:00.999999-07:00

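Arrays

An array is a sequence of values enclosed in square brackets. Elements are separated by commas, and whitespace between them is ignored. All elements must have the same data type. A trailing comma is allowed.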

arr1 = [ 1, 2, 3 ]
arr2 = [ "red", "yellow", "green" ]
arr3 = [ [ 1, 2 ], [3, 4, 5] ]
arr4 = [ "all", 'strings', """are the same""", '''type''']
arr5 = [ [ 1, 2 ], ["a", "b", "c"] ] # this is ok

arr6 = [ 1, 2.0 ] # INVALID

arr7 = [
  1, 2, 3
]

arr8 = [
  1,
  2, # this is ok
]

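Tables (dictionaries)

A table is a collection of key/value pairs. It begins with a header, the table name in square brackets on a line by itself, and runs until the next table header or EOF. Bare keys may contain only the characters A-Za-z0-9_-; a key that needs any other character must be quoted. Use bare keys wherever possible. Keys are unordered.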

[table]
key = "value"
bare_key = "value"
bare-key = "value"

"127.0.0.1" = "value"
"character encoding" = "value"
"ʎǝʞ" = "value"

Tables nest by putting dots (.) in the header. An inner table can be named directly, without first defining each of the outer tables:

[dog."tater.man"]
type = "pug"
# json representation of above is : { "dog": { "tater.man": {"type": "pug" } } }

[a.b.c]          # this is best practice
[ d.e.f ]        # same as [d.e.f]
[ g .  h  . i ]  # same as [g.h.i]
[ j . "ʞ" . l ]  # same as [j."ʞ".l]

# [x] you
# [x.y] don't
# [x.y.z] need these
[x.y.z.w] # for this to work


Empty tables are allowed:

[a] # empty table

[b]
d = 1

A nested table may be defined before its parent table:

[a.b]
c = 1

[a]
d = 2

Neither keys nor tables may be defined more than once:

# DO NOT DO THIS
[a]
b = 1

[a]
c = 2

# DO NOT DO THIS EITHER
[a]
b = 1

[a.b]
c = 2

Inline tables

Inline tables offer a more compact way to write a table, for example:

name = { first = "Tom", last = "Preston-Werner" }
point = { x = 1, y = 2 }

# above is identical to the following

[name]
first = "Tom"
last = "Preston-Werner"

[point]
x = 1
y = 2

Arrays of tables

[[fruit]]
  name = "apple"

  [fruit.physical]
    color = "red"
    shape = "round"

  [[fruit.variety]]
    name = "red delicious"
    
  [[fruit.variety]]
  
  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"

The equivalent JSON representation of the structure above is:

{
  "fruit": [
    {
      "name": "apple",
      "physical": {
        "color": "red",
        "shape": "round"
      },
      "variety": [
        { "name": "red delicious" },
        { },
        { "name": "granny smith" }
      ]
    },
    {
      "name": "banana",
      "variety": [
        { "name": "plantain" }
      ]
    }
  ]
}

Go implementation

Go (@thompelletier) - https://github.com/pelletier/go-toml

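package main

import (
	"log"

	toml "github.com/pelletier/go-toml" // v1 API: LoadBytes and *toml.Tree
)

// blob is the TOML document used by both examples below.
var blob = []byte(`
	[postgres]
	user = "pelletier"
	password = "mypassword"`)

type Postgres struct {
	User     string `toml:"user"`
	Password string `toml:"password"`
}

type Config struct {
	Postgres Postgres `toml:"postgres"`
}

// getTOML reads individual values through the generic tree API.
func getTOML() {
	config, err := toml.LoadBytes(blob)
	if err != nil {
		log.Fatal(err)
	}

	// A dotted path reaches into nested tables.
	user := config.Get("postgres.user").(string)
	log.Println("user:", user)

	// A sub-table comes back as *toml.Tree; the comma-ok form avoids a
	// panic when the key is missing.
	if pgConfig, ok := config.Get("postgres").(*toml.Tree); ok {
		password := pgConfig.Get("password").(string)
		log.Println("password:", password)
	}
}

// unmarshalTOML decodes the whole document into the Config struct.
func unmarshalTOML() {
	config := Config{}
	if err := toml.Unmarshal(blob, &config); err != nil {
		log.Fatal(err)
	}
	log.Printf("%+v\n", config)
}

func main() {
	getTOML()
	unmarshalTOML()
}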

Ruby implementation

Ruby (@eMancu) - https://github.com/eMancu/toml-rb (toml-rb gem)

gem install toml-rb

require 'toml-rb'

# From a stream!
stream = <<-EOS
	[postgres]
	user = "pelletier"
	password = "mypassword"
EOS
puts TomlRB.parse(stream)
# Parsed Hash: {"postgres"=>{"user"=>"pelletier", "password"=>"mypassword"}}
