Nim Tutorial (Part I)

Author: Andreas Rumpf
Version: |nimversion|

Introduction

"Der Mensch ist doch ein Augentier -- schöne Dinge wünsch ich mir."

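This document is a tutorial for the programming language Nim. It assumes that you are familiar with basic programming concepts like variables, types, and statements. More advanced language features are covered in the manual.

The first program

We start the tour with a modified "hello world" program:

# This is a comment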
echo("What's your name? ")
var name: string = readLine(stdin)
echo("Hi, ", name, "!")

Save this code to the file "greetings.nim". Now compile and run it:

nim compile --run greetings.nim

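With the --run switch, Nim executes the program automatically after compilation. You can pass command line arguments to your program by appending them after the filename: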

nim compile --run greetings.nim arg1 arg2

Commonly used commands and switches have abbreviations, so you can also use:

nim c -r greetings.nim

To compile a release version, use:

nim c -d:release greetings.nim

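By default, the Nim compiler generates a large number of runtime checks to make debugging easier. With -d:release some of these checks are turned off and optimizations are turned on.

Though it should be pretty obvious what the program does, I will explain the syntax: statements which are not indented are executed when the program starts. Indentation is Nim's way of grouping statements. Indentation is done with spaces only; tabulators are not allowed.

String literals are enclosed in double quotes. The var statement declares a new variable named name of type string with the value that is returned by the readLine procedure. Since the compiler knows that readLine returns a string, you can leave out the type in the declaration (this is called local type inference). So this will work too: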

var name = readLine(stdin)

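Note that this is basically the only form of type inference that exists in Nim: it is a good compromise between brevity and readability.

The "hello world" program contains several identifiers that are already known to the compiler: echo, readLine, etc. These built-ins are declared in the system module, which is implicitly imported by every other module.

Lexical elements

Let us look at Nim's lexical elements in more detail: like other programming languages, Nim consists of (string) literals, identifiers, keywords, comments, operators, and other punctuation marks.

String and character literals

String literals are enclosed in double quotes; character literals in single quotes. Special characters are escaped with \: \n means newline, \t means tabulator, etc. There are also raw string literals: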

r"C:\program files\nim"

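Within a raw string literal, the backslash is not an escape character.

The third and last way to write string literals is long string literals. They are delimited by three double quotes on each side (""" ... """), can span over multiple lines, and the \ is not an escape character in them either. They are very useful for embedding HTML code templates, for example.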
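A minimal sketch comparing the three kinds of string literals described above (the variable names are only illustrative). The escaped and the raw literal denote the same string, while the long literal keeps its line break and its backslash verbatim:

```nim
# Every backslash must be escaped in an ordinary string literal.
let escaped = "C:\\program files\\nim"
# In a raw string literal, \ is not an escape character.
let raw = r"C:\program files\nim"
# Long string literal: spans lines, and \n stays a backslash followed by n.
let multiLine = """<p>
no \n escape here</p>"""

echo raw
echo multiLine
```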

Comments

Comments start anywhere outside a string or character literal with the hash character #. Documentation comments start with ##:

# A comment.

var myVariable: int ## a documentation comment

Documentation comments are tokens; they are only allowed at certain places in the input file as they belong to the syntax tree! This feature enables simpler documentation generators.

You can also use the discard statement together with long string literals to create block comments:

discard """ You can have any Nim code text commented
out inside this with no indentation restrictions.
      yes("May I ask a pointless question?") """

Numbers

Numerical literals are written as in most other languages. As a special twist, underscores are allowed for better readability: 1_000_000 (one million). A number that contains a dot (or 'e' or 'E') is a floating point literal: 1.0e9 (one billion). Hexadecimal literals are prefixed with 0x, binary literals with 0b and octal literals with 0o. A leading zero alone does not produce an octal.
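The literal forms above can be checked with a short sketch; the comments give the decimal values the compiler should produce:

```nim
let
  million = 1_000_000   # underscores are ignored: same as 1000000
  billion = 1.0e9       # contains 'e', so this is a float literal
  hex = 0xFF            # 255
  bin = 0b1010          # 10
  oct = 0o17            # 15 (octal needs the 0o prefix)

echo million, " ", billion, " ", hex, " ", bin, " ", oct
```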

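The var statement

The var statement declares a new local or global variable:

var x, y: int # declares x and y to have the type ``int``

Indentation can be used after the var keyword to list a whole section of variables: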

var
  x, y: int
  # a comment can occur here too
  a, b, c: string

The assignment statement

The assignment statement assigns a new value to a variable or more generally to a storage location:

var x = "abc" # introduces a new variable `x` and assigns a value to it
x = "xyz"     # assigns a new value to `x`

= is the assignment operator. The assignment operator cannot be overloaded, overwritten or forbidden, but this might change in a future version of Nim. You can declare multiple variables with a single assignment statement and all the variables will have the same value:

var x, y = 3  # assigns 3 to the variables `x` and `y`
echo "x ", x  # outputs "x 3"
echo "y ", y  # outputs "y 3"
x = 42        # changes `x` to 42 without changing `y`
echo "x ", x  # outputs "x 42"
echo "y ", y  # outputs "y 3"

Note that declaring multiple variables with a single assignment which calls a procedure can have unexpected results: the compiler will unroll the assignments and end up calling the procedure several times. If the result of the procedure depends on side effects, your variables may end up having different values! For safety use only constant values.

Constants

Constants are symbols which are bound to a value. The constant's value cannot change. The compiler must be able to evaluate the expression in a constant declaration at compile time:

const x = "abc" # the constant x contains the string "abc"

Indentation can be used after the const keyword to list a whole section of constants:

const
  x = 1
  # a comment can occur here too
  y = 2
  z = y + 5 # computations are possible

The let statement

The let statement works like the var statement but the declared symbols are single assignment variables: After the initialization their value cannot change:

let x = "abc" # introduces a new variable `x` and binds a value to it
x = "xyz"     # Illegal: assignment to `x`

The difference between let and const is: let introduces a variable that can not be re-assigned, const means "enforce compile time evaluation and put it into a data section":

const input = readLine(stdin) # Error: constant expression expected
let input = readLine(stdin)   # works
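A small sketch of the difference, assuming nothing beyond the system module. It also uses a static block (not covered in this part of the tutorial), which runs its body at compile time and therefore only works with values known to the compiler, such as constants:

```nim
const
  answer = 6 * 7          # evaluated by the compiler
  shout = "nim" & "!"     # string operations work at compile time too

# `static` runs its body at compile time, so this check costs nothing at runtime.
static:
  doAssert answer == 42

# `let`: computed when the program runs, then bound once and never reassigned.
let doubled = answer * 2
echo shout, " ", doubled
```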

Control flow statements

The greetings program consists of 3 statements that are executed sequentially. Only the most primitive programs can get away with that: branching and looping are needed too.

If statement

The if statement is one way to branch the control flow:

let name = readLine(stdin)
if name == "":
  echo("Poor soul, you lost your name?")
elif name == "name":
  echo("Very funny, your name is name.")
else:
  echo("Hi, ", name, "!")

There can be zero or more elif parts, and the else part is optional. The keyword elif is short for else if, and is useful to avoid excessive indentation. (The "" is the empty string. It contains no characters.)

Case statement

Another way to branch is provided by the case statement. A case statement is a multi-branch:

let name = readLine(stdin)
case name
of "":
  echo("Poor soul, you lost your name?")
of "name":
  echo("Very funny, your name is name.")
of "Dave", "Frank":
  echo("Cool name!")
else:
  echo("Hi, ", name, "!")

As it can be seen, for an of branch a comma separated list of values is also allowed.

The case statement can deal with integers, other ordinal types and strings. (What an ordinal type is will be explained soon.) For integers or other ordinal types value ranges are also possible:

# this statement will be explained later:
from strutils import parseInt

echo("A number please: ")
let n = parseInt(readLine(stdin))
case n
of 0..2, 4..7: echo("The number is in the set: {0, 1, 2, 4, 5, 6, 7}")
of 3, 8: echo("The number is 3 or 8")

However, the above code does not compile: the reason is that you have to cover every value that n may contain, but the code only handles the values 0..8. Since it is not very practical to list every other possible integer (though it is possible thanks to the range notation), we fix this by telling the compiler that for every other value nothing should be done:

...
case n
of 0..2, 4..7: echo("The number is in the set: {0, 1, 2, 4, 5, 6, 7}")
of 3, 8: echo("The number is 3 or 8")
else: discard

The empty discard statement is a do nothing statement. The compiler knows that a case statement with an else part cannot fail and thus the error disappears. Note that it is impossible to cover all possible string values: that is why string cases always need an else branch.

In general the case statement is used for subrange types or enumerations where it is of great help that the compiler checks that you covered any possible value.
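As an illustration of that exhaustiveness check, here is a sketch with a hypothetical Light enumeration (enumerations are introduced later in this tutorial). Because every value is handled, no else branch is needed; removing one of the of branches should make the compiler reject the case statement:

```nim
type
  Light = enum
    red, yellow, green

proc action(light: Light): string =
  # No `else` branch: every value of `Light` is covered,
  # and the compiler checks this.
  case light
  of red: result = "stop"
  of yellow: result = "slow down"
  of green: result = "go"

echo action(yellow)
```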

While statement

The while statement is a simple looping construct:

echo("What's your name? ")
var name = readLine(stdin)
while name == "":
  echo("Please tell me your name: ")
  name = readLine(stdin)
  # no ``var``, because we do not declare a new variable here

The example uses a while loop to keep asking the user for their name, as long as they type in nothing (only press RETURN).

For statement

The for statement is a construct to loop over any element an iterator provides. The example uses the built-in countup iterator:

echo("Counting to ten: ")
for i in countup(1, 10):
  echo($i)
# --> Outputs 1 2 3 4 5 6 7 8 9 10 on different lines

The built-in $ operator turns an integer (int) and many other types into a string. The variable i is implicitly declared by the for loop and has the type int, because that is what countup returns. i runs through the values 1, 2, .., 10. Each value is echo-ed. This code does the same:

echo("Counting to 10: ")
var i = 1
while i <= 10:
  echo($i)
  inc(i) # increment i by 1
# --> Outputs 1 2 3 4 5 6 7 8 9 10 on different lines

Counting down can be achieved as easily (but is less often needed):

echo("Counting down from 10 to 1: ")
for i in countdown(10, 1):
  echo($i)
# --> Outputs 10 9 8 7 6 5 4 3 2 1 on different lines

Since counting up occurs so often in programs, Nim also has a .. iterator that does the same:

for i in 1..10:
  ...
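The following sketch collects the values produced by a few loop forms into sequences (seq and add are covered later). It assumes countup's optional third step argument and the ..< operator, which excludes the upper bound; neither is described in the text above:

```nim
var evens, down, upTo: seq[int]

for i in countup(0, 10, 2):   # optional third argument: the step
  evens.add(i)
for i in countdown(3, 1):
  down.add(i)
for i in 0 ..< 3:             # `..<` excludes the upper bound
  upTo.add(i)

echo evens, " ", down, " ", upTo
```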

Scopes and the block statement

Control flow statements have a feature not covered yet: they open a new scope. This means that in the following example, x is not accessible outside the loop:

while false:
  var x = "hi"
echo(x) # does not work

A while (for) statement introduces an implicit block. Identifiers are only visible within the block they have been declared. The block statement can be used to open a new block explicitly:

block myblock:
  var x = "hi"
echo(x) # does not work either

The block's label (myblock in the example) is optional.

Break statement

A block can be left prematurely with a break statement. The break statement can leave a while, for, or a block statement. It leaves the innermost construct, unless a label of a block is given:

block myblock:
  echo("entering block")
  while true:
    echo("looping")
    break # leaves the loop, but not the block
  echo("still in block")

block myblock2:
  echo("entering block")
  while true:
    echo("looping")
    break myblock2 # leaves the block (and the loop)
  echo("still in block")

Continue statement

Like in many other programming languages, a continue statement starts the next iteration immediately:

while true:
  let x = readLine(stdin)
  if x == "": continue
  echo(x)
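break and continue can also be combined in one loop. In this sketch continue skips the even numbers and break stops the loop once the counter exceeds 7, so only 1, 3, 5 and 7 are summed:

```nim
var total = 0
for i in 1..10:
  if i mod 2 == 0:
    continue   # skip even numbers, go straight to the next iteration
  if i > 7:
    break      # leave the loop entirely
  total += i   # only 1, 3, 5 and 7 get here
echo total
```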

When statement

Example:

when system.hostOS == "windows":
  echo("running on Windows!")
elif system.hostOS == "linux":
  echo("running on Linux!")
elif system.hostOS == "macosx":
  echo("running on Mac OS X!")
else:
  echo("unknown operating system")

The when statement is almost identical to the if statement with some differences:

  • Each condition has to be a constant expression since it is evaluated by the compiler.
  • The statements within a branch do not open a new scope.
  • The compiler checks the semantics and produces code only for the statements that belong to the first condition that evaluates to true.

The when statement is useful for writing platform specific code, similar to the #ifdef construct in the C programming language.

Note: To comment out a large piece of code, it is often better to use a when false: statement than to use real comments. This way nesting is possible.
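The scoping rule and the when false idiom can be seen in this minimal sketch. The constant defined inside a branch stays visible afterwards, and the body of when false is never semantically checked (it must still parse):

```nim
# The branches of `when` do not open a new scope,
# so `wordBits` is visible after the `when` statement.
when sizeof(int) == 8:
  const wordBits = 64
else:
  const wordBits = 32

# Code under `when false` is parsed but never semantically checked,
# so the undeclared identifier below is not an error.
when false:
  echo thisIdentifierDoesNotExist

echo wordBits
```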

Statements and indentation

Now that we covered the basic control flow statements, let's return to Nim indentation rules.

In Nim there is a distinction between simple statements and complex statements. Simple statements cannot contain other statements: Assignment, procedure calls or the return statement belong to the simple statements. Complex statements like if, when, for, while can contain other statements. To avoid ambiguities, complex statements always have to be indented, but single simple statements do not:

# no indentation needed for single assignment statement:
if x: x = false

# indentation needed for nested if statement:
if x:
  if y:
    y = false
  else:
    y = true

# indentation needed, because two statements follow the condition:
if x:
  x = false
  y = false

Expressions are parts of a statement which usually result in a value. The condition in an if statement is an example for an expression. Expressions can contain indentation at certain places for better readability:

if thisIsaLongCondition() and
    thisIsAnotherLongCondition(1,
       2, 3, 4):
  x = true

As a rule of thumb, indentation within expressions is allowed after operators, an open parenthesis and after commas.

With parenthesis and semicolons (;) you can use statements where only an expression is allowed:

# computes fac(4) at compile time:
const fac4 = (var x = 1; for i in 1..4: x *= i; x)

Procedures

To define new commands like echo and readLine in the examples, the concept of a procedure is needed. (Some languages call them methods or functions.) In Nim new procedures are defined with the proc keyword:

proc yes(question: string): bool =
  echo(question, " (y/n)")
  while true:
    case readLine(stdin)
    of "y", "Y", "yes", "Yes": return true
    of "n", "N", "no", "No": return false
    else: echo("Please be clear: yes or no")

if yes("Should I delete all your important files?"):
  echo("I'm sorry Dave, I'm afraid I can't do that.")
else:
  echo("I think you know what the problem is just as well as I do.")

This example shows a procedure named yes that asks the user a question and returns true if he answered "yes" (or something similar) and returns false if he answered "no" (or something similar). A return statement leaves the procedure (and therefore the while loop) immediately. The (question: string): bool syntax describes that the procedure expects a parameter named question of type string and returns a value of type bool. Bool is a built-in type: the only valid values for bool are true and false. The conditions in if or while statements should be of the type bool.

Some terminology: in the example question is called a (formal) parameter, "Should I..." is called an argument that is passed to this parameter.

Result variable

A procedure that returns a value has an implicit result variable declared that represents the return value. A return statement with no expression is shorthand for return result. The result value is always returned automatically at the end of a procedure if there is no return statement at the exit.

proc sumTillNegative(x: varargs[int]): int =
  for i in x:
    if i < 0:
      return
    result = result + i

echo sumTillNegative() # echoes 0
echo sumTillNegative(3, 4, 5) # echoes 12
echo sumTillNegative(3, 4, -1, 6) # echoes 7

The result variable is already implicitly declared at the start of the function, so declaring it again with 'var result', for example, would shadow it with a normal variable of the same name. The result variable is also already initialised with the type's default value. Note that referential data types will be nil at the start of the procedure, and thus may require manual initialisation.
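A minimal sketch of the implicit result variable, assuming a current Nim version where a seq result starts out empty rather than nil (evensUpTo and countPositive are illustrative names):

```nim
proc evensUpTo(n: int): seq[int] =
  # `result` is implicitly declared and starts out as an empty seq
  for i in 0..n:
    if i mod 2 == 0:
      result.add(i)
  # no `return` needed: `result` is returned automatically

proc countPositive(xs: openArray[int]): int =
  # `result` starts at 0, the default value of `int`
  for x in xs:
    if x > 0:
      inc(result)

echo evensUpTo(6), " ", countPositive([1, -2, 3])
```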

Parameters

Parameters are constant in the procedure body. By default, their value cannot be changed because this allows the compiler to implement parameter passing in the most efficient way. If a mutable variable is needed inside the procedure, it has to be declared with var in the procedure body. Shadowing the parameter name is possible, and actually an idiom:

proc printSeq(s: seq, nprinted: int = -1) =
  var nprinted = if nprinted == -1: s.len else: min(nprinted, s.len)
  for i in 0 ..< nprinted:
    echo s[i]

If the procedure needs to modify the argument for the caller, a var parameter can be used:

proc divmod(a, b: int; res, remainder: var int) =
  res = a div b        # integer division
  remainder = a mod b  # integer modulo operation

var
  x, y: int
divmod(8, 5, x, y) # modifies x and y
echo(x)
echo(y)

In the example, res and remainder are var parameters. Var parameters can be modified by the procedure and the changes are visible to the caller. Note that the above example would better make use of a tuple as a return value instead of using var parameters.
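A sketch of the tuple-returning alternative mentioned above; the tuple field names res and remainder are chosen to mirror the var parameters:

```nim
proc divmod(a, b: int): tuple[res, remainder: int] =
  # Both results are returned together instead of via `var` parameters.
  (a div b, a mod b)

let (q, r) = divmod(8, 5)   # tuple unpacking
echo q, " ", r

let both = divmod(8, 5)     # or access the fields by name
echo both.res, " ", both.remainder
```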

Discard statement

To call a procedure that returns a value just for its side effects, ignoring its return value, a discard statement has to be used. Nim does not allow silently throwing away a return value:

discard yes("May I ask a pointless question?")

The return value can be ignored implicitly if the called proc/iterator has been declared with the discardable pragma:

proc p(x, y: int): int {.discardable.} =
  return x + y

p(3, 4) # now valid

The discard statement can also be used to create block comments as described in the Comments section.

Named arguments

Often a procedure has many parameters and it is not clear in which order the parameters appear. This is especially true for procedures that construct a complex data type. Therefore the arguments to a procedure can be named, so that it is clear which argument belongs to which parameter:

proc createWindow(x, y, width, height: int; title: string;
                  show: bool): Window =
   ...

var w = createWindow(show = true, title = "My Application",
                     x = 0, y = 0, height = 600, width = 800)

Now that we use named arguments to call createWindow the argument order does not matter anymore. Mixing named arguments with ordered arguments is also possible, but not very readable:

var w = createWindow(0, 0, title = "My Application",
                     height = 600, width = 800, true)

The compiler checks that each parameter receives exactly one argument.

Default values

To make the createWindow proc easier to use it should provide default values, these are values that are used as arguments if the caller does not specify them:

proc createWindow(x = 0, y = 0, width = 500, height = 700,
                  title = "unknown",
                  show = true): Window =
   ...

var w = createWindow(title = "My Application", height = 600, width = 800)

Now the call to createWindow only needs to set the values that differ from the defaults.

Note that type inference works for parameters with default values; there is no need to write title: string = "unknown", for example.
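Because Window is not defined in this tutorial, the sketch below declares a hypothetical Window object so that named arguments and default values can be tried out; the body of createWindow simply stores its arguments:

```nim
type
  Window = object   # hypothetical stand-in for a real GUI window type
    x, y, width, height: int
    title: string
    show: bool

proc createWindow(x = 0, y = 0, width = 500, height = 700,
                  title = "unknown", show = true): Window =
  Window(x: x, y: y, width: width, height: height,
         title: title, show: show)

# Named arguments in any order; omitted ones take their default values.
let w = createWindow(title = "My Application", height = 600, width = 800)
echo w.title, " ", w.width, "x", w.height
```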

Overloaded procedures

Nim provides the ability to overload procedures similar to C++:

proc toString(x: int): string = ...
proc toString(x: bool): string =
  if x: result = "true"
  else: result = "false"

echo(toString(13))   # calls the toString(x: int) proc
echo(toString(true)) # calls the toString(x: bool) proc

(Note that toString is usually the $ operator in Nim.) The compiler chooses the most appropriate proc for the toString calls. How this overloading resolution algorithm works exactly is not discussed here (it will be specified in the manual soon). However, it does not lead to nasty surprises and is based on a quite simple unification algorithm. Ambiguous calls are reported as errors.
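One possible way to fill in both overloads, shown as a sketch; the int version just reuses the built-in $ operator:

```nim
proc toString(x: int): string =
  $x   # reuse the built-in `$` for integers

proc toString(x: bool): string =
  if x: "true" else: "false"

echo toString(13)     # resolves to toString(x: int)
echo toString(true)   # resolves to toString(x: bool)
```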

Operators

The Nim library makes heavy use of overloading - one reason for this is that each operator like + is just an overloaded proc. The parser lets you use operators in infix notation (a + b) or prefix notation (+ a). An infix operator always receives two arguments, a prefix operator always one. Postfix operators are not possible, because this would be ambiguous: does a @ @ b mean (a) @ (@b) or (a@) @ (b)? It always means (a) @ (@b), because there are no postfix operators in Nim.

Apart from a few built-in keyword operators such as and, or, not, operators always consist of these characters: + - * \ / < > = @ $ ~ & % ! ? ^ . |

User defined operators are allowed. Nothing stops you from defining your own @!?+~ operator, but readability can suffer.

The operator's precedence is determined by its first character. The details can be found in the manual.

To define a new operator, enclose the operator in backticks (`):

proc `$` (x: myDataType): string = ...
# now the $ operator also works with myDataType, overloading resolution
# ensures that $ works for built-in types just like before

The backtick notation can also be used to call an operator just like any other procedure:

if `==`( `+`(3, 4), 7): echo("True")
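Operators can be defined for your own types in the same way. This sketch uses a hypothetical Vec2 object, overloads + and $, and calls + once as an infix operator and once with the backtick notation:

```nim
type
  Vec2 = object   # hypothetical 2D vector type for illustration
    x, y: float

proc `+`(a, b: Vec2): Vec2 =
  Vec2(x: a.x + b.x, y: a.y + b.y)

proc `$`(v: Vec2): string =
  "(" & $v.x & ", " & $v.y & ")"

let v = Vec2(x: 1.0, y: 2.0) + Vec2(x: 3.0, y: 4.0)   # infix call
let w = `+`(v, v)                                      # same operator, called like a proc
echo v   # echo uses our `$`
```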

Forward declarations

Every variable, procedure, etc. needs to be declared before it can be used. (The reason for this is that it is non-trivial to do better than that in a language that supports meta programming as extensively as Nim does.) However, this cannot be done for mutually recursive procedures:

# forward declaration:
proc even(n: int): bool

proc odd(n: int): bool =
  # assumes n >= 0
  n != 0 and even(n-1)

proc even(n: int): bool =
  # assumes n >= 0
  n == 0 or odd(n-1)

Here odd depends on even and vice versa. Thus even needs to be introduced to the compiler before it is completely defined. The syntax for such a forward declaration is simple: just omit the = and the procedure's body.

Later versions of the language will weaken the requirements for forward declarations.

The example also shows that a proc's body can consist of a single expression whose value is then returned implicitly.

Iterators

Let's return to the boring counting example:

echo("Counting to ten: ")
for i in countup(1, 10):
  echo($i)

Can a countup proc be written that supports this loop? Let's try:

proc countup(a, b: int): int =
  var res = a
  while res <= b:
    return res
    inc(res)

However, this does not work. The problem is that the procedure should not only return, but return and continue after an iteration has finished. This return and continue is called a yield statement. Now the only thing left to do is to replace the proc keyword by iterator and there it is - our first iterator:

iterator countup(a, b: int): int =
  var res = a
  while res <= b:
    yield res
    inc(res)

Iterators look very similar to procedures, but there are several important differences:

  • Iterators can only be called from for loops.
  • Iterators cannot contain a return statement and procs cannot contain a yield statement.
  • Iterators have no implicit result variable.
  • Iterators do not support recursion.
  • Iterators cannot be forward declared, because the compiler must be able to inline an iterator. (This restriction will be gone in a future version of the compiler.)

However, you can also use a closure iterator to get a different set of restrictions. See first class iterators for details. Iterators can have the same name and parameters as a proc, essentially they have their own namespace. Therefore it is common practice to wrap iterators in procs of the same name which accumulate the result of the iterator and return it as a sequence, like split from the strutils module.
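A sketch of that idiom: an iterator and a proc with the same name and parameters (myCountup is an illustrative name chosen to avoid clashing with the built-in countup). Inside a for loop the iterator is selected; elsewhere the proc is called and returns a seq:

```nim
iterator myCountup(a, b: int): int =
  var res = a
  while res <= b:
    yield res   # hand `res` to the loop body, then continue from here
    inc(res)

proc myCountup(a, b: int): seq[int] =
  # Same name and parameters as the iterator;
  # inside a `for` loop the iterator is chosen.
  for i in myCountup(a, b):
    result.add(i)

for i in myCountup(1, 3):   # uses the iterator
  echo i
let collected = myCountup(1, 3)   # uses the proc
echo collected
```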

Basic types

This section deals with the basic built-in types and the operations that are available for them in detail.

Booleans

The boolean type is named bool in Nim and consists of the two pre-defined values true and false. Conditions in while, if, elif, when statements need to be of type bool.

The operators not, and, or, xor, <, <=, >, >=, !=, == are defined for the bool type. The and and or operators perform short-cut evaluation. Example:

while p != nil and p.name != "xyz":
  # p.name is not evaluated if p == nil
  p = p.next

Characters

The character type is named char in Nim. Its size is one byte. Thus it cannot represent an arbitrary UTF-8 character, only a part of it. The reason for this is efficiency: for the overwhelming majority of use-cases, the resulting programs will still handle UTF-8 properly as UTF-8 was specially designed for this. Character literals are enclosed in single quotes.

Chars can be compared with the ==, <, <=, >, >= operators. The $ operator converts a char to a string. Chars cannot be mixed with integers; to get the ordinal value of a char use the ord proc. Converting from an integer to a char is done with the chr proc.
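A short sketch of the conversions between char, int and string mentioned above, assuming ASCII codes:

```nim
let letter = 'A'
let code = ord(letter)          # char -> int: 65 in ASCII
let following = chr(code + 1)   # int -> char
let asString = $letter          # char -> string
echo code, " ", following, " ", asString
```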

Strings

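String variables in Nim are mutable, so appending to a string is quite efficient. Strings in Nim have a length field and are also zero-terminated internally for C interoperability. One can retrieve a string's length with the builtin len procedure; the length never counts the terminating zero. Older versions of Nim allowed accessing the terminating zero, which made code like the following safe without a bounds check; in current versions s[len(s)] is an out-of-bounds access, so the index has to be checked first:

if i + 1 < len(s) and s[i] == 'a' and s[i+1] == 'b':
  # ``and`` short-circuits, so s[i+1] is only read when it is in bounds
  ...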

The assignment operator for strings copies the string. You can use the & operator to concatenate strings and add to append to a string.

Strings are compared by their lexicographical order. All comparison operators are available. Per convention, all strings are UTF-8 strings, but this is not enforced. For example, when reading strings from binary files, they are merely a sequence of bytes. The index operation s[i] means the i-th char of s, not the i-th unichar.
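A minimal sketch of copying, appending, concatenating and comparing strings:

```nim
var s = "Hello"
let snapshot = s          # assignment copies the string
s.add(", world")          # append in place; `snapshot` is unaffected
let greeting = s & "!"    # `&` builds a new string
echo greeting, " ", greeting.len
echo "apple" < "banana"   # lexicographical comparison
```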

In older versions of Nim, string variables were initialized with a special value called nil, and most string operations could not deal with nil (an exception was raised) for performance reasons, so the empty string "" was recommended as the empty value instead. Current versions of Nim initialize string variables to "" and strings can no longer be nil.

Integers

Nim has these integer types built-in: int int8 int16 int32 int64 uint uint8 uint16 uint32 uint64.

The default integer type is int. Integer literals can have a type suffix to mark them to be of another integer type:

let
  x = 0     # x is of type ``int``
  y = 0'i8  # y is of type ``int8``
  z = 0'i64 # z is of type ``int64``
  u = 0'u   # u is of type ``uint``

Most often integers are used for counting objects that reside in memory, so int has the same size as a pointer.

The common operators + - * div mod < <= == != > >= are defined for integers. The and or xor not operators are defined for integers too and provide bitwise operations. Left bit shifting is done with the shl, right shifting with the shr operator. In older versions of Nim, the bit shifting operators always treated their arguments as unsigned, and ordinary multiplication or division was the way to get arithmetic shifts; in current versions, shr on a signed integer preserves the sign bit.

Unsigned operations all wrap around; they cannot lead to over- or underflow errors.

Automatic type conversion is performed in expressions where different kinds of integer types are used. However, if the type conversion loses information, the EOutOfRange exception is raised (if the error cannot be detected at compile time).
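The integer operators above in a short sketch; the expected values are given in the comments, and the uint8 counter shows the wrap-around of unsigned arithmetic:

```nim
let
  quotient = 7 div 2          # integer division: 3
  remainder = 7 mod 2         # 1
  shifted = 1 shl 4           # 16
  both = 0b1100 and 0b1010    # bitwise and: 0b1000 = 8
  either = 0b1100 or 0b1010   # bitwise or: 0b1110 = 14
  small = 100'i8              # int8 via type suffix

var counter: uint8 = 255
counter = counter + 1         # unsigned arithmetic wraps around to 0
echo quotient, " ", remainder, " ", shifted, " ", both, " ", either, " ", counter
```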

Floats

Nim has these floating point types built-in: float float32 float64.

The default float type is float. In the current implementation, float is always 64 bit wide.

Float literals can have a type suffix to mark them to be of another float type:

var
  x = 0.0      # x is of type ``float``
  y = 0.0'f32  # y is of type ``float32``
  z = 0.0'f64  # z is of type ``float64``

The common operators + - * / < <= == != > >= are defined for floats and follow the IEEE standard.

Automatic type conversion in expressions with different kinds of floating point types is performed: the smaller type is converted to the larger. Integer types are not converted to floating point types automatically and vice versa. The toInt and toFloat procs can be used for these conversions.
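A sketch of the explicit conversions between integers and floats, plus a float32 type suffix:

```nim
let
  count = 3
  half = toFloat(count) / 2.0   # explicit int -> float conversion
  rounded = toInt(2.4)          # explicit float -> int conversion
  single = 0.5'f32              # float32 via type suffix
echo half, " ", rounded, " ", single
```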

Type Conversion

Conversion between basic types in nim is performed by using the type as a function:

var
  x: int32 = 1.int32   # same as calling int32(1)
  y: int8  = int8('a') # 'a' == 97'i8
  z: float = 2.5       # int(2.5) rounds down to 2
  sum: int = int(x) + int(y) + int(z) # sum == 100

Internal type representation

As mentioned earlier, the built-in $ (stringify) operator turns any basic type into a string, which you can then print to the screen with the echo proc. However, advanced types, or types you may define yourself won't work with the $ operator until you define one for them. Sometimes you just want to debug the current value of a complex type without having to write its $ operator. You can then use the repr proc, which works with any type and even complex data graphs with cycles. The following example shows that even for basic types there is a difference between the $ and repr outputs:

var
  myBool = true
  myCharacter = 'n'
  myString = "nim"
  myInteger = 42
  myFloat = 3.14
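# the exact repr output (e.g. the address shown for strings, float formatting) differs between Nim versions
echo($myBool, ":", repr(myBool))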
# --> true:true
echo($myCharacter, ":", repr(myCharacter))
# --> n:'n'
echo($myString, ":", repr(myString))
# --> nim:0x10fa8c050"nim"
echo($myInteger, ":", repr(myInteger))
# --> 42:42
echo($myFloat, ":", repr(myFloat))
# --> 3.1400000000000001e+00:3.1400000000000001e+00

Advanced types

In Nim new types can be defined within a type statement:

type
  biggestInt = int64      # biggest integer type that is available
  biggestFloat = float64  # biggest float type that is available

Enumeration and object types cannot be defined on the fly, but only within a type statement.

Enumerations

A variable of an enumeration type can only be assigned a value of a limited set. This set consists of ordered symbols. Each symbol is mapped to an integer value internally. The first symbol is represented at runtime by 0, the second by 1 and so on. Example:

type
  Direction = enum
    north, east, south, west

var x = south      # `x` is of type `Direction`; its value is `south`
echo($x)           # writes "south" to `stdout`

All comparison operators can be used with enumeration types.

An enumeration's symbol can be qualified to avoid ambiguities: Direction.south.

The $ operator can convert any enumeration value to its name, the ord proc to its underlying integer value.

For better interfacing to other programming languages, the symbols of enum types can be assigned an explicit ordinal value. However, the ordinal values have to be in ascending order. A symbol whose ordinal value is not explicitly given is assigned the value of the previous symbol + 1.

An explicit ordered enum can have holes:

type
  MyEnum = enum
    a = 2, b = 4, c = 89
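The $ and ord procs, the conversion from an integer to an enum, and explicit ordinal values can be seen together in this sketch; HttpCode is a hypothetical enum for illustration:

```nim
type
  Direction = enum
    north, east, south, west
  HttpCode = enum          # hypothetical enum with explicit ordinal values
    ok = 200, notFound = 404

let d = south
echo $d, " ", ord(d)       # name and underlying integer
echo Direction(1)          # integer -> enum conversion
echo ord(notFound)
```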

Ordinal types

Enumerations without holes, integer types, char and bool (and subranges) are called ordinal types. Ordinal types have quite a few special operations:

Operation    Comment
ord(x)       returns the integer value that is used to represent x's value
inc(x)       increments x by one
inc(x, n)    increments x by n; n is an integer
dec(x)       decrements x by one
dec(x, n)    decrements x by n; n is an integer
succ(x)      returns the successor of x
succ(x, n)   returns the n'th successor of x
pred(x)      returns the predecessor of x
pred(x, n)   returns the n'th predecessor of x

The inc, dec, succ and pred operations can fail by raising an EOutOfRange or EOverflow exception. (If the code has been compiled with the proper runtime checks turned on.)
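A few of these ordinal operations applied to char and bool values, as a minimal sketch:

```nim
var ch = 'a'
inc(ch)                    # 'b'
inc(ch, 2)                 # 'd'
let before = pred(ch)      # 'c'
let later = succ('x', 2)   # 'z'
echo ch, " ", before, " ", later, " ", ord(true)
```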

Subranges

A subrange type is a range of values from an integer or enumeration type (the base type). Example:

type
  Subrange = range[0..5]

Subrange is a subrange of int which can only hold the values 0 to 5. Assigning any other value to a variable of type Subrange is a compile-time or runtime error. Assignments from the base type to one of its subrange types (and vice versa) are allowed.

The system module defines the important Natural type as range[0..high(int)] (high returns the maximal value). Other programming languages mandate the usage of unsigned integers for natural numbers. This is often wrong: you don't want unsigned arithmetic (which wraps around) just because the numbers cannot be negative. Nim's Natural type helps to avoid this common programming error.
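A minimal sketch of a subrange and of Natural as a parameter type (half is an illustrative name):

```nim
type
  Subrange = range[0..5]

let s: Subrange = 3    # fine: 3 lies within 0..5
let n: int = s         # subrange -> base type is always allowed

proc half(x: Natural): Natural =
  # `x` is guaranteed to be non-negative; the int result is
  # checked against Natural when it is returned.
  x div 2

echo low(Subrange), "..", high(Subrange), " ", n, " ", half(9)
```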

Sets

The set type models the mathematical notion of a set. The set's base type can only be an ordinal type of a limited size, namely int8-int16, uint8/byte-uint16, char, enum, or equivalent. The reason is that sets are implemented as high performance bit vectors.

Sets can be constructed via the set constructor: {} is the empty set. The empty set is type compatible with any concrete set type. The constructor can also be used to include elements (and ranges of elements):

type
  CharSet = set[char]
var
  x: CharSet
x = {'a'..'z', '0'..'9'} # This constructs a set that contains the
                         # letters from 'a' to 'z' and the digits
                         # from '0' to '9'

These operations are supported by sets:

Operation       Meaning
A + B           union of two sets
A * B           intersection of two sets
A - B           difference of two sets (A without B's elements)
A == B          set equality
A <= B          subset relation (A is subset of B or equal to B)
A < B           strict subset relation (A is a proper subset of B)
e in A          set membership (A contains element e)
e notin A       A does not contain element e
contains(A, e)  A contains element e
card(A)         the cardinality of A (number of elements in A)
incl(A, elem)   same as A = A + {elem}
excl(A, elem)   same as A = A - {elem}

Sets are often used to define a type for the flags of a procedure. This is a much cleaner (and type safe) solution than just defining integer constants that should be or'ed together.
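A sketch of a flags type built from a set, as described above; Permission and its values are hypothetical names chosen for illustration:

```nim
type
  Permission = enum       # hypothetical flag names
    canRead, canWrite, canExecute
  Permissions = set[Permission]

proc describe(p: Permissions): string =
  if canWrite in p: "writable" else: "read-only"

var perms: Permissions = {canRead, canWrite}
perms.incl(canExecute)    # same as perms = perms + {canExecute}
perms.excl(canWrite)      # same as perms = perms - {canWrite}
echo perms, " ", card(perms), " ", describe(perms)
```

Sets of char support the same operators; for example, {'a'..'c'} * {'b'..'z'} is {'b', 'c'}.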

Arrays

An array is a simple fixed length container. Each element in the array has the same type. The array's index type can be any ordinal type.

Arrays can be constructed via []:

type
  IntArray = array[0..5, int] # an array that is indexed with 0..5
var
  x: IntArray
x = [1, 2, 3, 4, 5, 6]
for i in low(x)..high(x):
  echo(x[i])

The notation x[i] is used to access the i-th element of x. Array access is always bounds checked (at compile-time or at runtime). These checks can be disabled via pragmas or invoking the compiler with the --bound_checks:off command line switch.

Arrays are value types, like most other Nim types (ref and ptr types are the exception). The assignment operator copies the whole array contents.
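The copy semantics can be seen in a minimal example. After the assignment, the two variables hold independent arrays, so modifying one leaves the other untouched.

```nim
var a = [1, 2, 3]
var b = a          # copies all three elements
b[0] = 100
echo a[0]          # 1: the original is unchanged
echo b[0]          # 100
echo a == b        # false
```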

The built-in len proc returns the array's length. low(a) returns the lowest valid index for the array a and high(a) the highest valid index.

type
  Direction = enum
    north, east, south, west
  BlinkLights = enum
    off, on, slowBlink, mediumBlink, fastBlink
  LevelSetting = array[north..west, BlinkLights]
var
  level: LevelSetting
level[north] = on
level[south] = slowBlink
level[east] = fastBlink
echo repr(level)  # --> [on, fastBlink, slowBlink, off]
echo low(level)   # --> north
echo len(level)   # --> 4
echo high(level)  # --> west

In other languages, nested (multidimensional) arrays are usually declared by appending more brackets, because each dimension is restricted to the same index type as the others. In Nim, different dimensions can have different index types, so the nesting syntax is slightly different. Building on the previous example, where a level is defined as an array of enums indexed by yet another enum, the following lines add a light tower type that is subdivided into height levels accessed through their integer index:

type
  LightTower = array[1..10, LevelSetting]
var
  tower: LightTower
tower[1][north] = slowBlink
tower[1][east] = mediumBlink
echo len(tower)     # --> 10
echo len(tower[1])  # --> 4
echo repr(tower)    # --> [[slowBlink, mediumBlink, ...more output..
# The following lines don't compile due to type mismatch errors
#tower[north][east] = on
#tower[0][1] = on

Note how the built-in len proc returns only the length of the array's first dimension. Another way of defining LightTower that better shows its nested nature is to omit the previous definition of the LevelSetting type and instead embed it directly as the element type of the first dimension:

type
  LightTower = array[1..10, array[north..west, BlinkLights]]

Arrays very often start at zero, so there is a shortcut syntax that specifies a range from zero to the given number minus one:

type
  IntArray = array[0..5, int] # an array that is indexed with 0..5
  QuickArray = array[6, int]  # an array that is indexed with 0..5
var
  x: IntArray
  y: QuickArray
x = [1, 2, 3, 4, 5, 6]
y = x
for i in low(x)..high(x):
  echo(x[i], y[i])

Sequences

Sequences are similar to arrays but of dynamic length which may change during runtime (like strings). Since sequences are resizable they are always allocated on the heap and garbage collected.

Sequences are always indexed with an int starting at position 0. The len, low and high operations are available for sequences too. The notation x[i] can be used to access the i-th element of x.

Sequences can be constructed by the array constructor [] in conjunction with the array to sequence operator @. Another way to allocate space for a sequence is to call the built-in newSeq procedure.
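Here is a small sketch of the second approach. newSeq allocates a sequence of a given length with default-initialised elements, and add grows the sequence afterwards.

```nim
var squares = newSeq[int](3)   # three elements, each initialised to 0
for i in 0 .. high(squares):
  squares[i] = i * i
squares.add(9)                 # sequences can grow at runtime
echo len(squares)              # 4
echo squares                   # @[0, 1, 4, 9]
```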

A sequence may be passed to an openarray parameter.

Example:

var
  x: seq[int] # a sequence of integers
x = @[1, 2, 3, 4, 5, 6] # the @ turns the array into a sequence

Sequence variables are initialized with nil. However, most sequence operations cannot deal with nil (leading to an exception being raised) for performance reasons. Thus one should use empty sequences @[] rather than nil as the empty value. But @[] creates a sequence object on the heap, so there is a trade-off to be made here.

The for statement can be used with one or two variables when iterating over a sequence. In the one-variable form, the variable holds each value of the sequence in turn; the for statement loops over the results of the items() iterator from the system module. In the two-variable form, the first variable holds the index position and the second holds the value; here the for statement loops over the results of the pairs() iterator from the system module. Examples:

for i in @[3, 4, 5]:
  echo($i)
# --> 3
# --> 4
# --> 5

for i, value in @[3, 4, 5]:
  echo("index: ", $i, ", value:", $value)
# --> index: 0, value:3
# --> index: 1, value:4
# --> index: 2, value:5

Open arrays

Note: Openarrays can only be used for parameters.

Fixed-size arrays often turn out to be too inflexible; procedures should be able to deal with arrays of different sizes. The openarray type allows this. Openarrays are always indexed with an int starting at position 0. The len, low and high operations are available for open arrays too. Any array with a compatible base type can be passed to an openarray parameter; the index type does not matter.

The openarray type cannot be nested: multidimensional openarrays are not supported because this is seldom needed and cannot be done efficiently.
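The following minimal sketch uses a hypothetical sum procedure. It accepts arrays of different lengths and index types, as well as sequences, through a single openarray parameter.

```nim
proc sum(values: openarray[int]): int =
  ## Hypothetical helper: adds up all elements, whatever the container.
  for v in values:
    result += v

let
  fixed: array[1..3, int] = [10, 20, 30]   # index type 1..3 is irrelevant
  dynamic = @[1, 2, 3, 4]
echo sum(fixed)     # 60
echo sum(dynamic)   # 10
echo sum([5, 5])    # 10
```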

Varargs

A varargs parameter is like an openarray parameter. However, it also allows a variable number of arguments to be passed to a procedure. The compiler converts the list of arguments to an array automatically:

proc myWriteln(f: File, a: varargs[string]) =
  for s in items(a):
    write(f, s)
  write(f, "\n")

myWriteln(stdout, "abc", "def", "xyz")
# is transformed by the compiler to:
myWriteln(stdout, ["abc", "def", "xyz"])

This transformation is only done if the varargs parameter is the last parameter in the procedure header. It is also possible to perform type conversions in this context:

proc myWriteln(f: File, a: varargs[string, `$`]) =
  for s in items(a):
    write(f, s)
  write(f, "\n")

myWriteln(stdout, 123, "abc", 4.0)
# is transformed by the compiler to:
myWriteln(stdout, [$123, $"abc", $4.0])

In this example $ is applied to any argument that is passed to the parameter a. Note that $ applied to strings is a nop.
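To show the conversion at work with a return value, here is a sketch built around joinAll, a hypothetical helper and not part of the standard library. Each argument of any type is turned into a string by `$` before the procedure body sees it, and the procedure can also be called with no variadic arguments at all.

```nim
proc joinAll(sep: string, parts: varargs[string, `$`]): string =
  ## Hypothetical helper: joins all (already converted) arguments with `sep`.
  result = ""
  for i in 0 .. high(parts):
    if i > 0:
      result.add(sep)
    result.add(parts[i])

echo joinAll(", ", 1, "two", 3.5)   # 1, two, 3.5
echo joinAll("-", 'a', true)        # a-true
echo joinAll("-").len               # 0: no variadic arguments
```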

Slices

Slices look similar to subrange types in syntax but are used in a different context. A slice is just an object of type Slice which contains two bounds, a and b. By itself a slice is not very useful, but other collection types define operators which accept Slice objects to define ranges.

var
  a = "Nim is a programming language"
  b = "Slices are useless."

echo a[7..12] # --> 'a prog'
b[11.. -2] = "useful"
echo b # --> 'Slices are useful.'

In the previous example slices are used to modify a part of a string, and even a negative index is used. The slice's bounds can hold any value supported by their type, but it is the proc using the slice object which defines what values are accepted.
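A slice can also be stored in a variable and reused. This sketch, assuming a recent compiler, shows that its bounds are the fields a and b, and that the same slice value can index both a string and a sequence.

```nim
let s = 2..4                  # a slice with bounds s.a = 2 and s.b = 4
echo s.a, " ", s.b            # 2 4

let word = "abcdefg"
echo word[s]                  # cde

let nums = @[10, 20, 30, 40, 50, 60]
echo nums[s]                  # @[30, 40, 50]
```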

Tuples

A tuple type defines various named fields and an order of the fields. The constructor () can be used to construct tuples. The order of the fields in the constructor must match the order in the tuple's definition. Different tuple-types are equivalent if they specify fields of the same type and of the same name in the same order.

The assignment operator for tuples copies each component. The notation t.field is used to access a tuple's field. Another notation is t[i] to access the i'th field. Here i needs to be a constant integer.

type
  Person = tuple[name: string, age: int] # type representing a person:
                                         # a person consists of a name
                                         # and an age
var
  person: Person
person = (name: "Peter", age: 30)
# the same, but less readable:
person = ("Peter", 30)

echo(person.name) # "Peter"
echo(person.age)  # 30

echo(person[0]) # "Peter"
echo(person[1]) # 30

# You don't need to declare tuples in a separate type section.
var building: tuple[street: string, number: int]
building = ("Rue del Percebe", 13)
echo(building.street)

# The following line does not compile, they are different tuples!
#person = building
# --> Error: type mismatch: got (tuple[street: string, number: int])
#     but expected 'Person'

# The following works because the field names and types are the same.
var teacher: tuple[name: string, age: int] = ("Mark", 42)
person = teacher

Even though you don't need to declare a type for a tuple to use it, tuples created with different field names are considered different types, even if their field types are the same.

Tuples can be unpacked during variable assignment (and only then!). This is handy for assigning the fields of a tuple directly to individually named variables. An example is the splitFile proc from the os module, which returns the directory, name and extension of a path at the same time. For tuple unpacking to work you have to put parentheses around the variables you want to unpack into; otherwise every individual variable is assigned the same (whole tuple) value! Example:

import os

let
  path = "usr/local/nimc.html"
  (dir, name, ext) = splitFile(path)
  baddir, badname, badext = splitFile(path)
echo dir      # outputs `usr/local`
echo name     # outputs `nimc`
echo ext      # outputs `.html`
# All the following output the same line:
# `(dir: usr/local, name: nimc, ext: .html)`
echo baddir
echo badname
echo badext

Tuple unpacking only works in var or let blocks. The following code won't compile:

import os

var
  path = "usr/local/nimc.html"
  dir, name, ext = ""

(dir, name, ext) = splitFile(path)
# --> Error: '(dir, name, ext)' cannot be assigned to
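Tuples are also a convenient way for your own procedures to return several values at once. In this sketch quotRem is a hypothetical helper. The result can either be unpacked in a let statement or kept as a tuple and accessed by field name.

```nim
proc quotRem(a, b: int): tuple[quotient, remainder: int] =
  ## Hypothetical helper returning two results at once.
  result = (a div b, a mod b)

let (q, r) = quotRem(17, 5)     # unpacking in a let statement
echo q, " ", r                  # 3 2

let both = quotRem(17, 5)       # or keep the tuple and use the field names
echo both.quotient, " ", both.remainder   # 3 2
```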

Reference and pointer types

References (similar to pointers in other programming languages) are a way to introduce many-to-one relationships. This means different references can point to and modify the same location in memory.

Nim distinguishes between traced and untraced references. Untraced references are also called pointers. Traced references point to objects on a garbage-collected heap; untraced references point to manually allocated objects or to objects somewhere else in memory. Thus untraced references are unsafe. However, for certain low-level operations (such as accessing hardware) untraced references are unavoidable.

Traced references are declared with the ref keyword, untraced references are declared with the ptr keyword.

The empty [] subscript notation can be used to dereference a reference, that is, to retrieve the item the reference points to. The . (tuple/object field access) and [] (array/string/sequence index) operators perform implicit dereferencing for reference types:

type
  Node = ref NodeObj
  NodeObj = object
    le, ri: Node
    data: int
var
  n: Node
new(n)
n.data = 9
# no need to write n[].data; in fact n[].data is highly discouraged!

To allocate a new traced object, the built-in procedure new has to be used. To deal with untraced memory, the procedures alloc, dealloc and realloc can be used. The documentation of the system module contains further information.
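The following minimal sketch handles untraced memory by hand. It uses alloc0 (the zero-initialising variant of alloc) with a cast to ptr int; the variable names are illustrative only. Nothing here is tracked by the garbage collector, so the block must be freed explicitly.

```nim
# Untraced memory: the garbage collector does not know about this block.
let p = cast[ptr int](alloc0(sizeof(int)))   # zero-initialised storage
echo p[]          # 0
p[] = 41
p[] += 1
let stored = p[]  # copy the value out before freeing the memory
echo stored       # 42
dealloc(p)        # must be freed by hand; p must not be used afterwards
```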

If a reference points to nothing, it has the value nil.
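A small sketch of the many-to-one relationship and of nil, using a hypothetical Counter type: a reference starts out as nil, and after allocation two references can share (and modify) the same object.

```nim
type
  Counter = ref object   # hypothetical type, for illustration only
    value: int

var a: Counter
echo a == nil     # true: nothing allocated yet
new(a)
let b = a         # b and a now refer to the same object
b.value = 42
echo a.value      # 42: the change is visible through both references
```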

Procedural type

A procedural type is a (somewhat abstract) pointer to a procedure. nil is an allowed value for a variable of a procedural type. Nim uses procedural types to support functional programming techniques.

Example:

proc echoItem(x: int) = echo(x)

proc forEach(action: proc (x: int)) =
  const
    data = [2, 3, 5, 7, 11]
  for d in items(data):
    action(d)

forEach(echoItem)

A subtle issue with procedural types is that the calling convention of the procedure influences the type compatibility: procedural types are only compatible if they have the same calling convention. The different calling conventions are listed in the manual.
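Here is a slightly larger sketch of procedural types used for functional-style code. mapAll, double and square are hypothetical names. The same procedure parameter accepts named procedures and anonymous ones, and a procedural variable can start out as nil and be assigned later.

```nim
proc double(x: int): int = x * 2
proc square(x: int): int = x * x

proc mapAll(xs: seq[int], f: proc (x: int): int): seq[int] =
  ## Hypothetical helper: applies `f` to every element and collects results.
  result = @[]
  for x in xs:
    result.add(f(x))

echo mapAll(@[1, 2, 3], double)                       # @[2, 4, 6]
echo mapAll(@[1, 2, 3], square)                       # @[1, 4, 9]
echo mapAll(@[1, 2, 3], proc (x: int): int = x + 1)   # @[2, 3, 4]

var op: proc (x: int): int = nil   # a procedural variable may be nil
op = square
echo op(5)                         # 25
```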

Modules

Nim supports splitting a program into pieces with a module concept. Each module is in its own file. Modules enable information hiding and separate compilation. A module may gain access to symbols of another module by the import statement. Only top-level symbols that are marked with an asterisk (*) are exported:

# Module A
var
  x*, y: int

proc `*` *(a, b: seq[int]): seq[int] =
  # allocate a new sequence:
  newSeq(result, len(a))
  # multiply two int sequences:
  for i in 0..len(a)-1: result[i] = a[i] * b[i]

when isMainModule:
  # test the new ``*`` operator for sequences:
  assert(@[1, 2, 3] * @[1, 2, 3] == @[1, 4, 9])

The above module exports x and *, but not y.

The top-level statements of a module are executed at the start of the program. This can be used to initialize complex data structures for example.

Each module has a special magic constant isMainModule that is true if the module is compiled as the main file. This is very useful to embed tests within the module as shown by the above example.

Modules that depend on each other are possible, but strongly discouraged, because then one module cannot be reused without the other.

The algorithm for compiling modules is:

  • Compile the whole module as usual, following import statements recursively.
  • If there is a cycle only import the already parsed symbols (that are exported); if an unknown identifier occurs then abort.

This is best illustrated by an example:

# Module A
type
  T1* = int  # Module A exports the type ``T1``
import B     # the compiler starts parsing B

proc main() =
  var i = p(3) # works because B has been parsed completely here

main()
# Module B
import A  # A is not parsed here! Only the already known symbols
          # of A are imported.

proc p*(x: A.T1): A.T1 =
  # this works because the compiler has already
  # added T1 to A's interface symbol table
  result = x + 1

A symbol of a module can be qualified with the module.symbol syntax. If the symbol is ambiguous, it even has to be qualified. A symbol is ambiguous if it is defined in two (or more) different modules and both modules are imported by a third one:

# Module A
var x*: string
# Module B
var x*: int
# Module C
import A, B
write(stdout, x) # error: x is ambiguous
write(stdout, A.x) # no error: qualifier used

var x = 4
write(stdout, x) # not ambiguous: uses the module C's x

But this rule does not apply to procedures or iterators. Here the overloading rules apply:

# Module A
proc x*(a: int): string = $a
# Module B
proc x*(a: string): string = $a
# Module C
import A, B
write(stdout, x(3))   # no error: A.x is called
write(stdout, x(""))  # no error: B.x is called

proc x*(a: int): string = nil
write(stdout, x(3))   # ambiguous: which `x` should be called?

Excluding symbols

The normal import statement will bring in all exported symbols. These can be limited by naming symbols which should be excluded with the except qualifier.

import mymodule except y

From statement

We have already seen the simple import statement that just imports all exported symbols. An alternative that only imports listed symbols is the from import statement:

from mymodule import x, y, z

The from statement can also force namespace qualification on symbols, making them available but requiring them to be qualified with the module name. With a normal from statement, the listed symbols can be used directly:

from mymodule import x, y, z

x()           # use x without any qualification

If the import list is nil, nothing is imported unqualified:

from mymodule import nil

mymodule.x()  # must qualify x with the module name as prefix

x()           # using x here without qualification is a compile error

Since module names are often long in order to be descriptive, you can also define a shorter alias to use when qualifying symbols.

from mymodule as m import nil

m.x()         # m is aliasing mymodule
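The same pattern works with a real standard library module. This sketch aliases strutils as su, an arbitrary name chosen for illustration, and imports nothing unqualified. Every use of parseInt therefore needs the su prefix.

```nim
from strutils as su import nil   # alias the module, import nothing unqualified

let n = su.parseInt("41") + 1    # the `su.` prefix is required
echo n                           # 42
# parseInt("41") on its own would not compile here: it is not in scope.
```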

Include statement

The include statement does something fundamentally different from importing a module: it merely includes the contents of a file. The include statement is useful to split up a large module into several files:

include fileA, fileB, fileC

Part 2

So, now that we are done with the basics, let's see what Nim offers apart from a nice syntax for procedural programming: Part II