After building an abstract syntax tree, the compiler checks the program for language compliance with respect to context-sensitive aspects of the grammar. For statically typed languages, a large part of this step is type-checking. Python is dynamically typed, so I cannot bind variables to types. It is, however, statically scoped, so it is possible to bind identifiers to variables. This will catch "NameError: id is undefined" errors, along with a few others.
Every file, class, and function has its own namespace. The file-level names are (generally) available as global symbols within its classes and functions. When an inner function is nested within an outer function, the inner function may reference free variables assigned in the outer function or local variables, which include parameters. All parameters are considered to be not assigned, but are not free. Any other local variable will be assigned - if it is not assigned anywhere locally or in an enclosing scope, then it is determined to be global. Even if there is no such global variable defined, the file should still compile - it will cause a runtime exception.
Consider the following example:
Within the scope of inner, all five variables are referenced. innerVar is the only one which is assigned; param is merely a parameter. Both innerVar and param are local, outerVar is free, and both globalVar and undef are global.
In a single traversal, it is straightforward to create a collection of referenced and assigned identifiers for each scope. Until looking at the entire block, however, it is impossible to determine whether an identifier occurring on the rhs is going to be local, free or global since in the following example var is local in the entire function scope, and so it is illegal to use it.
In order to assign to a global variable within a function, you must use the global keyword.
I don't yet understand why it is important to distinguish between free and global variables, so I have not yet implemented this. It will require a second visitor to traverse the tree.
I handle all other checking with a single visitor Checker. This visitor creates a tree of symbol tables (I have been using symbol table and namespace interchangeably). A symbol table contains a dictionary mapping locally referenced identifiers to a SymbolData object, which contains fields to record if the identifier is assigned or declared as global in the scope. It also contains a collection of symbol tables that are defined within its scope. These are not tied with identifiers (except internally) because such binds may dynamically change at any time. If a parameter is not referenced, then it will not be in the primary dictionary, so the symbol table also keeps a collection of its parameters (since they may be referenced in a nested scope). Finally, the symbol table contains identifying information, including its name (such as 'foo' or 'inner'). Here I can catch errors of assigning to a variable after referencing, which is illegal.
My symbol table has methods to add identifiers as assigned, referenced, parameters, and global. If it tries to add an identifier as assigned, but the identifier has already been referenced, then it registers an error with the ErrorReporter (which is passed as an additional parameter; an alternative would be to throw an exception which the Checker would catch and register with the error reporter).
The behavior of a symbol table depends on whether it is a module/file, function, or class. For example, scopes nested inside of a class do not gain access to the class's local variables. To account for this, I could create three SymbolTable subclasses, which would permit me to create a SymbolTableVisitor interface to conveniently traverse it if I need to. I don't think I will need to, though, so I currently just have a field that stores the type.
As the Checker visits nodes, it creates a new symbol table whenever it enters a new scope (at the root, function def, and class def). It initializes the type and name for all of these. A function table needs the variables in its parameter list to be added in a special way. For the function def foo(a,(b,(c,d)),e=4,(f,g)=(global1,global2)), I would need to add a,b,c,d,e,f, and g as parameters to the foo symbol table, and global1 and global2 as referenced identifiers within the enclosing scope. Since I store parameters within the function node as a list of expressions (including identifiers, tuples, and default parameters), I cannot simply visit these in the same manner as typical identifiers and tuples. One solution would be to alter my parser to wrap all parameters in a ParameterNode that has an optional default value. It would still be difficult to visit each expression within a tuple as a parameter, though. Alternatively, I could create a new ParamChecker object to visit everything in the parameter list. Instead I created a few local methods within Checker that include conditionals on the type of the node. This is still not particularly elegant, but keeps everything within the Checker class. I use the same approach to handle the left-hand side of an assignment, where single identifiers should be added to the symbol table as assigned, but anything else is treated as an R-value.
I'm not quite sure why I need to differentiate free variables from global ones. Maybe I can link free variables to their defining scope?