Lisp nested expressions are an AST. A CLOS-sified representation of the program with additional semantic info is more of an IR: an intermediate representation.
Note that you also aren't using "token" in the usual way it appears in computer science.
ANSI Lisp's Glossary gets it right:
> token n. a textual representation for a number or a symbol. See Section 2.3 (Interpretation of Tokens).
"1234" and "A1234" are tokens: chunks of accumulated token-constituent characters inside the reader. These are turned into values, which are no longer tokens. We never have a "tree of tokens"; for that to be true you have to abuse the word "token" to denote "Lisp value".
The Lisp AST consists of nodes which are typed values. (Some of those values are aggregates that are not lists: let's not forget vectors and structure literals.)
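A quick sketch of this with the reader (the results shown assume the standard readtable):

(read-from-string "A1234")      ; => A1234     -- a symbol; the token is gone
(read-from-string "#(1 2 3)")   ; => #(1 2 3)  -- a vector: an aggregate node that is not a list
(read-from-string "(+ 1 2.0)")  ; => (+ 1 2.0) -- a tree whose nodes are typed values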
When non-Lisp programmers write parsers for non-Lisp languages, they crank out AST's which imitate this concept. A typical approach is for the parser rules to produce nodes based on a discriminated union of types: in essence, an implementation of dynamically typed objects.
What can we look at for a real example? How about GNU Awk, which has a Yacc grammar. Its AST node is the INSTRUCTION type. What is that? Why, just a grotesque Greenspunning of some dynamic typing. The two unions in there are like car and cdr. The semantic actions of the Yacc parser do a whole lot of list_create and list_append involving this INSTRUCTION type, often behind various wrapper functions like mk_assignment or mk_expression_list and whatnot.
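In Lisp terms, that node boils down to something like the following defstruct - a hypothetical sketch of the pattern, not gawk's actual definition:

(defstruct instruction
  opcode  ; discriminator tag of the union
  left    ; first union slot, playing the role of car
  right)  ; second union slot, playing the role of cdr

The list_create/list_append machinery is then hand-rolled list surgery over such nodes - which Lisp provides natively with conses.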
The main difference is that the Lisp way of representing AST's is just cleaner and smarter, like the Lisp way of doing pretty much anything.
> Note that you also aren't using "token" in the usual way it appears in computer science
I use the word so that we can talk about parsing, syntax, and syntax trees at all. Lisp works differently from most languages (Lisp syntax works over a data representation; syntax can depend on runtime state in the case of macros and a Lisp interpreter; ...), so I try to use the related words in a way that maps onto that a bit.
We can look at the textual representation:
(let ((x '(* x x)))
  (progv '(x) '(2)
    (eval x)))
If we look at that code as text and/or as an S-expression, we have no idea what each X is. We have to construct that information by parsing or even by executing it. The s-expression is not an AST (abstract SYNTAX tree), because it has no syntactic information represented, besides some structural information; on the textual level we can construct data objects from the tokens. Each token has a direct data representation (unless we have reprogrammed the reader).
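To spell it out, here is the same form annotated with what parsing/execution eventually determines (the result assumes EVAL's null lexical environment and the standard readtable):

(let ((x '(* x x)))  ; this X: a lexical variable, bound to the list (* X X)
  (progv '(x) '(2)   ; these X's: symbols used as data; PROGV dynamically binds X to 2
    (eval x)))       ; this X: the lexical variable; EVAL then evaluates (* X X),
                     ; where X finds the dynamic binding => (* 2 2) => 4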
An AST OTOH has already been constructed as a tree, based on a syntax grammar for a programming language. To arrive at an AST we need a parser + a language syntax + a tree representation mechanism. The Lisp s-expression representation lacks the parser and the language syntax. All the s-expression represents is data - internal and external. For an s-expression there is no difference between (* x y) and (x y): both are just lists. READ accepts externally represented data - it has no idea whether the thing it successfully reads is a valid Lisp program or not.

Only parsing/interpretation of that code snippet on the Lisp language level makes it possible to see whether * is a function or a variable. For that we need a grammar which says whether the code is prefix, infix, postfix or something else. For Lisp it is much worse, since the interpretation of (foo (* x y)) depends on what foo actually does at runtime. In a Lisp interpreter, FOO could be a macro which interprets the enclosed expression differently each time it gets executed.
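A small sketch of that last point (FOO-FUN and FOO-MAC are made-up names):

(defun foo-fun (z) z)              ; here (* x y) is a form: evaluated as a call to *
(defmacro foo-mac (form) `',form)  ; here (* x y) is never evaluated: it stays data

(let ((x 2) (y 3))
  (list (foo-fun (* x y))    ; => 6        -- * acted as a function
        (foo-mac (* x y))))  ; => (* X Y)  -- the same s-expression, taken as a list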
If we want an AST, we need to combine the s-expression + a syntax + something which acts similarly to a parser. Without that syntax, we cannot generate the AST from the s-expression. If we take a slightly different syntax, we can create different ASTs from the same s-expression.
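As a sketch of that combination - s-expression + syntax + a parser-like walker producing CLOS nodes - here is a made-up micro-grammar (all class and function names are hypothetical):

(defclass ast-node () ())
(defclass var-ref (ast-node)
  ((name :initarg :name)))
(defclass call-node (ast-node)
  ((operator :initarg :operator)
   (arguments :initarg :arguments)))

(defun parse-form (form vars)
  "Map an s-expression to an AST node; VARS lists the known variables."
  (cond ((member form vars)          ; grammar: a known symbol is a variable reference
         (make-instance 'var-ref :name form))
        ((consp form)                ; grammar: (operator argument*) is a call
         (make-instance 'call-node
                        :operator (first form)
                        :arguments (mapcar (lambda (f) (parse-form f vars))
                                           (rest form))))
        (t form)))                   ; everything else stands for itself

A grammar that instead read (x y) as, say, infix application would yield a different AST from the same s-expression.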
Note that the meaning of words like token, parser, lexer, scanner, AST, and syntax has to be adjusted for Lisp. For example, the ANSI CL standard contains EBNF-based syntax descriptions. But this syntax is actually over s-expressions (data - since we also need to be able to say what a valid program is as data) and not over textual representations of programs.
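For instance, the standard's syntax for LET ranges over data objects the reader has already produced, not over characters:

let ({var | (var [init-form])}*) declaration* form* => result*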
A token has two usual meanings: on the textual side it is the characters which form a basic element of the language, for example a number. In the representation a scanner/lexer generates, it is then the data object which represents that thing. Think (defclass token () (text type ...)) and (defclass number-token (token) (numeric-value)). In Lisp the textual tokens are mapped to a more direct representation - each token gets translated into a typed data object (which has its type encoded and which a procedure can turn back into a textual representation).
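Fleshing out that inline sketch (the slot details are made up), and contrasting it with Lisp's direct mapping:

;; Sense 2: the token as a data object produced by a scanner/lexer.
(defclass token ()
  ((text :initarg :text)   ; the characters that formed the token
   (type :initarg :type)))
(defclass number-token (token)
  ((numeric-value :initarg :numeric-value)))

;; Lisp skips that layer: READ maps the textual token straight to a typed value,
;; and the printer generates a textual representation back.
(read-from-string "1234")  ; => 1234    -- an INTEGER, not a token object
(prin1-to-string 1234)     ; => "1234"  -- back to text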