Searching the PowerShell Abstract Syntax Tree

PowerShell's Abstract Syntax Tree, or AST for short, contains a full listing of all parsed content in PowerShell code. This means that it contains just about everything you need to be able to figure out precisely what is going on in someone's code — all without ever having to delve into regex or other text parsing messiness. About the only thing it doesn't contain are code comments, but in this instance that's not what we're here for anyway.

If you need comments, I'm afraid that you'll have to take to searching and filtering by tokens, and that means manually invoking the PowerShell parser — but that will have to be a story for another day!

Accessing the AST

Before you can search the AST, you will first need to access the root AST node. You can find these on any scriptblock object, which gives us a few options when it comes to retrieving them.

  1. For functions in a module or loaded in the current session, we can use (Get-Command MyFunction).ScriptBlock.Ast
  2. For scriptblocks themselves, we can just pick it right off of them with $ScriptBlock.Ast
    • This also works on scriptblock literals — {Test-Path 'C:\'}.Ast
  3. For script files themselves you can call (Get-Command .\Path\Script.ps1).ScriptBlock.Ast similar to how you would a function.

For the purposes of anyone following along at home, in this post I'll actually be working with a script function from my PSKoans module, namely the Measure-Karma function. To get the AST from this command, enter the following line of code:

$CommandAST = (Get-Command -Name Measure-Karma -Module PSKoans).ScriptBlock.Ast

You'll actually notice that outputting either the ScriptBlock or the AST object itself looks essentially identical in the console. This is primarily because an AST object will typically just output its Extent property when asked to convert to string, which should be generally the same as a scriptblock's string representation. However, under the hood, the AST has a ton of extremely important and helpful methods available.

AST Methods and Properties

If you check the Get-Member output from this AST object, you'll see the following list of properties and methods. I've reformatted it as a proper table to make it easier to read here, with members I think are particularly useful marked in bold.

Name Type Definition
Copy Method System.Management.Automation.Language.Ast Copy()
Equals Method bool Equals(System.Object obj)
Find Method System.Management.Automation.Language.Ast Find(System.Func[System.Management.Automation.Language.Ast,bool] predicate, bool searchNestedScriptBlocks)
FindAll Method System.Collections.Generic.IEnumerable[System.Management.Automation.Language.Ast] FindAll(System.Func[System.Management.Automation.Language.Ast,bool] predicate, bool searchNestedScriptBlocks)
GetHashCode Method int GetHashCode()
GetHelpContent Method System.Management.Automation.Language.CommentHelpInfo GetHelpContent(), System.Management.Automation.Language.CommentHelpInfo GetHelpContent(System.Collections.Generic.Dictionary[System.Management.Automation.Language.Ast,System.Management.Automation.Language.Token[]] scriptBlockTokenCache)
GetType Method type GetType()
SafeGetValue Method System.Object SafeGetValue()
ToString Method string ToString()
Visit Method System.Object Visit(System.Management.Automation.Language.ICustomAstVisitor astVisitor), void Visit(System.Management.Automation.Language.AstVisitor astVisitor)
Body Property System.Management.Automation.Language.ScriptBlockAst Body {get;}
Extent Property System.Management.Automation.Language.IScriptExtent Extent {get;}
IsFilter Property bool IsFilter {get;}
IsWorkflow Property bool IsWorkflow {get;}
Name Property string Name {get;}
Parameters Property System.Collections.ObjectModel.ReadOnlyCollection[System.Management.Automation.Language.ParameterAst] Parameters {get;}
Parent Property System.Management.Automation.Language.Ast Parent {get;}

Let's take a closer look at the ones I've bolded.

FindAll

This is what I would consider the essential method for actually finding what you want in most cases where a pre-existing function or property won't just hand you what you're looking for.

It takes two parameters: a System.Func predicate parameter with two generic parameters itself, and a bool parameter that determines if it will recursively search nested script blocks, or just the "immediate level" of the AST.

As I mentioned in my Anonymous Functions post, however, we can actually just use a rather more familiar PowerShell scriptblock instead of the Func[Ast,bool] predicate. The important thing is that we take note of the types specified in the Func parameter — looking at the Func[T, TResult] documentation we can see that the first type parameter is the input type and the second is the output type; we need a script block that accepts an Ast object and outputs a bool object.

What does this mean? Well, effectively, it means we actually need a param() block in our predicate scriptblock that accepts an AST object. The actual name(s) of the parameters you use in scriptblocks that are used for predicates doesn't matter, but the amount of them and the type of them will!

You can search for AST objects based on any condition you fancy, but for a brief example I'll show you how to use it to find all loop statements in a given AST.

# All AST types are kept in here; this will save a LOT of writing!
using namespace System.Management.Automation.Language

$Predicate = {
    param( [Ast] $AstObject )

    return ( $AstObject -is [LoopStatementAst] )
}

$Ast.FindAll($Predicate, $true)

Note that this scriptblock accepts exactly one parameter — specifying the type is optional, but will help immensely with giving you options for type inference to autocomplete property names as you explore. It also only outputs one object; due to the type signature of the method we're calling and the parameters it accepts, your output will be coerced to the expected bool type regardless of whatever you output.

N.B.: Find() is essentially identical, but will only return the first result, rather than a list of all results found.

Due to the fact that any array of length two or greater resolves to $true when cast to bool — yes, even @($false, $false) — you will want to ensure you don't accidentally output multiple objects. As such, despite the usual warnings against the return statement remaining very much valid, I would recommend including one, and ensuring that that really is the only thing you output here. Just a bit of a visual reminder: "Hey, we're only supposed to return one thing here!"

Naturally, you can also enter the script block as a literal, but be mindful of the often confusing morass of code you can end up buried in. If you do want to do this, I'd recommend writing the expression like this:

using namespace System.Management.Automation.Language

$Ast.FindAll(
    {
        param( [Ast] $AstObject )

        return ( $AstObject -is [LoopStatementAst] )
    },
    $true
)

This isn't a very typical PowerShell style, but I have found that having too many braces or parentheses close to each other just leads quickly to very hard to read code. The additional spacing here is a very welcome addition if you choose to write your script blocks in the actual method calls.

The output from FindAll() is a rather straightforward collection that you can search or filter with more common PowerShell methods, like Where-Object or the .Where{} method.

GetHelpContent

Full disclosure: I had no idea this existed until I started writing this post.

I'm not entirely sure how it works, but it seems to be the method that Get-Help calls in order to look for comment-based-help data in any given function or script. Good to know!

SafeGetValue

This is a very interesting little method that can be executed on any AST object in order to attempt to get a value back from it. Mind you, only "safe" values are permitted; anything else will throw an error.

A "safe" value is one that can be expressed as a literal value and doesn't depend on anything that might give changed values. As far as I know, it only accepts items that can be expressed as PowerShell literals, e.g., numbers, strings, hashtable literals, arrays, and so forth. If you attempt to use SafeGetValue on something that may have a dynamic value (anything containing scriptblocks, commands, .NET object instantiations, etc.) you will get the following error:

Exception calling "SafeGetValue" with "0" argument(s): "Cannot generate a PowerShell object for a ScriptBlock evaluating dynamic expressions. Dynamic expression: {"A"}."
At line:1 char:1
+ {"A"}.ast.Find({$args[0] -is [System.Management.Automation.Language.S ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : InvalidOperationException

However, if the value you want to retrieve is relatively simple, this is a guaranteed safe way to get a value that quickly just errors out if anyone tries to replace it with something that might possibly be dangerous to look at / execute directly.

Parent

Each AST node actually contains a link to its parent node, which helps immensely when you need to find something weirdly specific, for example which function in a file has a specific mandatory parameter with position 1. It can really simplify your code and minimise the number of furious .Where{} filter levels you need.

Rummaging Through the AST

Now that we've covered the basics of the AST, let's run through a quick example here, using Measure-Karma as our guinea pig. Follow along in your own shell, if you like, just remember to add the using statement I mentioned before, unless you love typing System.Management.Automation.Language over and over again!

$Ast = Get-Command -Module PSKoans -Name Measure-Karma |
    ForEach-Object { $_.ScriptBlock.Ast }

Finding Hashtables

Yep, we can just up and find whatever hashtables happen to be in this function. Weird? Maybe, but let's take a look!

$Hashtables = $Ast.FindAll(
    {
        param($Item)
        return ($Item -is [HashtableAst])
    }
)

Alright, let's see what these look like!

PS> $Hashtables.Extent.Text

@{
    Script   = $KoanFile
    PassThru = $true
    Show     = 'None'
}
@{
    DescribeName = $NextKoanFailed.Describe
    Expectation  = $NextKoanFailed.ErrorRecord
    ItName       = $NextKoanFailed.Name
    Meditation   = $NextKoanFailed.StackTrace
    KoansPassed  = $KoansPassed
    TotalKoans   = $TotalKoans
}
@{
    Complete    = $true
    KoansPassed = $KoansPassed
    TotalKoans  = $PesterTestCount
}

You'll note that in your own console, these show up with strangely mismatching indents. The AST for hashtables takes into account indentation in its Extent, but the AST only sees from the opening symbols to the closing symbol, so no indents prior to the opening symbol will be stored in this AST object.

What else can we get at here, programmatically?

PS> $Hashtables.KeyValuePairs

Item1        Item2                       Length
-----        -----                       ------
Script       $KoanFile                        2
PassThru     $true                            2
Show         'None'                           2
DescribeName $NextKoanFailed.Describe         2
Expectation  $NextKoanFailed.ErrorRecord      2
ItName       $NextKoanFailed.Name             2
Meditation   $NextKoanFailed.StackTrace       2
KoansPassed  $KoansPassed                     2
TotalKoans   $TotalKoans                      2
Complete     $true                            2
KoansPassed  $KoansPassed                     2
TotalKoans   $PesterTestCount                 2

Interesting… Feel free to explore these as we go; I could spend weeks in some of these, so I'll just touch on each briefly as we poke about.

Finding Param() Declarations

We could probably just look for [ParamBlockAst], I think, and there's probably a list of actual parameters in it, judging by how helpful HashtableAst was in listing every KeyValuePair. Let's check it out!

$ParamBlock = $Ast.FindAll(
    {
        param($Item)
        return ($Item -is [ParamBlockAst])
    },
    $true
)

Alright, let's see what we have in here…

PS> $ParamBlock

Attributes             Parameters             Extent
----------             ----------             ------
{CmdletBinding, Alias} {$Contemplate, $Reset} param(...

Fantastic! We have everything we need. Before we delve into the parameters themselves, we can already see that there are a couple of Attributes on this ParamBlockAst — they look rather familiar!

PS> $ParamBlock.Attributes.Extent.Text
[CmdletBinding(SupportsShouldProcess, DefaultParameterSetName = "Default")]
[Alias('Invoke-PSKoans', 'Test-Koans', 'Get-Enlightenment', 'Meditate','Clear-Path')]

Now, we could do some fancy regex to go ahead and figure out which functions in a huge file actually have ShouldProcess support, if we wanted. Or, we could be a little more clever, and actually be 100% sure, with no regex required.

$ParamBlocks = $Ast.FindAll(
    {
        param($Item)
        return ($Item -is [ParamBlockAst])
    },
    $true
)

$ParamBlocks |
    Where-Object {
        $_.Attributes |
            Where-Object {
                $_.Parent.Parent -is [FunctionDefinitionAst] -and
                $_.TypeName.Name -eq 'CmdletBinding'
            } |
            ForEach-Object NamedArguments |
            Where-Object {
                $_.ArgumentName -eq 'SupportsShouldProcess'
            } |
            ForEach-Object { $_.Argument.Value }
    } |
    ForEach-Object { $_.Parent.Parent.Name }

We need to go two levels up here, because the first Parent of a ParamBlockAst is just the ScriptBlockAst; the FunctionDefinitionAst will be its parent, if this is even a function.

Okay, that may have been a little intense, and for many things you can often check the Get-Command output instead. However, sometimes you might need to check specific attribute values or how they were declared; for example, checking $_.Argument.Value resolves even arguments like SupportsShouldProcess to their inferred value.

If you were, for example, creating a compatibility-checking script, you might need to traverse the AST of a given file or function and determine whether or not that attribute argument has had the = $true omitted and flag that. Turns out you can do just that by checking the ExpressionOmitted property on each of the arguments listed in the NamedArguments property.

Finding Mandatory Parameters

Now that we have the Param() block easily accessible, we can either use its .Parameters property to look at the parameters it contains, or we can simply use FindAll() once more to find each [ParameterAst] object. In both cases we'll get back similar output. Let's see if we can figure out which parameters are Mandatory!

$Parameters = $ParamBlocks.Parameter | Where-Object {
    $_.Attributes |
        Where-Object {
            $_.TypeName.Name -eq 'Parameter'
        } |
        ForEach-Object -MemberName NamedArguments |
        Where-Object {
            $_.ArgumentName -eq 'Mandatory'
        }
        ForEach-Object {$_.Argument.Value} # Will either be $true or $false
}

Let's see what that gets us!

PS> $Parameters.Name


VariablePath : Contemplate
Splatted     : False
StaticType   : System.Object
Extent       : $Contemplate
Parent       : [Parameter(Mandatory, ParameterSetName = "OpenFolder")]
                       [Alias('Koans', 'Meditate')]
                       [switch]
                       $Contemplate

VariablePath : Reset
Splatted     : False
StaticType   : System.Object
Extent       : $Reset
Parent       : [Parameter(Mandatory, ParameterSetName = "Reset")]
                       [switch]
                       $Reset

The AST is a bit of a wild ride no matter how you're looking at it, but knowing how to search through it to find exactly what you need really helps deal with it. It's one of the core PowerShell language features that helps PowerShell itself actually handle the script code, so don't be too surprised if you sometimes get more than you bargained for; it's all there for a very good reason.

Thanks for reading!