Dowst.Dev

Automation Authoring

When I started writing my book, Practical Automation with PowerShell, I discovered how much time and energy is required to keep everything up to date. For example, if I changed a piece of code in the text, I had to make sure the code sent to the publisher and uploaded to GitHub matched. Not to mention the style guidelines I needed to follow.

Or, if I included a screenshot or diagram, I had to upload the original to another folder named to match the chapter and figure number in the book. As you can imagine, adding, changing, or removing anything during the editing process would create a laundry list of other things I would need to check. And this is where PowerShell came to the rescue.

I used PowerShell to help me keep track of code, images, my table of contents, and numerous other portions of the book writing process. So, I thought I would share with you some of the code and techniques I used during the authoring process.

While you might not be writing a book, you may certainly find some of these useful in your day-to-day scripting needs.

Each post in this series will be linked to and detailed below. Be sure to check back often for updates.

Working with Images

Comparing Images – The first post in this series is a function that allows you to compare two images. During the writing process, I had to ensure that the pictures in my Word documents matched the files I had uploaded to the publisher. This function allowed me to compare the two and ensure everything matched.
Extracting images from Word – This post will show you how to use PowerShell to extract images from a Word document, copy them to a new location, and list the caption information for each image.
*Converting Visio to PNG and SVG – During the writing process I had to create many Visio diagrams. Each diagram I created had to be embedded in the Word document as a PNG and I had to supply the graphics team with an SVG version. So, I wrote this function to export all Visio diagrams in a folder to PNG and SVG.

*newest post in the series
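
As a quick illustration of the image comparison idea, one simple check is whether two files are byte-for-byte identical. The sketch below uses Get-FileHash for that. It is not necessarily how the linked post compares images, and the function name Test-FileContentMatch is made up for this example.

```powershell
# Hypothetical helper (not taken from the linked post): returns $true when two
# files have identical contents, judged by comparing their SHA256 hashes.
function Test-FileContentMatch {
    param(
        [Parameter(Mandatory = $true)]
        [string]$ReferencePath,
        [Parameter(Mandatory = $true)]
        [string]$DifferencePath
    )

    $referenceHash = (Get-FileHash -Path $ReferencePath -Algorithm SHA256).Hash
    $differenceHash = (Get-FileHash -Path $DifferencePath -Algorithm SHA256).Hash

    # Identical bytes produce identical hashes
    $referenceHash -eq $differenceHash
}
```

A hash match only proves the files are identical on disk. Two pictures that look the same but were saved with different compression or metadata would not match with this approach.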

Up Next – Comparing Word documents – Ever end up with two copies of the same Word document? Well, this post will show you how to use PowerShell to quickly find the differences between them.

Converting Visio to PNG and SVG

When working in Visio, it is not uncommon that you need to export your diagram to a picture for sharing or placing in documentation. For example, when writing, I will often have multiple Visio diagrams that I continually tweak throughout the process. So, I wrote a function to take all the Visio diagrams in a folder and export them to SVG and PNG.

All you need to do is pass this function the path of the folder, and it will do the rest.


Function Export-VisioToImages {
    <#
.SYNOPSIS
Use to export Visio diagrams to PNG and SVG format

.PARAMETER Path
Specifies the path to the folder with the Visio diagrams

.PARAMETER Filter
Specifies a filter to qualify the Path parameter. Default value is "*.vsdx"

.PARAMETER Force
Forces the command to overwrite existing export files.

.EXAMPLE
Export-VisioToImages -Path $Path -Force

.NOTES
Requires that Visio is installed on the local machine
#>
    param(
        [string]$Path,
        [string]$Filter = "*.vsdx",
        [switch]$Force
    )

    # Create the Visio object
    $Visio = New-Object -comobject Visio.Application
    $Visio.Visible = $false

    # Get all the Visio files in the folder
    $FilesToExport = Get-ChildItem $Path -Filter $Filter

    foreach ($item in $FilesToExport) {
        # Open the Visio document
        $doc = $Visio.Documents.Open($item.FullName)

        # Set the paths for the svg and png files
        $ExportPaths = @(
            Join-Path $item.DirectoryName "$($item.BaseName).svg"
            Join-Path $item.DirectoryName "$($item.BaseName).png"
        )

        foreach ($export in $ExportPaths) {
            if (Test-Path $export) {
                # If the file exists and Force is set, then delete the existing file
                if ($Force) {
                    Remove-Item $export -Force
                }
                # Otherwise, if the file exists and Force is not set, go to the next file
                else {
                    Write-Warning "Skipping '$export' because it already exists. Use -Force to overwrite it."
                    continue
                }
            }
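            # Export each page of the Visio document. Every page is written to the same
            # path, so for a multi-page diagram the last page exported is the one kept.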
            $doc.Pages | ForEach-Object {
                $_.Export($export)
            }
            Write-Output $export
        }
        # Close the document
        $doc.Close()
    }

    # Close Visio
    $Visio.Quit()
}
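
If you want to preview which files the function would write before letting it open Visio, the path logic can be pulled out on its own. This is only a sketch: Get-VisioExportPath is a hypothetical helper name, and it mirrors the Join-Path and Test-Path steps above without touching Visio at all.

```powershell
# Hypothetical dry-run helper: lists the .svg and .png paths that would be
# written for each Visio file in a folder, and whether each already exists.
function Get-VisioExportPath {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path,
        [string]$Filter = '*.vsdx'
    )

    foreach ($item in Get-ChildItem -Path $Path -Filter $Filter -File) {
        foreach ($extension in 'svg', 'png') {
            # Same naming rule as the export function: same folder, same base name
            $target = Join-Path $item.DirectoryName "$($item.BaseName).$extension"
            [pscustomobject]@{
                Source = $item.Name
                Target = $target
                Exists = Test-Path -Path $target
            }
        }
    }
}
```

Any row where Exists is true is a file that Export-VisioToImages would skip unless you pass -Force.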

This post is part of the series Automation Authoring. Refer to the main article for more details on use cases and additional content in the series.

Extracting images from Word

The process of extracting images from a Word document is relatively straightforward. All you have to do is rename the document from a .docx to a .zip and extract it. Once you do that, all the images will be in a subfolder named media.

However, with the help of PowerShell, we can not only automate the extraction but also copy them to a new location and list the caption information for each image.

The first thing you need to do is rename the Word document with the .zip extension. To ensure the original Word document remains untouched, we’ll copy it to a temporary folder and rename it.

Once you have the zip file, you can run a simple Expand-Archive command to extract the contents of the Word document. You will find the images in the subfolder word\media.
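
The copy-and-expand steps described so far can be sketched as a small helper. The function name Expand-WordDocument and the scratch folder name are assumptions made for this illustration.

```powershell
# Hypothetical helper: copies a .docx to a scratch folder with a .zip
# extension, expands it there, and returns the folder with the extracted files.
function Expand-WordDocument {
    param(
        [Parameter(Mandatory = $true)]
        [string]$DocumentPath
    )

    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($DocumentPath)

    # Assumed scratch location under the user's temp folder; start from a clean folder
    $extractPath = Join-Path ([System.IO.Path]::GetTempPath()) "wordExtract-$baseName"
    if (Test-Path $extractPath) {
        Remove-Item -Path $extractPath -Recurse -Force
    }
    New-Item -ItemType Directory -Path $extractPath | Out-Null

    # Copy rather than rename, so the original document stays untouched
    $zipPath = Join-Path $extractPath "$baseName.zip"
    Copy-Item -Path $DocumentPath -Destination $zipPath
    Expand-Archive -Path $zipPath -DestinationPath $extractPath -Force

    $extractPath
}
```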

Then you can have PowerShell copy the files to another directory. And if that is all you wanted to do, you are done.
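
As a rough sketch of that copy step, the helper below takes a folder that already holds the expanded document and copies the media files out of it. Copy-WordMedia and its parameter names are hypothetical.

```powershell
# Hypothetical helper: copies everything under word/media from an already
# extracted document into a destination folder, creating the folder if needed.
function Copy-WordMedia {
    param(
        [Parameter(Mandatory = $true)]
        [string]$ExtractPath,
        [Parameter(Mandatory = $true)]
        [string]$Destination
    )

    $mediaPath = Join-Path (Join-Path $ExtractPath 'word') 'media'
    if (-not (Test-Path $Destination)) {
        New-Item -ItemType Directory -Path $Destination | Out-Null
    }

    # PassThru returns the copied files so the caller can see what landed where
    Get-ChildItem -Path $mediaPath -File |
        Copy-Item -Destination $Destination -PassThru |
        Select-Object Name, FullName
}
```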

However, we can take things a step further and parse the Word document to display the captions for each image.

To do this, you will need to load the document.xml file into a PowerShell object. This XML contains all the configuration and references for the Word document. You can then parse through each paragraph to find the ones that are images and the ones that are captions. Images will have a drawing section under the paragraph, and captions will have a fldSimple property.
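
To make the drawing and fldSimple distinction concrete, here is a minimal sketch that classifies paragraphs. The XML is a hand-written, heavily simplified stand-in for document.xml (a real file carries far more markup), so treat it as an assumption for illustration only.

```powershell
# Hand-written, simplified stand-in for word/document.xml (illustration only)
[xml]$docXml = @'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
  <w:body>
    <w:p><w:r><w:t>Plain text</w:t></w:r></w:p>
    <w:p><w:r><w:drawing><wp:inline><wp:docPr id="1" name="Picture 1"/></wp:inline></w:drawing></w:r></w:p>
    <w:p><w:fldSimple w:instr=" SEQ Figure \* ARABIC "><w:r><w:t>1</w:t></w:r></w:fldSimple></w:p>
  </w:body>
</w:document>
'@

$paragraphs = foreach ($p in $docXml.document.body.p) {
    if ($p.r.drawing) {
        # Image paragraphs carry a drawing element under one of their runs
        [pscustomobject]@{ Kind = 'Image'; Value = $p.r.drawing.inline.docPr.id }
    }
    elseif ($p.fldSimple) {
        # Caption paragraphs carry a fldSimple element holding the figure number
        [pscustomobject]@{ Kind = 'Caption'; Value = $p.fldSimple.r.t }
    }
    else {
        [pscustomobject]@{ Kind = 'Other'; Value = $null }
    }
}

$paragraphs
```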

A child node named keepNext lets you determine if a caption is above or below the picture. When the caption is below, the image will have the keepNext node, but when the caption is above, the caption paragraph will have the keepNext node. If there is no caption, neither will have the node.

You can see this in the output below. Figures 1 and 3 have the captions below. Figure 2 has the caption above, and figure 4 does not have a caption.
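
The keepNext rule can also be sketched against a hand-written sample that mirrors that arrangement: figures 1 and 3 captioned below, figure 2 captioned above, and figure 4 without a caption. The XML is a simplified assumption, not real Word output, and the property names on the result objects are made up for this example.

```powershell
# Hand-written, simplified stand-in for word/document.xml (illustration only)
[xml]$docXml = @'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
  <w:body>
    <w:p><w:pPr><w:keepNext/></w:pPr><w:r><w:drawing><wp:inline><wp:docPr id="1"/></wp:inline></w:drawing></w:r></w:p>
    <w:p><w:fldSimple w:instr=" SEQ Figure \* ARABIC "><w:r><w:t>1</w:t></w:r></w:fldSimple></w:p>
    <w:p><w:pPr><w:keepNext/></w:pPr><w:fldSimple w:instr=" SEQ Figure \* ARABIC "><w:r><w:t>2</w:t></w:r></w:fldSimple></w:p>
    <w:p><w:r><w:drawing><wp:inline><wp:docPr id="2"/></wp:inline></w:drawing></w:r></w:p>
    <w:p><w:pPr><w:keepNext/></w:pPr><w:r><w:drawing><wp:inline><wp:docPr id="3"/></wp:inline></w:drawing></w:r></w:p>
    <w:p><w:fldSimple w:instr=" SEQ Figure \* ARABIC "><w:r><w:t>3</w:t></w:r></w:fldSimple></w:p>
    <w:p><w:r><w:drawing><wp:inline><wp:docPr id="4"/></wp:inline></w:drawing></w:r></w:p>
  </w:body>
</w:document>
'@

$paragraphs = @($docXml.document.body.p)
$results = for ($i = 0; $i -lt $paragraphs.Count; $i++) {
    $p = $paragraphs[$i]
    if (-not $p.r.drawing) { continue }   # only image paragraphs need a caption lookup

    $captionIndex = -1
    if ($p.pPr.ChildNodes.LocalName -contains 'keepNext') {
        $captionIndex = $i + 1            # the image keeps with next: caption is below
    }
    elseif ($i -gt 0 -and $paragraphs[$i - 1].pPr.ChildNodes.LocalName -contains 'keepNext') {
        $captionIndex = $i - 1            # the previous paragraph keeps with next: caption is above
    }

    [pscustomobject]@{
        ImageId  = $p.r.drawing.inline.docPr.id
        Position = if ($captionIndex -gt $i) { 'Below' } elseif ($captionIndex -ge 0) { 'Above' } else { 'None' }
        Figure   = if ($captionIndex -ge 0) { $paragraphs[$captionIndex].fldSimple.r.t } else { $null }
    }
}

$results
```

The check looks at the child node names of pPr rather than at a keepNext property directly, because an empty keepNext element would otherwise read as an empty string.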

Now all you need to do is parse through each image, match it with its appropriate caption, and output the results.

You can find the full code below. Also, since it parses the XML and not Word itself, this function does not require Word to be installed.

Function Export-ImagesFromWord {
    <#
.SYNOPSIS
Extracts images from a Word document and copies them to a new location

.DESCRIPTION
Extracts images from a Word document and copies them to a new location. 
After the extraction the caption information will be output to the screen

.PARAMETER DocumentPath
The path of the Word Document

.PARAMETER Destination
The folder to copy the file into

.EXAMPLE
Export-ImagesFromWord -DocumentPath "D:\scripts\ImageExamples.docx" -Destination "D:\scripts\images"

.NOTES
Does not require Word to be installed
#>
    [CmdletBinding()]
    [OutputType()]
    param(
        [Parameter(Mandatory = $true)]
        [string]$DocumentPath,
        [Parameter(Mandatory = $true)]
        [string]$Destination
    )

    # Create a temporary folder to hold the extracted files
    $BaseName = [System.IO.Path]::GetFileNameWithoutExtension($documentPath) 
    $extractPath = Join-Path $env:Temp "mediaExtract\$($BaseName)"
    If (Test-Path $extractPath) {
        Remove-Item -Path $extractPath -Force -Recurse | Out-Null
    }
    New-Item -type directory -Path $extractPath | Out-Null

    # Copy the Word document as a zip and expand it
    $zipPath = Join-Path $extractPath "$($BaseName).zip"
    $zip = Copy-Item $documentPath $zipPath -Force -PassThru
    Expand-Archive -Path $zip.FullName -DestinationPath $extractPath -Force

    # Get the media files extracted and copy them to the output folder
    $mediaPath = Join-Path $extractPath 'word\media'
    If (-not(Test-Path $Destination)) {
        New-Item -type directory -Path $Destination | Out-Null
    }
    $extractedfigures = Get-ChildItem $mediaPath -File | Copy-Item -Destination $Destination -PassThru | Select-Object Name, @{l = 'Figure'; e = { $null } }, 
        @{l = 'Caption'; e = { '' } }, @{l = 'Id'; e = { [int]$($_.BaseName.Replace('image', '')) } }, FullName

    # Get the document configuration
    $documentXmlPath = Join-Path $extractPath 'word\document.xml'
    [xml]$docXml = Get-Content $documentXmlPath -Raw

    # Get all the paragraphs to find the images and captions
    $paragraphs = $docXml.document.body.p | Select-Object @{l = 'keepNext'; e = { @($_.pPr.ChildNodes.LocalName).Contains('keepNext') } }, 
        @{l = 'Id'; e = { $_.r.drawing.inline.docPr.id } }, @{l = 'CaptionId'; e = { $_.fldSimple.r.t } }, @{l = 'Prefix'; e = { $_.r[0].t.'#text' } }, 
        @{l = 'Text'; e = { $_.r[-1].t.'#text' } }, @{l = 'instr'; e = { $_.fldSimple.instr } }

    # Parse through each paragraph to match the caption to the image
    for ($i = 0; $i -lt $paragraphs.Count; $i++) {
        $capId = -1
        if ($paragraphs[$i].Id -gt 0 -and $paragraphs[$i].keepNext -eq $true) {
            $capId = $i + 1
        }
        # Guard $i -gt 0 so the first paragraph does not wrap around to the last one
        elseif ($i -gt 0 -and $paragraphs[$i].Id -gt 0 -and $paragraphs[$i - 1].keepNext -eq $true) {
            $capId = $i - 1
        }

        if ($capId -gt -1) {
            $extractedfigures | Where-Object { $_.Id -eq $paragraphs[$i].Id } | ForEach-Object {
                $_.Figure = $paragraphs[$capId].CaptionId
                $_.Caption = "$($paragraphs[$capId].Prefix)$($paragraphs[$capId].CaptionId)$($paragraphs[$capId].Text)"
            }
        }
    }

    $extractedfigures | Select-Object Name, Figure, Caption, FullName
}

This post is part of the series Automation Authoring. Refer to the main article for more details on use cases and additional content in the series.
