
We needed to download an updated version of a public dataset on a regular basis. Since the application runs in a Windows environment, PowerShell plus Windows Task Scheduler came to mind. It turns out PowerShell doesn’t make this quite trivial.

 

In PowerShell v5+ we have the Expand-Archive command:

Expand-Archive c:\a.zip -DestinationPath c:\a

but this only handles zip archives; it doesn’t support gzip or tar.

 

gzip is a compression format based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. There’s a good comparison of popular compression algorithms worth checking out: https://stackoverflow.com/questions/28635496/difference-lz77-vs-lz4-vs-lz4hc-compression-algorithms

tar or tarball is an archive format, which allows multiple files to be grouped into one for backup or distribution purposes.
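
Since a .tar.gz is a tar archive wrapped in a gzip layer, it can be handy to confirm what the outer layer actually is before unpacking. The sketch below checks for the two signature bytes (0x1F 0x8B) that every gzip file starts with. The function name Test-GzipSignature is made up for this example, and it assumes you pass a full path.

```powershell
# Hypothetical helper: returns $true when the file starts with the gzip signature 0x1F 0x8B.
function Test-GzipSignature {
    param([Parameter(Mandatory)][string]$Path)

    # Pass a full path: .NET resolves relative paths against the process working directory
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $header = New-Object byte[] 2
        $read = $stream.Read($header, 0, 2)
        return ($read -eq 2 -and $header[0] -eq 0x1F -and $header[1] -eq 0x8B)
    }
    finally {
        $stream.Close()
    }
}
```

A call such as Test-GzipSignature "C:\temp\maxmind\temp.tar.gz" should return True for a real gzip download and False if the server handed back something else, such as an HTML error page.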

 

Combining the two, which is very common, lets you download a single, well-compressed archive containing multiple files and folders. But now we have a couple of layers to deal with. Here are the steps I came up with:

 

Create a clean temp folder

First we’ll delete any folder we plan to create (in case a previous run of this script failed in the middle), and then create our temp folder:

Remove-Item "c:\temp\maxmind\" -Filter * -Recurse -ErrorAction Ignore
New-Item -ItemType directory -Path C:\temp\maxmind\

 

Download a file using PowerShell

The Start-BitsTransfer cmdlet (from the BitsTransfer module), if available, is really fast at downloading:

Import-Module BitsTransfer
Start-BitsTransfer -Source "https://example.com/download.tar.gz" -Destination "c:\temp\maxmind\temp.tar.gz"
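
For machines where the BitsTransfer module isn’t available, one option is to fall back to Invoke-WebRequest. This is only a sketch: the function name Get-RemoteFile and its parameter names are made up for this example.

```powershell
# Hypothetical helper: prefer BITS, fall back to Invoke-WebRequest when the module is missing.
function Get-RemoteFile {
    param(
        [Parameter(Mandatory)][string]$Url,
        [Parameter(Mandatory)][string]$Destination
    )

    if (Get-Module -ListAvailable -Name BitsTransfer) {
        Import-Module BitsTransfer
        Start-BitsTransfer -Source $Url -Destination $Destination
    }
    else {
        # Hiding the progress bar tends to speed up Invoke-WebRequest in Windows PowerShell 5.x
        $ProgressPreference = 'SilentlyContinue'
        Invoke-WebRequest -Uri $Url -OutFile $Destination -UseBasicParsing
    }
}
```

It would be called the same way as the lines above, for example Get-RemoteFile "https://example.com/download.tar.gz" "c:\temp\maxmind\temp.tar.gz".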

 

Unzipping a GZip with PowerShell

PowerShell doesn’t have a built-in cmdlet for gzip as far as I could find, but we can use the .NET Framework from PowerShell, thanks to RiffyRiot on TechNet: https://social.technet.microsoft.com/Forums/windowsserver/en-US/5aa53fef-5229-4313-a035-8b3a38ab93f5/unzip-gz-files-using-powershell?forum=winserverpowershell

Function DeGZip-File{
    Param(
        $infile,
        $outfile = ($infile -replace '\.gz$','')
        )

    # $input is an automatic variable in PowerShell, so the streams get their own names
    $inStream = New-Object System.IO.FileStream $infile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
    $outStream = New-Object System.IO.FileStream $outfile, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
    $gzipStream = New-Object System.IO.Compression.GZipStream $inStream, ([IO.Compression.CompressionMode]::Decompress)

    # Copy the decompressed bytes to the output file in 1 KB chunks
    $buffer = New-Object byte[](1024)
    while($true){
        $read = $gzipStream.Read($buffer, 0, 1024)
        if ($read -le 0){break}
        $outStream.Write($buffer, 0, $read)
        }

    # Close the streams so the files are flushed and unlocked
    $gzipStream.Close()
    $outStream.Close()
    $inStream.Close()
}

DeGZip-File "C:\temp\maxmind\temp.tar.gz" "C:\temp\maxmind\temp.tar"
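
If you’d rather not manage the buffer yourself, the same idea can be written more compactly with Stream.CopyTo, which needs .NET Framework 4.0 or later. This is a sketch of an alternative, not the TechNet version: the name Expand-GzipFile is made up for this example, and the try/finally blocks are there so the files get released even if decompression fails halfway.

```powershell
# Hypothetical variant of DeGZip-File; relies on Stream.CopyTo (.NET Framework 4.0+).
function Expand-GzipFile {
    param(
        [Parameter(Mandatory)][string]$InFile,
        [string]$OutFile = ($InFile -replace '\.gz$', '')
    )

    $inStream = [System.IO.File]::OpenRead($InFile)
    try {
        $outStream = [System.IO.File]::Create($OutFile)
        try {
            $gzipStream = New-Object System.IO.Compression.GZipStream $inStream, ([System.IO.Compression.CompressionMode]::Decompress)
            try {
                # Streams the decompressed bytes straight into the output file
                $gzipStream.CopyTo($outStream)
            }
            finally { $gzipStream.Dispose() }
        }
        finally { $outStream.Dispose() }
    }
    finally { $inStream.Dispose() }
}
```

Usage mirrors the original function: Expand-GzipFile "C:\temp\maxmind\temp.tar.gz" "C:\temp\maxmind\temp.tar".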

 

Expand Tar archive with PowerShell

Finally, we have to extract the tar, for which we can use the Expand-7Zip cmdlet from the 7Zip4PowerShell module:

if (-not (Get-Command Expand-7Zip -ErrorAction Ignore)) {
  Install-Package -Scope CurrentUser -Force 7Zip4PowerShell > $null
}
Expand-7Zip C:\temp\maxmind\temp.tar c:\temp\maxmind\
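
As an aside, newer Windows versions (Windows 10 version 1803 and later, as far as I know) ship with a tar.exe that can handle both layers in one go, which may let you skip the gzip and 7Zip4PowerShell steps entirely. Here is a minimal sketch, assuming tar.exe is on the PATH; the function name Expand-TarGz is made up for this example.

```powershell
# Hypothetical helper: extracts a .tar.gz in one step using the tar.exe bundled with newer Windows.
function Expand-TarGz {
    param(
        [Parameter(Mandatory)][string]$Archive,
        [Parameter(Mandatory)][string]$Destination
    )

    if (-not (Get-Command tar.exe -ErrorAction Ignore)) {
        throw "tar.exe was not found; use the gzip + 7Zip4PowerShell steps instead."
    }

    # -x extract, -z decompress gzip, -f archive file, -C target directory
    tar.exe -xzf $Archive -C $Destination
    if ($LASTEXITCODE -ne 0) { throw "tar.exe exited with code $LASTEXITCODE" }
}
```

For example: Expand-TarGz "C:\temp\maxmind\temp.tar.gz" "C:\temp\maxmind". On older servers without tar.exe, the two-step approach above still applies.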

 

Find and copy the file we need, to our destination

$files = @("GeoLite2-City", "*.mmdb")
Get-ChildItem -Recurse "c:\temp\maxmind\" -Include ($files) | Copy-Item -Destination "c:\data\GeoLite2-City.mmdb"

 

And finally, we clean up our temp folder

Remove-Item "c:\temp\maxmind\" -Filter * -Recurse -ErrorAction Ignore

 

Lastly, we wrap the whole thing into a PowerShell script, change it to accept parameters for the URL and the output path, and save it as DownloadAndExtract.ps1:

Param 
( 
  [string] 
  $url,
  [string] 
  $output
)
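
The rest of the file, below the Param block, might look roughly like this. It is a sketch that strings the earlier steps together, with a few assumptions of mine: the $temp variable, the 'Stop' error preference, the guard that skips everything when no parameters are passed, and copying only the first *.mmdb file found. $url and $output come from the Param block above.

```powershell
# Assumption: stop at the first error so a failed download never overwrites $output
$ErrorActionPreference = 'Stop'
$temp = 'c:\temp\maxmind\'

Function DeGZip-File {
    Param($infile, $outfile = ($infile -replace '\.gz$',''))

    $inStream = [System.IO.File]::OpenRead($infile)
    $outStream = [System.IO.File]::Create($outfile)
    $gzipStream = New-Object System.IO.Compression.GZipStream $inStream, ([IO.Compression.CompressionMode]::Decompress)
    try { $gzipStream.CopyTo($outStream) }
    finally { $gzipStream.Close(); $outStream.Close(); $inStream.Close() }
}

# Only run the steps when both parameters were supplied
if ($url -and $output) {
    # 1. Create a clean temp folder
    Remove-Item $temp -Recurse -ErrorAction Ignore
    New-Item -ItemType Directory -Path $temp | Out-Null

    # 2. Download the archive
    Import-Module BitsTransfer
    Start-BitsTransfer -Source $url -Destination (Join-Path $temp 'temp.tar.gz')

    # 3. Remove the gzip layer
    DeGZip-File (Join-Path $temp 'temp.tar.gz') (Join-Path $temp 'temp.tar')

    # 4. Extract the tar layer
    if (-not (Get-Command Expand-7Zip -ErrorAction Ignore)) {
        Install-Package -Scope CurrentUser -Force 7Zip4PowerShell > $null
    }
    Expand-7Zip (Join-Path $temp 'temp.tar') $temp

    # 5. Copy the first .mmdb file found to the destination
    Get-ChildItem -Recurse $temp -Include '*.mmdb' | Select-Object -First 1 | Copy-Item -Destination $output

    # 6. Clean up the temp folder
    Remove-Item $temp -Recurse -ErrorAction Ignore
}
```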

 

Now we schedule it in Windows Task Scheduler with a basic task

Then we set the schedule

And for our Action, we choose Start a Program, with powershell as the program and the location of our ps1 script in the arguments:

Arguments: -file "C:\scripts\DownloadAndExtract.ps1" https://example.com/data.tar.gz c:\data\GeoLite2-City.mmdb
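
If you prefer to skip the Task Scheduler UI, the same task can probably be registered from PowerShell with the ScheduledTasks module (Windows 8 / Server 2012 and later). This is a sketch only: the function name Register-DownloadTask, the task name and the weekly Wednesday 3am schedule are placeholders I picked for the example.

```powershell
# Hypothetical helper: task name, schedule and default paths are placeholders.
function Register-DownloadTask {
    param(
        [string]$TaskName = 'DownloadAndExtract',
        [string]$ScriptPath = 'C:\scripts\DownloadAndExtract.ps1',
        [string]$Url = 'https://example.com/data.tar.gz',
        [string]$Output = 'c:\data\GeoLite2-City.mmdb'
    )

    # Same arguments as entered in the Task Scheduler UI
    $arguments = "-File `"$ScriptPath`" $Url $Output"
    $action = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument $arguments

    # Assumed schedule: once a week, Wednesday at 3am
    $trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Wednesday -At 3am

    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger
}
```

Calling Register-DownloadTask with no arguments uses the placeholder values above; pass your own URL, script path and output path as needed.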

 

Troubleshooting

If Install-Package cannot be found: https://winaero.com/blog/fix-install-module-missing-powershell/
