We needed to download an updated version of a public dataset on a regular basis. Since the application runs in a Windows environment, PowerShell plus Windows Task Scheduler came to mind, only to find that PowerShell doesn’t make this quite trivial.
In PowerShell v5+ we have the Expand-Archive command:
Expand-Archive c:\a.zip -DestinationPath c:\a
but this only handles zip archives; it doesn’t support gzip or tar.
gzip is a compression format based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. There’s a good comparison of popular compression algorithms worth checking out: https://stackoverflow.com/questions/28635496/difference-lz77-vs-lz4-vs-lz4hc-compression-algorithms
tar, or tarball, is an archive format that allows multiple files to be grouped into one for backup or distribution purposes.
Combining the two, which is very common, lets you download a single, very well compressed archive containing multiple files and folders. But now we have a couple of layers to deal with. Here are the steps I came up with:
Create a clean temp folder
First we’ll delete any folder we plan to create (in case a previous run of this script failed in the middle), and then create our temp folder:
Remove-Item "c:\temp\maxmind\" -Filter * -Recurse -ErrorAction Ignore
New-Item -ItemType directory -Path C:\temp\maxmind\
Download a file using PowerShell
The Start-BitsTransfer cmdlet from the BitsTransfer module, if available, is really fast at downloading:
Import-Module BitsTransfer
Start-BitsTransfer -Source "https://example.com/download.tar.gz" -Destination "c:\temp\maxmind\temp.tar.gz"
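Because BitsTransfer is not present on every system, a fallback can be handy. The following is only a minimal sketch and not part of the original script: the function name Get-RemoteFile is hypothetical, and it assumes Invoke-WebRequest is an acceptable substitute when the module is missing.

```powershell
# Hypothetical helper: download with BITS when the module is installed,
# otherwise fall back to Invoke-WebRequest.
function Get-RemoteFile {
    param(
        [Parameter(Mandatory = $true)] [string] $Source,
        [Parameter(Mandatory = $true)] [string] $Destination
    )

    if (Get-Module -ListAvailable -Name BitsTransfer) {
        Import-Module BitsTransfer
        Start-BitsTransfer -Source $Source -Destination $Destination
    }
    else {
        # Hiding the progress bar can speed up Invoke-WebRequest
        # in Windows PowerShell 5.1; the setting is local to this function.
        $ProgressPreference = 'SilentlyContinue'
        Invoke-WebRequest -Uri $Source -OutFile $Destination -UseBasicParsing
    }
}
```

It would be called the same way as the BITS example above, for instance: Get-RemoteFile -Source "https://example.com/download.tar.gz" -Destination "c:\temp\maxmind\temp.tar.gz"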
Unzipping a GZip with PowerShell
PowerShell doesn’t support gzip as far as I found, but we can make use of the .NET Framework through PowerShell, thanks to RiffyRiot on TechNet: https://social.technet.microsoft.com/Forums/windowsserver/en-US/5aa53fef-5229-4313-a035-8b3a38ab93f5/unzip-gz-files-using-powershell?forum=winserverpowershell
Function DeGZip-File {
    Param(
        $infile,
        $outfile = ($infile -replace '\.gz$', '')
    )

    # Note: the streams are not named $input, because $input is an automatic variable in PowerShell
    $inStream = New-Object System.IO.FileStream $infile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
    $outStream = New-Object System.IO.FileStream $outfile, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
    $gzipStream = New-Object System.IO.Compression.GZipStream $inStream, ([IO.Compression.CompressionMode]::Decompress)

    # Copy the decompressed bytes to the output file in 1 KB chunks
    $buffer = New-Object byte[] 1024
    while ($true) {
        $read = $gzipStream.Read($buffer, 0, 1024)
        if ($read -le 0) { break }
        $outStream.Write($buffer, 0, $read)
    }

    $gzipStream.Close()
    $outStream.Close()
    $inStream.Close()
}

DeGZip-File "C:\temp\maxmind\temp.tar.gz" "C:\temp\maxmind\temp.tar"
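As a possible variation on the function above, the manual read/write loop can be handed over to Stream.CopyTo, which is available from .NET Framework 4 onward. This is a sketch under that assumption; Expand-GZipFile is a hypothetical name, and the try/finally blocks are only there so the files are released even if decompression fails halfway.

```powershell
# Hypothetical variant of DeGZip-File that lets .NET do the copy loop.
function Expand-GZipFile {
    param(
        [Parameter(Mandatory = $true)] [string] $InFile,
        [string] $OutFile = ($InFile -replace '\.gz$', '')
    )

    $inStream = [System.IO.File]::OpenRead($InFile)
    try {
        $outStream = [System.IO.File]::Create($OutFile)
        try {
            $gzipStream = New-Object System.IO.Compression.GZipStream($inStream, [System.IO.Compression.CompressionMode]::Decompress)
            try {
                # Stream.CopyTo reads until the end of the gzip stream
                $gzipStream.CopyTo($outStream)
            }
            finally { $gzipStream.Dispose() }
        }
        finally { $outStream.Dispose() }
    }
    finally { $inStream.Dispose() }
}
```

Full paths are the safer choice here, since the .NET file APIs resolve relative paths against the process working directory rather than the current PowerShell location.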
Expand Tar archive with PowerShell
Finally, we have to extract the tar, for which we can use the Expand-7Zip cmdlet from the 7Zip4PowerShell module:
if (-not (Get-Command Expand-7Zip -ErrorAction Ignore)) {
    Install-Package -Scope CurrentUser -Force 7Zip4PowerShell > $null
}
Expand-7Zip C:\temp\maxmind\temp.tar c:\temp\maxmind\
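If installing a module is not an option, recent Windows versions (Windows 10 build 17063 and later) ship a tar.exe that can decompress and extract a .tar.gz in one step. The sketch below assumes tar is on the PATH and that the destination folder already exists; Expand-TarGz is a hypothetical name, not something from the original script.

```powershell
# Hypothetical helper: extract a .tar.gz with the tar command, if present.
function Expand-TarGz {
    param(
        [Parameter(Mandatory = $true)] [string] $Archive,
        [Parameter(Mandatory = $true)] [string] $Destination
    )

    if (-not (Get-Command tar -CommandType Application -ErrorAction SilentlyContinue)) {
        throw "tar was not found on this machine; use the gzip + 7Zip4PowerShell steps instead."
    }

    # -x extract, -z filter through gzip, -f archive file, -C target directory
    & tar -xzf $Archive -C $Destination
    if ($LASTEXITCODE -ne 0) {
        throw "tar exited with code $LASTEXITCODE"
    }
}
```

With this approach the separate gunzip step is not needed, for example: Expand-TarGz -Archive "C:\temp\maxmind\temp.tar.gz" -Destination "C:\temp\maxmind"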
Find and copy the file we need, to our destination
$files = @("GeoLite2-City", "*.mmdb")
Get-ChildItem -Recurse "c:\temp\maxmind\" -Include $files | Copy-Item -Destination "c:\data\GeoLite2-City.mmdb"
And finally, we clean up our temp folder
Remove-Item "c:\temp\maxmind\" -Filter * -Recurse -ErrorAction Ignore
Lastly, we wrap the whole thing into a PowerShell script, change it to accept parameters for the URL and the output path, and save it as DownloadAndExtract.ps1:
Param ( [string] $url, [string] $output )
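To show how the steps might fit together once parameterized, here is a rough sketch rather than the exact script. It makes several assumptions: the function name Invoke-DownloadAndExtract and the $TempDir parameter are hypothetical, 7Zip4PowerShell is already installed, BitsTransfer is available, and the first .mmdb file found in the archive is the one we want.

```powershell
# Hypothetical wrapper around the steps above; names and defaults are assumptions.
function Invoke-DownloadAndExtract {
    param(
        [Parameter(Mandatory = $true)] [string] $Url,
        [Parameter(Mandatory = $true)] [string] $Output,
        [string] $TempDir = 'C:\temp\maxmind'
    )

    $ErrorActionPreference = 'Stop'   # abort on the first failing step

    # 1. Start from a clean temp folder
    Remove-Item $TempDir -Recurse -Force -ErrorAction Ignore
    New-Item -ItemType Directory -Path $TempDir | Out-Null

    try {
        # 2. Download the archive
        $gzPath  = Join-Path $TempDir 'temp.tar.gz'
        $tarPath = Join-Path $TempDir 'temp.tar'
        Import-Module BitsTransfer
        Start-BitsTransfer -Source $Url -Destination $gzPath

        # 3. Decompress the gzip layer
        $inStream = $outStream = $gzipStream = $null
        try {
            $inStream   = [System.IO.File]::OpenRead($gzPath)
            $outStream  = [System.IO.File]::Create($tarPath)
            $gzipStream = New-Object System.IO.Compression.GZipStream($inStream, [System.IO.Compression.CompressionMode]::Decompress)
            $gzipStream.CopyTo($outStream)
        }
        finally {
            foreach ($s in $gzipStream, $outStream, $inStream) { if ($s) { $s.Dispose() } }
        }

        # 4. Extract the tar layer (assumes 7Zip4PowerShell is installed)
        Expand-7Zip $tarPath $TempDir

        # 5. Copy the first .mmdb file found to the requested output path
        $file = Get-ChildItem -Path $TempDir -Recurse -Filter '*.mmdb' | Select-Object -First 1
        if (-not $file) { throw "No .mmdb file found in the archive from $Url" }
        Copy-Item -Path $file.FullName -Destination $Output -Force
    }
    finally {
        # 6. Clean up the temp folder even when a step failed
        Remove-Item $TempDir -Recurse -Force -ErrorAction Ignore
    }
}
```

In DownloadAndExtract.ps1, the Param block would stay as the first statement, followed by this function and a final call such as: Invoke-DownloadAndExtract -Url $url -Output $output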
Now we schedule it in Windows Task Scheduler with a basic task
Then we set the schedule
And for our Action, we choose Start a Program, with powershell as the program/script and the location of our ps1 script in the arguments:
Arguments: -file "C:\scripts\DownloadAndExtract.ps1" https://example.com/data.tar.gz c:\data\GeoLite2-City.mmdb
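The same task can probably also be registered from PowerShell itself, using the ScheduledTasks cmdlets that come with Windows 8 / Server 2012 and later. This is a sketch only: the function name Register-DownloadTask, the task name, and the weekly Wednesday 3am schedule are all assumptions, so adjust them to match how often the dataset is actually updated.

```powershell
# Hypothetical helper: register the download script as a weekly scheduled task.
function Register-DownloadTask {
    param(
        [string] $ScriptPath = 'C:\scripts\DownloadAndExtract.ps1',
        [string] $Url        = 'https://example.com/data.tar.gz',
        [string] $Output     = 'c:\data\GeoLite2-City.mmdb',
        [string] $TaskName   = 'DownloadAndExtract'   # assumed task name
    )

    # Same arguments as entered in the Task Scheduler dialog, plus -NoProfile
    $arguments = "-NoProfile -File `"$ScriptPath`" $Url $Output"
    $action    = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument $arguments

    # Assumed schedule: once a week, Wednesday at 3am
    $trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Wednesday -At 3am

    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger
}
```

Calling Register-DownloadTask with no arguments uses the example paths from this post; registering a task may require an elevated PowerShell session depending on the account it runs under.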
Troubleshooting
If Install-Package cannot be found: https://winaero.com/blog/fix-install-module-missing-powershell/