We needed to download an updated version of a public dataset on a regular basis. Since the application runs in a Windows environment, PowerShell plus Windows Task Scheduler came to mind, only to find that PowerShell doesn’t make this trivial when the dataset ships as a .tar.gz archive.
In PowerShell v5+ we have the Expand-Archive command:
Expand-Archive c:\a.zip -DestinationPath c:\a
but this only handles zip archives; it doesn’t support gzip or tar.
gzip is a compression format based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. There’s a good comparison of popular compression algorithms worth checking out: https://stackoverflow.com/questions/28635496/difference-lz77-vs-lz4-vs-lz4hc-compression-algorithms
tar (or tarball) is an archive format, which allows multiple files to be grouped into one for backup or distribution purposes.
Combining the two, which is very common, lets you download a single, well-compressed archive containing multiple files and folders. But now we have a couple of layers to deal with. Here are the steps I came up with:
First we’ll delete any folder we plan to create (in case a previous run of this script failed in the middle), and then create our temp folder:
Remove-Item "c:\temp\maxmind\" -Filter * -Recurse -ErrorAction Ignore
New-Item -ItemType directory -Path C:\temp\maxmind\
The Start-BitsTransfer cmdlet (from the BitsTransfer module), if available, is really fast at downloading:
Import-Module BitsTransfer
Start-BitsTransfer -Source "https://example.com/download.tar.gz" -Destination "c:\temp\maxmind\temp.tar.gz"
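Since BitsTransfer is only there "if available", here is a minimal sketch of a fallback, assuming Invoke-WebRequest is an acceptable substitute when the module is missing. The function name Get-RemoteFile and its parameter names are hypothetical, not part of the original script.

```powershell
# Hypothetical helper (not from the original script): prefer BITS when the
# module is installed, otherwise fall back to Invoke-WebRequest.
function Get-RemoteFile {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [Parameter(Mandatory)] [string] $Destination
    )

    if (Get-Module -ListAvailable -Name BitsTransfer) {
        Import-Module BitsTransfer
        Start-BitsTransfer -Source $Url -Destination $Destination
    }
    else {
        # -UseBasicParsing avoids the Internet Explorer dependency in Windows PowerShell 5.1
        Invoke-WebRequest -Uri $Url -OutFile $Destination -UseBasicParsing
    }
}

# Example call, using the same URL and path as above:
# Get-RemoteFile -Url "https://example.com/download.tar.gz" -Destination "c:\temp\maxmind\temp.tar.gz"
```

The example call is left commented out so that defining the helper has no side effects.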
PowerShell has no built-in gzip support as far as I could find, but we can make use of the .NET Framework through PowerShell, thanks to RiffyRiot on TechNet: https://social.technet.microsoft.com/Forums/windowsserver/en-US/5aa53fef-5229-4313-a035-8b3a38ab93f5/unzip-gz-files-using-powershell?forum=winserverpowershell
Function DeGZip-File {
    Param(
        $infile,
        $outfile = ($infile -replace '\.gz$','')
    )
    # $input is an automatic variable in PowerShell, so the streams use other names
    $inStream = New-Object System.IO.FileStream $infile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
    $outStream = New-Object System.IO.FileStream $outfile, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
    $gzipStream = New-Object System.IO.Compression.GzipStream $inStream, ([IO.Compression.CompressionMode]::Decompress)
    # Copy the decompressed bytes in 1 KB chunks until the stream is exhausted
    $buffer = New-Object byte[](1024)
    while ($true) {
        $read = $gzipStream.Read($buffer, 0, 1024)
        if ($read -le 0) { break }
        $outStream.Write($buffer, 0, $read)
    }
    $gzipStream.Close()
    $outStream.Close()
    $inStream.Close()
}
DeGZip-File "C:\temp\maxmind\temp.tar.gz" "C:\temp\maxmind\temp.tar"
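The same decompression can be written a little more compactly by letting Stream.CopyTo handle the buffering. This is only a sketch, assuming .NET Framework 4 or later (where CopyTo exists); the name Expand-GZipFile is hypothetical and it is not the TechNet version.

```powershell
# Hypothetical variant of the gzip step: Stream.CopyTo replaces the manual
# read/write loop, and try/finally closes the files even if decompression fails.
function Expand-GZipFile {
    param(
        [Parameter(Mandatory)] [string] $InFile,
        [string] $OutFile = ($InFile -replace '\.gz$', '')
    )

    # Use absolute paths: .NET resolves relative paths against the process
    # working directory, which can differ from the PowerShell location.
    $inStream = [System.IO.File]::OpenRead($InFile)
    try {
        $outStream = [System.IO.File]::Create($OutFile)
        try {
            $gzip = New-Object System.IO.Compression.GZipStream($inStream, [System.IO.Compression.CompressionMode]::Decompress)
            try { $gzip.CopyTo($outStream) }
            finally { $gzip.Dispose() }
        }
        finally { $outStream.Dispose() }
    }
    finally { $inStream.Dispose() }
}

# Example call, using the same paths as above:
# Expand-GZipFile "C:\temp\maxmind\temp.tar.gz" "C:\temp\maxmind\temp.tar"
```

The try/finally blocks matter mostly for scheduled runs, where a leftover file lock from a failed run could otherwise block the cleanup step.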
Finally, we have to extract the tar, for which we can use the Expand-7Zip cmdlet from the 7Zip4PowerShell module:
if (-not (Get-Command Expand-7Zip -ErrorAction Ignore)) {
    Install-Package -Scope CurrentUser -Force 7Zip4PowerShell > $null
}
Expand-7Zip C:\temp\maxmind\temp.tar c:\temp\maxmind\
Find the file we need and copy it to our destination:
$files=@("GeoLite2-City", "*.mmdb")
Get-ChildItem -recurse "c:\temp\maxmind\" -include ($files) | Copy-Item -Destination "c:\data\GeoLite2-City.mmdb"
And finally, we clean up our temp folder:
Remove-Item "c:\temp\maxmind\" -Filter * -Recurse -ErrorAction Ignore
Lastly, we wrap the whole thing into a PowerShell script, change it to accept parameters for the URL and output path, and save it as DownloadAndExtract.ps1:
Param ( [string] $url, [string] $output )
Now we schedule it in Windows Task Scheduler by creating a basic task, then we set the schedule. For the Action, we choose Start a Program, with powershell as the program/script and the location of our ps1 script in the arguments:
Arguments: -file "C:\scripts\DownloadAndExtract.ps1" https://example.com/data.tar.gz c:\data\GeoLite2-City.mmdb
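The same task can also be created from PowerShell rather than through the Task Scheduler UI. This is a rough sketch, assuming the ScheduledTasks module that ships with Windows 8 / Server 2012 and later. The task name and the daily 3 AM trigger are placeholders, since the schedule itself isn’t specified here.

```powershell
# Sketch: build the same action as the UI steps above.
# Assumptions (not from the original): the task name and the daily 3 AM schedule.
$action = New-ScheduledTaskAction -Execute "powershell.exe" `
    -Argument '-File "C:\scripts\DownloadAndExtract.ps1" https://example.com/data.tar.gz c:\data\GeoLite2-City.mmdb'

$trigger = New-ScheduledTaskTrigger -Daily -At 3am

# Registering changes the system and may need an elevated session, so it is
# left commented out here:
# Register-ScheduledTask -TaskName "DownloadAndExtract" -Action $action -Trigger $trigger
```

Building the action and trigger objects has no side effects on its own; nothing is scheduled until Register-ScheduledTask runs.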
If Install-Package cannot be found, see: https://winaero.com/blog/fix-install-module-missing-powershell/