A plain foreach is very useful and efficient for most operations. Sometimes, though, special situations arise: either fetching the data to iterate over has high latency, or the processing inside the foreach depends on an operation that is slow or long-running. Getting paged data from a database to iterate over is one example. The goal is to pull data from the database a chunk at a time, since fetching one record at a time introduces its own overhead. As data becomes available, we start processing it, while in the background we fetch more data and feed it to the processor. The processing part would itself run in parallel as well, so that the next item can start before the previous one finishes.
ForEachAsync
My favorite way to do this is with an extension method Stephen Toub wrote many years ago. It takes a data source, breaks it into partitions, lets you specify the degree of parallelism, and accepts a lambda to execute for each item:
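using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace Extensions
{
    public static class Extensions
    {
        // Runs body for every item in source, using at most dop concurrent workers.
        public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
        {
            // The returned task completes once every partition has been drained.
            return Task.WhenAll(
                // Split the source into dop partitions that pull items on demand.
                from partition in Partitioner.Create(source).GetPartitions(dop)
                select Task.Run(async delegate
                {
                    // Each worker awaits one item at a time from its own partition.
                    using (partition)
                        while (partition.MoveNext())
                            await body(partition.Current);
                }));
        }
    }
}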
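To make the paged scenario from the introduction concrete, here is a minimal sketch of how the extension might be used. FetchPage, ReadAllRecords, PagedDemo and the simulated delays are hypothetical stand-ins invented for illustration, not part of any real data-access API, and the extension method is repeated (under a different class name) only so that the sample compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncExtensions
{
    // Same body as the extension method above, repeated so this sample is self-contained.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }
}

public static class PagedDemo
{
    // Hypothetical stand-in for a paged database query: blocks briefly, then returns one page of IDs.
    static List<int> FetchPage(int pageIndex, int pageSize)
    {
        Thread.Sleep(50); // simulated round-trip latency (assumption)
        return Enumerable.Range(pageIndex * pageSize, pageSize).ToList();
    }

    // Lazily yields records; a page is fetched only when the iterator reaches it.
    static IEnumerable<int> ReadAllRecords(int pageCount, int pageSize)
    {
        for (int page = 0; page < pageCount; page++)
            foreach (int record in FetchPage(page, pageSize))
                yield return record;
    }

    // Processes every record with at most dop concurrent workers and returns the sorted results.
    public static async Task<List<int>> ProcessAllAsync(int pageCount, int pageSize, int dop)
    {
        var processed = new ConcurrentBag<int>();
        await ReadAllRecords(pageCount, pageSize).ForEachAsync(dop, async record =>
        {
            await Task.Delay(10); // simulated slow per-record work (assumption)
            processed.Add(record);
        });
        return processed.OrderBy(x => x).ToList();
    }

    public static async Task Main()
    {
        var results = await ProcessAllAsync(pageCount: 4, pageSize: 5, dop: 3);
        Console.WriteLine($"Processed {results.Count} records");
    }
}
```

Because ReadAllRecords is a lazy iterator, pages are requested only as the workers ask for more items, so processing of early records can overlap with fetching later pages. The order in which records are processed is not guaranteed, which is why the sketch collects them in a ConcurrentBag and sorts afterwards.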
But let’s see what we can do to optimize more…