Consider a message-passing system of n processors each holds one piece of data initially. The goal is to compute an associative and commutative reduction function on the n distributed pieces of data and to make the result known to all the processors. We model the message-passing system using the parameter k for the k-port model and the parameter λ for the communication latency in the postal model. In this general model, each processor during each round r can send messages to any set of k processors and receive messages from any other set of k processors, which were sent out during round r-λ+1, provided r-λ+1≥1. We describe an optimal algorithm that requires the least number of communication rounds and minimizes the time spent by each processor in sending and receiving messages.