Generating contextually coherent responses has been one of the most critical challenges in building intelligent dialogue systems. The key issues are how to appropriately encode contexts and how to make good use of them during generation. Past works either directly use (hierarchical) RNNs to encode contexts or use attention-based variants to further weight different words and utterances. They tend to learn dispersed focuses over all contextual information, which contradicts the fact that humans tend to respond to certain concentrated semantics of the context. As a result, the generated responses are only semantically related to, but not precisely coherent with, the given contexts. To this end, this paper proposes a contextually coherent dialogue generation (ConDial) method that first encodes contexts into structured semantic vectors using self-attention, and then adaptively chooses key semantic vectors to guide response generation. Based on the structured semantics, it also develops a calibration mechanism with a dynamic vocabulary during decoding, which enhances exactly coherent expressions by adjusting the word distribution. Experiments show that ConDial outperforms state-of-the-art methods and is capable of generating responses that not only continue the topics but also keep coherent contextual expressions.
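The three mechanisms named above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, matrix shapes, and the fixed `bonus` value are all assumptions, and the real model would use learned parameters and a full decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W1, W2):
    """Encode token states H (T, d) into r structured semantic vectors.

    Each row of the attention matrix A attends to a different aspect
    of the context, yielding one semantic vector per row.
    """
    A = softmax(W2 @ np.tanh(W1 @ H.T), axis=-1)  # (r, T) attention weights
    return A @ H                                   # (r, d) semantic vectors

def select_key_semantics(M, q):
    """Adaptively weight the r semantic vectors in M (r, d) by their
    relevance to the current decoder state q (d,), concentrating on
    key semantics rather than dispersing focus over everything."""
    w = softmax(M @ q)  # (r,) selection weights
    return w @ M        # (d,) context vector guiding generation

def calibrate_logits(logits, dynamic_vocab_ids, bonus=2.0):
    """Calibration step: shift probability mass toward words in the
    dynamic (context-derived) vocabulary by boosting their logits."""
    out = logits.copy()
    out[dynamic_vocab_ids] += bonus
    return softmax(out)

# Toy usage with random "learned" parameters.
rng = np.random.default_rng(0)
T, d, da, r, V = 6, 8, 4, 3, 20
H = rng.normal(size=(T, d))          # token hidden states
W1 = rng.normal(size=(da, d))        # attention projection
W2 = rng.normal(size=(r, da))        # r attention heads
M = structured_self_attention(H, W1, W2)
c = select_key_semantics(M, rng.normal(size=d))
probs = calibrate_logits(rng.normal(size=V), [2, 5])
```

The selection step replaces a flat average over context with a sharpened, state-dependent mixture, and the calibration step reshapes the output distribution so that words anchored in the context are favored at exactly the positions where coherence matters.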