Creating a plan, i.e., composing a sequence of items to achieve a task, is inherently complex when done manually. It requires not only finding a sequence of relevant items but also understanding user requirements and incorporating them as constraints. For instance, in course planning, items are core and elective courses, and degree requirements capture their complex dependencies as constraints. In trip planning, items are points of interest (POIs), and constraints represent time and monetary budgets, two user-specified requirements. Most importantly, a plan must comply with the ideal interleaving of items to achieve a goal, such as enhancing students' skills toward the broader learning goal of an education program or, in the travel scenario, improving the overall user experience. We study the Task Planning Problem (TPP), whose goal is to generate a sequence of items that optimizes multiple objectives while satisfying complex constraints. We model TPP as a Constrained Markov Decision Process and adapt weighted Reinforcement Learning to learn a policy that accounts for complex dependencies between items, user requirements, and user satisfaction. We present RL-Planner, a computational framework for TPP. RL-Planner requires minimal input from domain experts (academic advisors for courses, or travel agents for trips), yet produces personalized plans that satisfy all constraints. We run extensive experiments on datasets from university programs and from travel agencies, comparing our solutions with plans drafted by human experts and with fully automated approaches. Our experiments corroborate that existing automated solutions are not suitable for TPP and that our plans are highly comparable to expensive handcrafted ones.