As part of their training, all medical students and residents must pass basic surgical tasks such as knot tying, needle passing, and suturing. Their assessment is typically performed by surgical faculty in the operating room, where mistakes and failures by the student increase operation time and cost. This evaluation is quantitative and leaves little margin for error. Simulation has emerged as a cost-effective alternative, but it either lacks assessment or requires additional expensive hardware for evaluation. Apps that provide training videos on surgical knot tying are available to students, but none offer evaluation. We propose a cascaded neural network architecture that evaluates a student's performance solely from a video of the student performing a simulated surgical knot-tying task. Our model converts video frames into feature vectors with a pre-trained deep convolutional network and then models the sequence of frames with a temporal network. We obtained videos of medical students and residents at the Robert Wood Johnson Hospital performing knot tying on a standardized simulation kit. We manually annotated each video and performed a five-fold cross-validation study. Our model achieves a median precision, recall, and F1-score of 0.71, 0.66, and 0.65, respectively, in determining the skill level on the knot-related tasks of tying and pushing the knot. Our mean precision, averaged across different probability thresholds, is 0.8. Our F1-score and mean precision are 8% and 30% higher, respectively, than those of a recently published study on the same problem. We expect the accuracy of our model to increase further as we add more training videos, making it a practical solution that students can use to evaluate themselves.
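The cascaded design described above (per-frame CNN features fed into a temporal model) can be sketched as follows. This is only an illustrative data-flow sketch, not the paper's implementation: the weights are random and untrained, a fixed random projection stands in for the pre-trained CNN backbone, and the frame shape (64x64x3), feature size (512), hidden size (64), and three skill classes are assumptions made for the example.

```python
# Sketch of the two-stage (cascaded) pipeline: a per-frame feature extractor
# standing in for the pre-trained CNN, followed by a minimal recurrent network
# over the frame sequence. All weights are random and untrained; the sizes
# below are illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

FRAME_SHAPE = (64, 64, 3)  # assumed input frame size
FEAT_DIM = 512             # assumed feature-vector length

# Fixed random projection standing in for a pre-trained CNN backbone.
_W_cnn = rng.standard_normal((FEAT_DIM, int(np.prod(FRAME_SHAPE)))) * 0.01

def frame_to_feature(frame):
    """Map one video frame to a fixed-length feature vector."""
    return np.tanh(_W_cnn @ frame.reshape(-1))

def temporal_classify(features, hidden=64, n_classes=3):
    """Run a simple Elman RNN over the sequence of frame features and
    return a probability distribution over skill-level classes."""
    Wxh = rng.standard_normal((hidden, FEAT_DIM)) * 0.01
    Whh = rng.standard_normal((hidden, hidden)) * 0.01
    Who = rng.standard_normal((n_classes, hidden)) * 0.01
    h = np.zeros(hidden)
    for x in features:            # step through the frame sequence
        h = np.tanh(Wxh @ x + Whh @ h)
    logits = Who @ h              # classify from the final hidden state
    p = np.exp(logits - logits.max())
    return p / p.sum()            # softmax over skill classes

# Cascade: frames -> per-frame features -> temporal model -> probabilities.
video = [rng.random(FRAME_SHAPE) for _ in range(12)]
probs = temporal_classify([frame_to_feature(f) for f in video])
```

In practice the backbone would be a pre-trained deep convolutional network and the temporal stage a trained recurrent or convolutional sequence model, but the hand-off between the two stages follows this shape.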