I hope to god you're not executing the subquery for each document. If you are, then I see why there is quite a performance hit. You should only be executing the subquery once and then referencing its results, since those results do not change from one document to the next.
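To illustrate the difference, here is a rough Python sketch. It is not how the query engine actually works internally; `run_subquery`, the document ids, and the returned set are all made up for the example. It only shows why re-running an uncorrelated subquery per document is wasteful compared to running it once and reusing the result.

```python
# Hypothetical stand-in for an uncorrelated subquery: its result does not
# depend on the outer document, so it only needs to run once.
def run_subquery(call_log):
    call_log.append("subquery")   # record each execution
    return {"u1", "u3"}           # made-up set of matching ids


def filter_per_document(docs, call_log):
    # Slow pattern: the subquery is re-executed for every document.
    return [d for d in docs if d["id"] in run_subquery(call_log)]


def filter_hoisted(docs, call_log):
    # Faster pattern: execute once, then reference the cached result.
    matching_ids = run_subquery(call_log)
    return [d for d in docs if d["id"] in matching_ids]


docs = [{"id": "u1"}, {"id": "u2"}, {"id": "u3"}]  # made-up documents

slow_calls, fast_calls = [], []
slow_result = filter_per_document(docs, slow_calls)
fast_result = filter_hoisted(docs, fast_calls)
```

Both functions return the same documents, but the first one runs the subquery once per document while the second runs it a single time, so the gap grows with the size of the data set.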
Even though I have already shown in my previous thread (simple-user-search-takes-25-seconds-to-execute-on-large-cluster-with-fairly-small-data-set) that composite indexes do not work properly, I will give your suggestions a go tomorrow and post back, with the explains attached, on whether they have any impact on performance.