One remarkable thing about machine learning, and something that easily deceives us, is how quickly one can see good, encouraging results.
When developing new algorithms during my time in grad school, I often had some bug in the implementation. The results, despite the bug, were still good, just not as good as the final, bug-free version (as far as we know). Even with bugs, the system was learning and doing something.
When training a new model on data from a system that has a bug, we still get a working model. Removing the bug often yields sizable performance improvements, more than adding a new feature would have.
In my experience, new projects quickly show great results. Often the simplest approach gets us about 80% of the way to a new product or a launchable model. Demos look interesting, with the system getting something right and a few funny cases wrong. People become excited and expectations rise about when this new thing might ship.
I feel it’s important to test carefully: have a metric and specific launch criteria that measure performance against a ground truth. This may sound obvious, but for many new techniques (GANs, etc.) there is no ground truth and performance is eyeballed. Again, the first 80% is easy. The remaining percentage points will take a while…
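To make that concrete, here is a minimal sketch of what "a metric and specific launch criteria against ground truth" could look like in code. It is illustrative only: the macro-F1 metric, the `ready_to_launch` helper, and the 0.90 threshold are all assumptions, not something from a real launch process.

```python
# Minimal sketch: evaluate a candidate model on held-out, ground-truth labels
# with a fixed metric and an explicit, pre-agreed launch threshold.
from sklearn.metrics import f1_score

LAUNCH_THRESHOLD_F1 = 0.90  # hypothetical launch criterion, agreed on up front


def ready_to_launch(model, X_test, y_true, threshold=LAUNCH_THRESHOLD_F1):
    """Return True only if the model clears the launch metric on ground truth."""
    y_pred = model.predict(X_test)
    score = f1_score(y_true, y_pred, average="macro")
    print(f"macro-F1 = {score:.3f} (launch threshold = {threshold})")
    return score >= threshold
```

The point is less the specific metric than the discipline: the number and the bar are written down before the demo excitement starts, so "looks good when eyeballed" never becomes the launch criterion.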