Today I finally put the refactored architecture into production. In short, this means the app can now be updated much more easily (e.g. to improve the interface or to add new symbols). You won’t notice any of this just yet, because the interface has not changed at all.
Details (don’t read past this unless you are really interested)
The former architecture consisted of a web layer and a backend that contained all the domain knowledge and the classifier. That had one very big disadvantage: the classifier had to be restarted and retrained with all available data whenever new symbols were added to the application. The same problem arose whenever the classifier itself was updated, since that also required retraining it.
I have now moved the classifier out of the application and made it a standalone component that does only one thing: machine learning. I have implemented it as a web service with a REST API for training and classification. The repository can be found on GitHub.
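To make the idea of a domain-agnostic learning service a bit more concrete, here is a minimal sketch in Python using only the standard library. The endpoint paths (/train, /classify), the JSON payload shape and the nearest-neighbour classifier are all assumptions chosen for illustration; they are not the API or the algorithm of the actual repository. The point is that the service only ever sees feature vectors and opaque labels and knows nothing about LaTeX.

```python
import json
import math
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# In-memory training data as (feature_vector, label) pairs.
# Labels are opaque strings: the service never interprets them.
_samples = []
_lock = threading.Lock()


def classify(features, k=3):
    """Return up to k distinct labels, ordered by the distance of their closest sample."""
    with _lock:
        scored = sorted((math.dist(features, vec), label) for vec, label in _samples)
    ranked = []
    for _, label in scored:
        if label not in ranked:
            ranked.append(label)
            if len(ranked) == k:
                break
    return ranked


class LearningHandler(BaseHTTPRequestHandler):
    """Hypothetical REST endpoints: POST /train and POST /classify with JSON bodies."""

    def _read_json(self):
        length = int(self.headers.get("Content-Length", 0))
        return json.loads(self.rfile.read(length) or b"{}")

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        try:
            data = self._read_json()
            if self.path == "/train":
                # Storing the sample is all the "training" nearest neighbour needs,
                # so new labels can be added without restarting the service.
                with _lock:
                    _samples.append((data["features"], data["label"]))
                    count = len(_samples)
                self._send_json(200, {"samples": count})
            elif self.path == "/classify":
                self._send_json(200, {"labels": classify(data["features"])})
            else:
                self._send_json(404, {"error": "unknown endpoint"})
        except (KeyError, TypeError, ValueError) as exc:
            # Malformed JSON, missing fields or mismatched vector lengths.
            self._send_json(400, {"error": str(exc)})

    def log_message(self, format, *args):
        pass  # keep the console quiet


def run(host="127.0.0.1", port=8000):
    """Start the service and block forever."""
    ThreadingHTTPServer((host, port), LearningHandler).serve_forever()
```

Calling run() starts the service on port 8000. A real classifier would replace the nearest-neighbour lookup, but the HTTP surface could stay the same, which is what lets the web application treat the learner as a black box.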
The other part of the application contains the web GUI and the domain logic. Only this part knows what LaTeX symbols are, and it relies on the machine learning service for all of its machine learning needs. This is detexify, the application you interact with.
Detexify can be updated and restarted while the machine learning service keeps running, unaffected, on its own server. Also, when I find better algorithms for symbol recognition, I can simply start another machine learning service with the new algorithms on a different machine and point detexify at that server instead. I can train the second service with the existing data before switching over, so there is no downtime at all.
This makes maintenance a lot more comfortable.
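The switch-over described above could look roughly like the following sketch on the application side. It assumes a learning service with hypothetical JSON endpoints /train and /classify; the class names LearningServiceClient and Detexify, the method names and the payload fields are all made up for illustration and are not the actual detexify code. The new service is trained with the existing samples while the old one keeps answering requests, and the switch itself is a single assignment.

```python
import json
import urllib.request


class LearningServiceClient:
    """Hypothetical client for a REST learning service with /train and /classify endpoints."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def _post(self, path, payload):
        request = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())

    def train(self, features, label):
        return self._post("/train", {"features": features, "label": label})

    def classify(self, features):
        return self._post("/classify", {"features": features})["labels"]


class Detexify:
    """Domain side: only this layer knows that the labels are LaTeX commands."""

    def __init__(self, service):
        self.service = service

    def recognize(self, features):
        # Delegate the learning part entirely to whichever service is configured.
        return self.service.classify(features)

    def migrate(self, new_service, samples):
        # Train the new service with existing data while the old one keeps serving,
        # then switch over in one step, so requests never hit an untrained service.
        for features, symbol in samples:
            new_service.train(features, symbol)
        self.service = new_service
```

For example, Detexify(LearningServiceClient("http://old-host:8000")) could later be moved to a new server with app.migrate(LearningServiceClient("http://new-host:8000"), stored_samples), where the host names and stored_samples are placeholders.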