Looking Back on Chainer
This is a translation of my previous blog post. Thanks to Hal Yoshizumi for kindly working on the translation!
A little over four years and eight months have passed since I committed the first Chainer code on April 12, 2015. I started writing those first lines with no particular motive in mind, but Chainer has since grown into an exemplary framework that supports top-of-the-line research. The world of deep learning frameworks has undergone a total sea change during this period (neither TensorFlow nor PyTorch existed at the time). We have come a long way.
PFN announced today that it will transition its main R&D framework to PyTorch. This, of course, marks a big turning point in the history of PFN, but an even greater one for me, as I have devoted myself entirely to Chainer at the company.
First, frankly speaking, I really enjoyed developing Chainer. I began writing Chainer in the midst of fierce competition among deep learning frameworks. New frameworks were sprouting up one after another like mushrooms after rain, with TensorFlow and PyTorch joining the race later. I don’t know whether Chainer was ever on a par with such behemoths, but it was a truly stimulating environment. Being able to develop a framework on the same ground as Google and Facebook at such a time was an invaluable experience for me.
It is often said that when engineers start making a game, they usually end up creating just a game engine. I am the quintessential example of this kind of engineer (I did exactly that in my undergraduate days). In that sense, it was only natural for me to start writing a deep learning framework before I knew it. Deep learning frameworks make quite an interesting subject for someone like me. Designing the framework was exciting because it felt like producing a new language (though a DSL is much lighter-weight than a real programming language). It also sometimes involves dealing with software at a very low level, so I found every part of the design work fun. In fact, it is not difficult to start writing a framework, thanks to the Python and NumPy ecosystem. You have every reason to write one. So it’s no wonder that everyone is competing in this free-for-all, so to speak.
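To give a flavor of how low that barrier is, here is a toy Define-by-Run autograd sketch on top of NumPy. The `Variable` class and its methods below are hypothetical names for illustration only, not Chainer’s actual implementation:

```python
import numpy as np

class Variable:
    """Toy Define-by-Run variable: the graph is recorded as Python runs."""
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self.parents = parents   # inputs that produced this node
        self.grad_fn = None      # pushes this node's grad to its parents

    def __mul__(self, other):
        out = Variable(self.data * other.data, parents=(self, other))
        def grad_fn(g):
            self.grad += g * other.data
            other.grad += g * self.data
        out.grad_fn = grad_fn
        return out

    def __add__(self, other):
        out = Variable(self.data + other.data, parents=(self, other))
        def grad_fn(g):
            self.grad += g
            other.grad += g
        out.grad_fn = grad_fn
        return out

    def backward(self):
        # Topologically order the recorded graph, then push gradients back.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = np.ones_like(self.data)
        for v in reversed(order):
            if v.grad_fn is not None:
                v.grad_fn(v.grad)

x = Variable(3.0)
y = x * x + x            # the graph is built while this line executes
y.backward()
print(y.data, x.grad)    # 12.0 and dy/dx = 2x + 1 = 7.0
```

Everything a real framework adds on top of this core idea (GPU tensors, a large function library, serialization, and so on) is engineering, not magic.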
I particularly like thinking about APIs, so that was essentially my main job on the Chainer team. Words can’t describe how great it feels to design a beautiful API and then write simple, clear user code on top of it. In that sense, I found the first week the most exciting. After that, I worked on various features, such as CuPy, Link/Chain, Trainer, FunctionNode, mixed-precision operations, and ChainerX; none of them was an easy task, but all were extremely rewarding and meaningful, as they fulfilled business requirements and served the community of users outside PFN.
Although I had some OSS development experience from my earlier involvement in Jubatus, leading the Chainer team was a completely new experience. There were tough times along the way, and in retrospect, I made many unorthodox moves. I can’t thank Unno-san, one of the initial developers on the team, enough for encouraging me to take on leadership on a number of occasions; it made a huge difference both in shaping my mindset and in building the team. And most of all, we wouldn’t be standing here were it not for the development members who willingly took ownership of a wide range of issues.
I could keep talking about the good old days, but I digress…
As for PFN’s decision, I have no regrets, because it was something I raised in the first place. Rather, I feel relieved that we have managed to come this far without incident. I think I caused my colleagues a lot of trouble by going on paternity leave at the very end, but they accepted it as if it were a perfectly normal thing to do. I really can’t express my appreciation enough.
This is how I feel at this moment as I write this post. Having said that, to be honest, I have had mixed feelings, including frustration and sadness. There were even days I spent in distress after the migration talks began.
If anything, the frustration had been there long before. Chainer was initially ahead of the other major frameworks in terms of Define-by-Run, and it was vexing to see PyTorch and TensorFlow dominate the user base and the ecosystem after implementing similar features. If I were to list the things I could have done better, there would probably be no end to it. But even if I were given the chance to turn back the clock and do it all over again, I am not entirely sure I would take a different path. It’s a hard question.
Meanwhile, it makes me proud that, in terms of how networks are written, both PyTorch and TensorFlow are converging on something quite similar to the API we created in Chainer. With so much code appearing to be written in a similar way, I feel that some kind of meme is passing from Chainer to the other frameworks. I have been using PyTorch recently, and despite being new to me, it gives me a sense of home.
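As a minimal sketch of that resemblance (the MLP classes and layer sizes below are made up for illustration), a model reads almost the same in both frameworks:

```python
# Chainer: a model is a Chain holding Links, with eager forward code.
import chainer
import chainer.functions as F
import chainer.links as L

class ChainerMLP(chainer.Chain):
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, 100)  # input size inferred on first call
            self.l2 = L.Linear(100, 10)

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))

# PyTorch: a model is an nn.Module holding submodules, written the same way.
import torch
import torch.nn as nn

class TorchMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(784, 100)      # input size given explicitly
        self.l2 = nn.Linear(100, 10)

    def forward(self, x):
        return self.l2(torch.relu(self.l1(x)))
```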
While it was fun to develop Chainer, the major frameworks are moving in a similar direction and, in a broad sense, gradually maturing. Both TensorFlow and PyTorch are headed the same way: you write code against a Define-by-Run API, a compiler converts it into a computational graph representation, and accelerator code is generated after backpropagation is derived and graph-level optimizations are applied. We were developing ChainerX and Chainer Compiler toward the same goal, fundamentally based on the same idea, albeit with a number of differences in detail. In this sense, framework APIs are converging, and small teams like ours have almost fulfilled their role in the advancement of frameworks. So, all in all, I think we made the right decision.
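A minimal PyTorch sketch of that pipeline, using torch.jit.trace as the graph-capture step (tf.function plays a similar role in TensorFlow):

```python
import torch

def f(x):
    # Define-by-Run: the graph is recorded as this Python code executes.
    return torch.tanh(x) * x

x = torch.randn(4, requires_grad=True)
y = f(x).sum()
y.backward()              # backprop derived from the recorded graph
print(x.grad)

# The same eager code can be captured as a static graph representation,
# which a compiler can then optimize and lower to accelerator code.
traced = torch.jit.trace(f, torch.randn(4))
print(traced.graph)
```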
Of course, there is a fair chance that another major shift will occur in the near future. The field of deep learning shows no sign of slowing down. The all-too-confusing days when new architectures emerged one after another may have passed, but the field keeps evolving at every level, from basic modules to learning paradigms and tasks. We may not be writing programs at the same level of abstraction in the future. I like to imagine that a new paradigm will suddenly rise to power at a different layer and change the world again. We have pulled the plug on Chainer development, but I hope to find something fun to work on in such a new field.
Last but not least, I want to thank the team members for developing Chainer together and making the work fun; the Chainer family project members for helping expand the ecosystem; the researchers and developers for giving us valuable feedback and PRs; public relations and corporate services for providing support whenever necessary; the corporate officers for strongly backing the migration project; the NVIDIA folks for the many discussions and contributions to CUDA-related features; the chug members for actively participating in community activities; the contributors for making direct contributions through issues and PRs; and, most of all, all Chainer users. Thank you for everything.