I began coding because…
I began learning Python a few years ago because it’s a good way to solve the kinds of problems I have to address at work (which include a host of data science-related and -adjacent tasks like data ingestion, wrangling, and modeling).
I actually took a programming class (Java) in college because I was a Math major and it was required. I waited until my last year of college to take the class — but I think if I had taken it sooner, I’d have switched majors from Math to Computer Science.
I’m currently hacking on…
At work, we are currently doing some interesting things with unstructured text data and natural language processing.
Outside of my day job, I work with the team at District Data Labs, an open source collaborative here in DC.
Right now we’re sprinting on Yellowbrick, a new open source, pure Python library that extends the Scikit-Learn API with a visual transformer (a “visualizer”) that can incorporate visualizations of the model selection process into pipelines and modeling workflows.
I’m also working on an O’Reilly book called Applied Text Analysis with Python with my partners, Benjamin Bengfort and Tony Ojeda.
I’m excited about…
In general, I’m interested in open source development, which I’ve written about here, as well as machine learning and visual diagnostics.
In particular, Yellowbrick is a really interesting project because of the growing popularity of machine learning, and of the Python library Scikit-Learn in particular.
Recently, much of the machine learning workflow has been automated through grid search methods, standardized APIs, and GUI-based applications. In practice, however, human intuition and guidance can more effectively home in on quality models than exhaustive search.
For instance, model selection is a lot more nuanced than simply picking the ‘right’ or ‘wrong’ algorithm. In practice, the workflow includes (1) selecting and/or engineering the smallest and most predictive feature set, (2) choosing a set of algorithms from a model family, and (3) tuning the algorithm hyperparameters to optimize performance.
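As a rough illustration of steps (1) and (3) above, here is a minimal pure-Python sketch of the exhaustive-search approach. It is a toy, not how Scikit-Learn or Yellowbrick work internally: the dataset is made up, the function names (`knn_predict`, `loo_accuracy`, `search`) are hypothetical, and a hand-rolled k-nearest-neighbors classifier stands in for a real model family. Step (2), comparing several algorithms, is left out to keep it short.

```python
from itertools import combinations

# Made-up toy data: each row is (features, label).
# Feature 0 separates the classes cleanly, feature 1 is noise,
# and feature 2 is only partly informative.
DATA = [
    ((1.0, 5.0, 0.2), 0),
    ((1.2, 1.0, 0.6), 0),
    ((0.8, 9.0, 0.4), 0),
    ((1.1, 3.0, 0.3), 0),
    ((3.0, 4.0, 0.5), 1),
    ((3.2, 8.0, 0.9), 1),
    ((2.9, 2.0, 0.7), 1),
    ((3.1, 6.0, 0.8), 1),
]


def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def loo_accuracy(data, feature_idx, k):
    """Leave-one-out accuracy of k-NN using only the chosen features."""
    projected = [(tuple(x[i] for i in feature_idx), y) for x, y in data]
    hits = 0
    for i, (x, y) in enumerate(projected):
        train = projected[:i] + projected[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(projected)


def search(data, ks=(1, 3, 5)):
    """Score every feature subset with every k, best candidate first."""
    n_features = len(data[0][0])
    results = []
    for size in range(1, n_features + 1):
        for feature_idx in combinations(range(n_features), size):
            for k in ks:
                score = loo_accuracy(data, feature_idx, k)
                results.append((score, feature_idx, k))
    # Prefer higher accuracy, then fewer features, then smaller k.
    results.sort(key=lambda r: (-r[0], len(r[1]), r[2]))
    return results


if __name__ == "__main__":
    for score, feature_idx, k in search(DATA)[:5]:
        print(f"accuracy={score:.2f} features={feature_idx} k={k}")
```

Even on three features and three values of k, the search already scores 21 candidates, and the ranked list says nothing about why a given feature helps or hurts. That gap is where visual diagnostics come in.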
Visual steering is therefore an increasingly useful tool for leveraging human visual pattern recognition along with automated tools and algorithm implementations. Yellowbrick’s visualizers can enable machine learning practitioners to visually interpret the model selection process, steer workflows toward more predictive models, and avoid common pitfalls and traps. For users, Yellowbrick can help evaluate the performance, stability, and predictive value of machine learning models, and assist in diagnosing problems throughout the machine learning workflow.
My advice for other women who code is…
This is a tricky one! We’re all different, and the strategies that work for me might not generalize very well.
For instance, for me, getting involved with open source development really expanded my world. On a practical level, it has really pushed me to be a better programmer. It’s also a space where (usually) the contributors are themselves users, meaning that contributions track closely with user stories, and are intrinsically (rather than extrinsically) motivated. I like that everyone has equal access to the code, and no one is excluded from making changes (at least locally). Contributor identities are open to the extent that a contributor wants to take credit for her work.
However, open source is a tricky space to navigate. When people hear the term “open source”, I think they imagine it as a place that, by its very nature, is welcoming of diversity. But that’s not always the case. Open source can be very exclusive, especially compared to working in a traditional office where there’s a whole HR department whose job it is to enforce some degree of diversity. Sometimes people can be mean.
With the Yellowbrick project and the other District Data Labs open source projects, we are actively working to attract more diverse contributors. I truly believe that the open source software we are building will be better and make a bigger impact if we have contributions from all kinds of people.
So I guess what I’m saying is whatever makes you different also makes you see the world in a novel way. As a developer, that means your difference is a gift, and we’d love to have you contribute to Yellowbrick (either via pull request or by becoming a tester) or to another open source project.
I also really enjoy going to conferences like PyCon and PyData. These events tend to attract diverse, high-quality speakers and motivated contributors, and they also cultivate an environment where everyone feels equally approachable. It’s cool because as you move from session to session, instructors become audience members, audience members become presenters, and keynote speakers stop to catch up with you over coffee.
Finally, the teammates you pick really matter. I feel very lucky to have such good partners and colleagues — Ben, Tony, Allen, Nicole, Laura, and Will, just to name a few. You should be able to rely on your team not only for technical things but also for support in a broader sense — if anyone on your team is making you feel bad or dragging you down or not being supportive, take your awesomesocks skills over to the competition.