People you can count on
Binghamton University students, faculty help each other with data science
Nancy Um, professor of art history, has everyone’s attention as she clicks through a website called Mapping Gothic France, which uses a database of 3D laser scans, images, texts and maps to explore the relationship of hundreds of examples of Gothic architecture in the 12th and 13th centuries in nascent France. It is a great example of how data can generate new knowledge and revitalize interest in the humanities, she says.
While this appears to be a typical lunchtime lecture, with PowerPoint and sandwiches, it’s actually called a Data Salon, and most of Um’s audience comes from outside her department. There are engineers, mathematicians and computer scientists: people who, until recently, might not have ventured across campus to listen to an art historian. But now, because of data science, they are her colleagues.
At universities around the country, data science has spread far beyond its traditional homes in the mathematical sciences and computer science departments. Nearly every discipline has data waiting for discovery. At Binghamton University, the demand for training and scholarship has moved so fast that faculty, students and staff have begun grassroots efforts to build skills and share ideas.
The Data Salon is one example. It’s not a traditional departmental seminar; it’s a talk about a specific topic, followed by discussion that welcomes viewpoints from other disciplines, says Xingye Qiao, associate professor of mathematical sciences and an organizer of the Data Salon.
“It helps us understand where other people are coming from,” he says.
Since summer 2017, there have been 13 Data Salons covering topics ranging from “smart cities” to machine learning to geographical information systems to gerrymandering.
Surprisingly, gerrymandering proved to be a popular topic.
“It was a mix of statistical thinking, computer programming and a real problem,” Qiao says.
Qiao’s research focuses on statistical machine learning. Um’s specialty is Islamic art and architecture. Both are members of a committee of about 18 people that has been working for the past few years to expand data science programming at Binghamton University. The group hit a milestone last summer when it was officially recognized as the Data Science Transdisciplinary Area of Excellence (TAE), which is charged with establishing data-centered research across a variety of disciplines at Binghamton. It is the sixth and newest TAE on campus; Qiao is its chair.
Qiao acknowledges that getting this far has required becoming comfortable with other ways of approaching scholarship. For example, not everyone knows statistics, so graduate students from math are offering statistical consulting to faculty and students on campus. It’s all about giving people who want to do data analysis the skills to get started. A workshop last year on scraping data from the web filled up fast.
“We try and take a very broad view and bring people together. It’s the only way as a new entity we can grow, to be more inclusive instead of saying, ‘You don’t belong here,’” he says.
Um says she’s never been on a committee with people from so many different schools. “It’s been a path of discovery among the group about how to talk to each other.”
Over the course of her career, Um has gone from looking at paintings with a magnifying glass to teaching students how to use software.
“In graduate school I was never shown a spreadsheet, never taught how to use a data frame or how to write code in Python,” she says.
Motivated by a movement called the digital humanities, which uses data science and myriad digital resources to mine new knowledge from classical humanistic disciplines, such as literature and philosophy, Um has spent summers learning new technology and software to wrangle, manage and visualize her data.
Yet, as she explains during her presentation, data science in the humanities can look very different from data science in the STEM disciplines.
For example, some scientists can readily tap computer-generated datasets to aid in their research, while a historian might start with a 17th-century manuscript from which certain information must be entered, by hand, into a spreadsheet or database. What to enter, how to enter it and what you are seeking to find out must all be decided ahead of time, as determined by the specific research questions being posed.
If your goal is to plot street addresses of sculptors in Paris in 1690, how will you provide exact geographical coordinates for approximations such as “across from the cathedral” and “next to the cheesemaker”?
“You always have to make choices,” she says. “Data is never neutral.”
Addressing issues that are particular to the humanities, Um says, is one of her commitments to the TAE group.
“We as humanists need to learn from people in math and computer science, and I’m learning every day. But I also think that people who are working at the technical end of data science have a lot to learn from the humanities about the social and human aspects of data,” she says.
Qiao agrees. Teaching a freshman seminar called Data Science and Us has been an adventure in exploring the human context and ethics of data science, he says.
“I think we who are on the technical end are less aware of the environment that the data comes from, and we tend to oversimplify subject matters by mathematical models. I now pay more attention to the human/societal side of the research, which has helped me a lot.”
Students dive into data
When she was a junior, Lydia King ’18 started a club. Or maybe it was a movement. She was pursuing a double major in math and economics but was missing some basic technical skills, and she knew she wasn’t alone.
“In Harpur College, we were hearing a lot about concepts and quantitative theory,” she says, “but we weren’t getting jobs.”
So, she founded the Data Science and Analytics Club for students who wanted to pick up skills such as Excel and some database languages but who lacked the time or comfort level to learn them in the classroom.
The pitch was that the skills would be taught by other students, and the tutorials would be free and easy. Initial interest was mostly from liberal arts majors.
“A lot of students who’ve been left out of the STEM fields have anxiety walking into those classrooms. They feel they don’t belong,” she says. And that can limit careers.
The club has proved popular. Membership this year is over 700 students, making it the largest club on campus by about 150 people.
“Our Python teacher is from computer science and our Excel teacher is from SOM,” says Robert Valdez, club president and senior mechanical engineering major.
Students learn at their own pace. Some attend every tutorial and then they’re done, while others take the same tutorial again and again because they continue to learn from it. There’s no exam, and there is lots of support.
Those who are teaching may have picked up their knowledge in a class, from an internship or even in high school. Some are self-taught. For many, the club’s cross-disciplinary makeup makes it a place to explore ideas and learn how to communicate them.
“A lot of the students don’t want to be so technical with computer science; they don’t want to sit there and just be coding all day. They want to do something more practical, more applicable,” Valdez says.
In March, 119 students participated in the club’s annual Datathon. Teams had 27 hours to complete a project based on a dataset.
“After teaching the skills, it’s our way of giving students a project they can put on their résumés,” Valdez says.
Organize, analyze, visualize
Identifying, collecting and processing data inevitably leads to the question: Now what? How will the data be presented so that it can inform decisions, prompt questions or offer new insights?
Data visualization is a means for sharing information. It can be as simple as an Excel spreadsheet or as artful as an infographic. But in the end, the data needs to say something, and it needs to be visually coherent.
Last summer, data research analyst Zoraya Cruz-Bonilla ’05, MPA ’17, and assessment analyst Kirsten Pagan, MA ’10, both with the office of Student Affairs Assessment and Strategic Initiatives, participated in the Binghamton University Libraries’ guest curator program. Their exhibit, called “Data Visualization: Contributions and Insights from the ‘Museum of Cognitive Art,’” shows the history and transformation of data visualization and addresses the challenges of presenting data in a manner that is unbiased, accurate, ethical and accessible. It is across from the main desk of the Bartle Library through May.
There are facets to data visualization that people don’t consider, Cruz-Bonilla says. “There is the hardcore, data analysis part and the artistic part, and it’s not one-size-fits-all.”
The exhibit was followed by three data visualization workshops, open to anyone on campus. Cruz-Bonilla taught “Designing an Infographic” in December. She used open-source data to give attendees some hands-on practice.
The aim of the workshops is to build confidence in those attending, she says.
“It might be easy to build a survey, but interpreting the data can be intimidating. How do you make sense out of it? We want to make sure people have the confidence to look at the data, digest it and put it into an easy format.”