Politecnico di Milano
Harvard University
Data-Shack program 2018
Harvard-Politecnico di Milano Joint Program on Data Science

Intro

We are proposing a two-part research collaboration and an innovative "hands-on" educational experience involving students and faculty from the Institute for Applied Computational Science (IACS) at Harvard’s John A. Paulson School of Engineering and Applied Sciences (SEAS) and from the Master Courses in Computer Engineering and Design of Communication at Politecnico di Milano.

Eight students, four from Harvard and four from Politecnico di Milano, will work together to solve problems in data science. Under the supervision of Harvard and Politecnico di Milano faculty, this activity will give students the opportunity to work collaboratively on real-world applications.

Problems will cross the disciplines of data management, machine learning, data analysis, statistics and mathematics, data visualization and user experience design. Students will craft their solutions by developing the methodology, software, visualization, and high-performance elements, testing and completing the solution, and finally producing papers that may be submitted for publication.

This joint venture will provide an exciting research opportunity for graduate students in data science to apply their work to the problems of society.

Participating institutions

Harvard University

At Harvard University, the Institute for Applied Computational Science (IACS) is the home for students and faculty who are tackling major challenges in science and the world through the use of computational methods. IACS trains graduate students to solve real-world problems and conduct innovative research by using mathematical models, algorithms, systems innovations and statistical tools. Embedded within a large liberal arts research university, IACS serves as the focal point for interdisciplinary collaborations in computational science at Harvard and in the Boston-area community.

The one-year master of science program, developed by IACS, provides students with a rigorous core curriculum equally balanced between computer science, applied math and statistics. This training, combined with the flexibility to explore elective topics, equips students to solve problems in whatever arena they choose to work in. Project-based courses complement the curriculum and provide students with practical experience in collaborative problem solving.


Politecnico di Milano

The master schools in computer engineering and communication design of Politecnico di Milano share the mission of creating qualified professionals, capable of understanding, monitoring and mastering the pressing needs of a continuously evolving society. Computer engineering and design offer master's programs that produce several hundred top-quality graduates yearly.

The two schools are increasingly engaged in promoting interdisciplinary educational experiences with mixed teams of students. The interaction between engineers and designers produces very powerful forms of innovation, thanks to the mix of information technology and sound computational and mathematical foundations on one side, and creativity, design thinking and effective interaction design on the other. Both schools are engaged in new programs and courses centered on data science, which includes data-driven management, analysis, and visualization, with a problem-solving approach to training.

Context

We are flooded with data: many petabytes (PB) of public data are now electronically accessible, and this volume is growing. The availability of big data is bringing a paradigm shift in how we understand public opinion and in how we plan and grow cities, where advances are becoming more and more data-driven. Government officials, city planners, statisticians, computer scientists, and engineers have begun collaborating to tackle these data science problems.

Scientists, in general, are increasingly recognizing the value of analyzing vast amounts of data to answer many interesting questions, such as:

  • What activities should be planned?
  • Can neighborhoods be classified and characterized automatically?
  • Can events, such as crime or an increase in traffic, be predicted ahead of time so that additional resources can be allocated?
  • Can we use other data sources to identify substandard rental conditions?
  • Can we use contextual data to analyze the heartbeat of a city?
  • Can we create databases fast enough to keep up with the stream of data?

All of these challenges require deep knowledge in statistics, machine learning and computer science. To address them, it is necessary to a) work in parallel on multiple facets of the problem and b) bring together scientists who are specialized in different areas.

While we have been working on multiple projects addressing the questions above, we also believe that this new scientific paradigm requires a different type of scientist, the data scientist, who is multidisciplinary and has multiple skills ready to tackle these types of problems.

IACS has a strong program in data science, including multiple courses in data science, a master’s program in computational science and engineering, a summer research experience program for undergraduates, hands-on workshops and a symposium on data science topics each January, a seminar series, and more. Politecnico di Milano has recently started an interdisciplinary Master’s program on big data management, and hosts several courses in data science, with an emphasis on social analytics and interaction design (see http://urbanscope.polimi.it).

Our international program will involve top-quality students in engineering and design, and will be an exciting, small-scale experiment that will be very beneficial for the participating students. The program will explore new directions for education in data science, by capitalizing on the strengths of the two institutions.

The development of these new methods will also benefit fields beyond the scope of this project. Biomedicine, molecular genetics, business intelligence, and other fields face similar challenges in dealing with the rapidly increasing amount of available data. All of these fields have to build new tools that enable a deeper analysis of the information and that achieve suitable solutions for their scientific problems.

Data-Shack 2018 Projects

Social Analytics in the Design World


Le Sirenuse

Le Sirenuse opened in 1951, when four Neapolitan brothers turned their summer house on the Amalfi Coast into a charming hotel overlooking the bay of Positano. It is today considered one of Italy’s leading seaside luxury hotels, and its intimate and refined Mediterranean atmosphere has been replicated in the Miami property, with an East Coast twist. The Le Sirenuse brand extension also includes a line of beachwear and womenswear inspired by Positano and sold at the Emporio Sirenuse boutique and in selected stores worldwide.

As an iconic example of an Italian niche brand with worldwide reach, the Le Sirenuse fashion brand is an ideal case for exploring the potential of digital and social media strategies, where big data analytics can turn small stories into global success.

The students' challenge is to explore the popularity and success of the Le Sirenuse fashion collection through social media analytics, identifying the online communities of customers and brands (influencers, competitors), as well as emerging topics relevant to the brand. A special focus will be on mapping and visualizing the Le Sirenuse fashion brand’s competitors and their related social media networks.

The final goal is to build a tool to analyze and visualize, in real time, the ecosystem of Le Sirenuse's competitors along multiple dimensions (locations, products, pricing, connected influencers, etc.), through both semantic and image analysis. The project will be run in collaboration with Wemanage, a consulting firm in the fashion field.

Tribe Dynamics

Tribe Dynamics is a San Francisco-based startup that measures social media engagement for beauty, fashion, and lifestyle brands. Online content creation led by digital influencers (beauty and fashion bloggers) is one of the key predictors of offline revenue in this industry.

The company provides software, data, and consulting services to some of the most well-known brands, private equity firms and Fortune 500 companies in the space. Tribe’s proprietary data is also regularly featured in leading industry research through several specialized publications.

Tribe’s training data set is mostly in English because the company started out by tracking brands in the U.S. As the company expands internationally to Europe and Asia, it must be able to build the same classification tools for other languages. The research questions concern how the latest techniques in transfer learning and NLP can improve text classification accuracy across different languages.

Among possible solutions, the company would like to explore language-specific classifiers, the use of translated English datasets or English-trained classifiers, transfer learning, and the handling of mixed-language text and informal language found on social media.
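
To make one of these options concrete, here is a minimal sketch of the "English-trained classifier on translated text" idea. It is only an illustration under stated assumptions, not Tribe Dynamics' method: the training sentences, the labels, the tiny Italian-to-English lexicon (standing in for a real translation step), and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy English training set (not real Tribe Dynamics data).
TRAIN_EN = [
    ("love this new lipstick shade", "beauty"),
    ("best mascara and foundation review", "beauty"),
    ("traffic was terrible this morning", "other"),
    ("the football match ended in a draw", "other"),
]

# Hypothetical word-level lexicon; a real system would use proper translation.
IT_TO_EN = {
    "adoro": "love", "nuovo": "new", "rossetto": "lipstick",
    "partita": "match", "calcio": "football", "traffico": "traffic",
}

def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def train_naive_bayes(examples):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words unseen in training carry no evidence
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def translate_it_to_en(text):
    """Word-by-word lookup; untranslated words pass through unchanged."""
    return " ".join(IT_TO_EN.get(tok, tok) for tok in tokenize(text))

MODEL = train_naive_bayes(TRAIN_EN)

def classify_italian(text):
    """Translate Italian text to English, then apply the English-trained model."""
    return predict(MODEL, translate_it_to_en(text))
```

For example, `classify_italian("adoro il nuovo rossetto")` translates the post word by word and scores it with the English model. The same structure makes the trade-off visible: classification quality depends directly on translation coverage, which is one reason language-specific classifiers or transfer learning may be worth comparing against this baseline.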

Learning objectives

Our collaborative work and this proposal focus on two key areas:

  1. the development of methods for analyzing social media data
  2. the development of software design, web design and business development

Students will learn how to:

  1. Deal with data: choice, extraction, integration, visualization.
  2. Choose the most appropriate data analysis method.
  3. Build software and computational artifacts that are robust, reliable, and maintainable.
  4. Communicate between different disciplines.
  5. Work and collaborate in international teams.

The application at Harvard will be open to all students in the Graduate School of Arts and Sciences who plan to take the AC 297r Capstone Course; the application at Polimi will be open to second-year Master’s students of Computer Science and Communication Design.

Tentative timeline

  1. November 7th, 2017: Applications open simultaneously for students at Harvard and Polimi; students will apply to their home institution, and then all applications will be reviewed collaboratively by faculty at Harvard and Polimi.
  2. November 24th: Application deadline.
  3. December 5th: Students are notified of the decision.
  4. January 18-28, 2018: Polimi faculty and students travel to Cambridge to kick off the project at Harvard University.
  5. March 10-19, 2018: Harvard faculty and students travel to Milan.
  6. February - May 2018: Student teams interact to develop their project within the framework of the Harvard capstone course, jointly managed with Politecnico faculty.
  7. May 2018: Final presentations of results take place simultaneously in Milan and Cambridge.

Application, benefits and obligations

Politecnico di Milano

This program will select four students currently attending the second year of the Master's program (Laurea Magistrale) at Politecnico di Milano, two from Information Engineering and two from Communication Design.

Applicants should have obtained at least 40 CFU during their first year of Master studies, with an average grade above 25/30.

Applications must be sent by email by November 24th to Laura Caldirola (laura.caldirola@polimi.it).

The application must include as attachments:

  • an up-to-date record of exams, the English proficiency certificate, and a CV (including a description of programming and group-project experience)
  • a motivation letter explaining the candidate's interest in this educational opportunity
  • the candidate's preference between the two projects.

Students may be interviewed by the selection committee.

Selected students must attend the two full-immersion periods: January 17-27 at Harvard and March 10-19 at Politecnico.

They will arrange their own travel to Harvard and their local accommodations; Politecnico will reimburse their expenses based on receipts. In addition to the full-immersion periods, they are expected to work throughout the semester following the format of Harvard's Capstone Course, interacting with professors of Politecnico and Harvard University.

They are expected to deliver a public presentation of their results at the end of the program. Upon acceptance, they will sign a letter stating that they understand and agree to the obligations of the program; failure to comply will result in losing their right to benefits.

Harvard University

Harvard students must be in a Harvard master’s or PhD program.

The Data-Shack program is being offered as a part of the AC 297r Capstone course, so students signing up for this program must commit to taking this course during the spring semester.

Students are required to be on campus and participate in program activities with peers from Politecnico from January 18 to January 28 (note that this is prior to the official start of the spring semester).
Students must be available to travel to Milan during the week of spring break, from March 10 to March 19. Students must commit to collaborating with peers at Politecnico throughout the entire spring semester and will be expected to deliver a public presentation of their results at the end of the program.

The four students chosen to participate will be required to pay a $150 deposit to secure their spot in this program. IACS will cover airfare, double occupancy hotel and some meals. Students are required to pay for some meals and additional expenses.

A complete application includes submission of the form linked here and emailing your resume as a PDF to Sheila Coveney, IACS Program Manager, at coveney@seas.harvard.edu, before December 7.

Contacts

POLIMI Faculty


Harvard Faculty


  • Pavlos Protopapas

    Scientific Program Director and Lecturer, Institute for Applied Computational Science, Harvard John A. Paulson School of Engineering and Applied Sciences.
    pavlos@seas.harvard.edu
  • Hanspeter Pfister

    Professor of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences.
    pfister@seas.harvard.edu

POLIMI Tutor


Harvard Tutor