Politecnico di Milano
Harvard University
Data-Shack program 2017
Harvard-Politecnico di Milano Joint Program on Data Science

Intro

We are proposing a two-part research collaboration and innovative "hands-on" education experience involving students and faculty from the Institute for Applied Computational Science (IACS) at Harvard’s John A. Paulson School of Engineering and Applied Sciences (SEAS) and from the Master Courses in Computer Engineering and Design of Communication at Politecnico di Milano.

Eight students, four from Harvard and four from Politecnico di Milano, will work together to solve problems in data science. Under the supervision of Harvard and Politecnico di Milano faculty, this activity will give students the opportunity to work collaboratively on real-world applications.

Problems will span data management, machine learning, data analysis, statistics and mathematics, data visualization, and user experience design. Students will craft their solutions by developing the methodology, software, visualization, and high-performance elements; testing and completing the solution; and producing final papers that may be submitted for publication.

This joint venture will provide an exciting research opportunity for graduate students in data science to apply their work to the problems of society.

Participating institutions

Harvard University

At Harvard University, the Institute for Applied Computational Science (IACS) is the home for students and faculty who are tackling major challenges in science and the world through the use of computational methods. IACS trains graduate students to solve real-world problems and conduct innovative research by using mathematical models, algorithms, systems innovations and statistical tools. Embedded within a large liberal arts research university, IACS serves as the focal point for interdisciplinary collaborations in computational science at Harvard and in the Boston-area community.

The one-year master of science program, developed by IACS, provides students with a rigorous core curriculum equally balanced between computer science, applied math and statistics. This training, combined with the flexibility to explore elective topics, equips students to solve problems in whatever arena they choose to work in. Project-based courses complement the curriculum and provide students with practical experience in collaborative problem solving.


Politecnico di Milano

The master schools in computer engineering and communication design of Politecnico di Milano share the mission of creating qualified professionals, capable of understanding, monitoring and mastering the pressing needs of a continuously evolving society. Computer engineering and design offer master’s programs that produce several hundred top-quality graduates each year.

The two schools are increasingly engaged in promoting interdisciplinary educational experiences with mixed teams of students. The interaction between engineers and designers produces very powerful forms of innovation, thanks to the mix of information technology and sound computational and mathematical foundations on one side, and creativity, design thinking and effective interaction design on the other. Both schools are engaged in new programs and courses centered on data science, which includes data-driven management, analysis, and visualization, with a problem-solving approach to training.

Context

We are flooded with data: many petabytes (PB) of public data are currently accessible electronically, and this volume is growing. The availability of big data is bringing a paradigm shift in how we understand public opinion and in how we plan and grow cities, where advances are becoming increasingly data-driven. Government officials, city planners, statisticians, computer scientists, and engineers have begun collaborating to tackle these data science problems.

Scientists, in general, are increasingly recognizing the value of analyzing vast amounts of data to answer many interesting questions, such as:

  • What activities should be planned?
  • Can neighborhoods be classified and characterized automatically?
  • Can events, such as crime or an increase in traffic, be predicted ahead of time so that additional resources can be allocated?
  • Can we use other data sources to identify substandard rental conditions?
  • Can we use contextual data to analyze the heartbeat of a city?
  • Can we create databases fast enough to keep up with the stream of data?

All of these challenges require deep knowledge of statistics, machine learning and computer science. To address them, it is necessary to a) work in parallel on multiple facets of the problem and b) bring together scientists who specialize in different areas.

While we have been working on multiple projects addressing the questions above, we also believe that this new scientific paradigm requires a different type of scientist, the data scientist, who is multidisciplinary and has multiple skills ready to tackle these types of problems.

IACS has a strong program in data science, including multiple courses in data science, a master’s program in computational science and engineering, a summer research experience program for undergraduates, hands-on workshops and a symposium on data science topics each January, a seminar series, and more. Politecnico di Milano has recently started an interdisciplinary Master’s program on big data management, and hosts several courses in data science, with an emphasis on social analytics and interaction design (see http://urbanscope.polimi.it).

Our international program will involve top-quality students in engineering and design, and will be an exciting, small-scale experiment that will be very beneficial for the participating students. The program will explore new directions for education in data science, by capitalizing on the strengths of the two institutions.

The development of these new methods benefits other fields beyond the scope of this project. Fields such as biomedicine, molecular genetics, and business intelligence face similar challenges in dealing with the rapidly increasing amount of available data. All of these fields have to build new tools that enable a deeper analysis of the information and that achieve suitable solutions for their scientific problems.

Data-Shack 2017 projects

Como

Cities face the challenge of correctly understanding how citizens actually use the urban environment and its services.

This is relevant for tactical decisions regarding how, where and when to program events, happenings, shows, and commercial events in the city, as well as for improving commercial offerings (in terms of positioning of advertising, stores, and services) and for long-term urban planning strategies.

The Como project will focus on the city of Como, a small medieval town beautifully located on Lake Como in Northern Italy, with a large pedestrian area in the downtown district and along the lakeshore. The project consists of collecting and analyzing data about the city and the way people live and move in it, by integrating multiple and diverse data sources. The inputs to be considered include: social media data (possibly geo-located), mobile phone data (in terms of phone calls and mobile internet connections to geo-located mobile network towers), city-wide Wi-Fi logs, information about parking occupancy, data from IoT sensors that measure the actual number of people walking through specific sections of the city streets, and so on.

The problems to be addressed are:

  • providing a reliable estimate of the overall picture of human density and movements within the city
  • predicting the impact of hypothetical future events positioned in time and space, on the behavior and movement of people
  • given a constrained budget and a cost model for IoT sensors deployment, determining the best allocation of sensors for getting an optimized understanding of the human flows within the city
  • defining a concept of user experience and interaction for exploring the city data

Moleskine

The Moleskine Data-Shack project will explore the popularity and success of different Moleskine products co-branded with other famous brands (also known as special editions) and launched at specific points in time. The main field of analysis is the impact that different products have on social media channels. The social media analysis will then be correlated with product distribution and sales performance data, along multiple dimensions (temporal, geographical, etc.) and product features.

The input will be social media data collected from social networks, online communities and blogs related to the Moleskine brand (especially with regard to its special editions); quantitative information about sell-in (with geographical and temporal distribution); and third-party demographic data describing the potential customer base.

The analysis will focus on the observation of special edition launches through their impact on social media and their actual market performance.

The final goal is to create a predictive model for new product launches depending on product features, customer demographics and their behavioral patterns.

Learning objectives

Our collaborative work and this proposal focus on two key areas:

  1. the development of methods for analyzing social media data
  2. the development of software design, web design and business development

Students will learn how to:

  1. Deal with data: choice, extraction, integration, visualization.
  2. Choose the most appropriate data analysis method.
  3. Build software and computational artifacts that are robust, reliable, and maintainable.
  4. Communicate between different disciplines.
  5. Work and collaborate in international teams.

The application at Harvard will be open to all students in the Graduate School of Arts and Sciences who plan to take the AC 297r Capstone Course; the application at Polimi will be open to second-year Master’s students of Computer Science and Communication Design.

Tentative timeline

  1. November 2016: Applications open simultaneously for students at Harvard and Polimi; students will apply to their home institution, and all applications will then be reviewed jointly by faculty at Harvard and Polimi.
  2. Late November - early December: Application deadline.
  3. Beginning - mid December: Students are notified of the decision.
  4. January 17-27, 2017: Polimi faculty and students travel to Cambridge to kick off the project at Harvard University.
  5. March 10-19, 2017: Harvard faculty and students travel to Milan.
  6. February - May 2017: Student teams interact to develop their projects within the framework of the Harvard capstone course, jointly managed with Politecnico faculty.
  7. May 2017: Final presentations of results take place simultaneously in Milan and Cambridge.

Application, benefits and obligations

Politecnico di Milano

This program will select four students currently attending the second year of the Master (Laurea Magistrale) of Politecnico di Milano, two from Information Engineering and two from Communication Design.

Applicants should have obtained at least 40 CFU during the first year of their Master's studies, with an average grade above 25/30.

Applications must be sent by email by November 20 to Laura Caldirola (laura.caldirola@polimi.it).

The application must include as attachments:

  • an up-to-date record of exams
  • the English proficiency certificate
  • a CV (including a description of programming and group-project experience)
  • a motivation letter explaining the candidate's interest in this educational opportunity
  • the candidate's preference between the two problems

Students may be interviewed by the selection committee.

Selected students must attend the two full-immersion periods of January 17-27 at Harvard and of March 10-19 at Politecnico.

They will organize their own travel to Harvard and their local accommodations; Politecnico will reimburse their expenses based on receipts, up to a total of €2,250 each. In addition to the full-immersion periods, they are expected to work throughout the semester, following the format of Harvard's Capstone Course and interacting with professors from Politecnico and Harvard University.

They are expected to deliver a public presentation of their results at the end of the program. Upon acceptance, they will sign a letter stating that they understand and agree to the obligations of the program; failure to comply will result in losing their right to benefits.

Harvard University

Harvard students must be in a Harvard master’s or PhD program.

The Data-Shack program is being offered as a part of the AC 297r Capstone course, so students signing up for this program must commit to taking this course during the spring semester.

Students are required to be on campus and participate in program activities with peers from Politecnico from Tuesday, January 17 to Friday, January 27. (Note that this is prior to the official start of the spring semester.)
Students must be available to travel to Milan during the week of spring break, Friday, March 10 to Sunday, March 19. Students must commit to collaborating with peers at Politecnico throughout the entire spring semester and will be expected to deliver a public presentation of their results at the end of the program.

The four students chosen to participate will be required to pay a $150 deposit to secure their spot in this program. IACS will cover airfare, double occupancy hotel and some meals. Students are required to pay for some meals and additional expenses.

A complete application includes submission of the form linked here and emailing your resume as a PDF to Sheila Coveney, IACS Program Manager, at coveney@seas.harvard.edu, before December 7.

Contacts

POLIMI Faculty


Harvard Faculty


  • Pavlos Protopapas

    Scientific Program Director and Lecturer, Institute for Applied Computational Science, Harvard John A. Paulson School of Engineering and Applied Sciences.
    pavlos@seas.harvard.edu
  • Hanspeter Pfister

    Professor of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences.
    pfister@seas.harvard.edu