Politecnico di Milano
Harvard University
Data-Shack program 2016
Harvard-Politecnico di Milano Joint Program on Data Science for Sharing Economy

Intro

We are proposing a two-part research collaboration and innovative "hands-on" education experience involving students and faculty from the Institute for Applied Computational Science (IACS) at Harvard’s John A. Paulson School of Engineering and Applied Sciences (SEAS) and from the Master Courses in Computer Engineering and Design of Communication at Politecnico di Milano.

Eight students, four from Harvard and four from Politecnico di Milano, will work together to solve data science problems. Under the supervision of Harvard and Politecnico di Milano faculty, this activity will give students the opportunity to work collaboratively on real-world applications.

Problems will cross the disciplines of data management, machine learning, data analysis, statistics and mathematics, data visualization and user experience design. Students will craft their solutions by developing the methodology, software, visualization, and high-performance elements, then testing and completing the solution, and finally producing papers that may be submitted for publication.

This joint venture will provide an exciting research opportunity for graduate students in data science to apply their work to the problems of society.

Participating institutions

Harvard University

At Harvard University, the Institute for Applied Computational Science (IACS) is the home for students and faculty who are tackling major challenges in science and the world through the use of computational methods. IACS trains graduate students to solve real-world problems and conduct innovative research by using mathematical models, algorithms, systems innovations and statistical tools. Embedded within a large liberal arts research university, IACS serves as the focal point for interdisciplinary collaborations in computational science at Harvard and in the Boston-area community.

The one-year master of science program, developed by IACS, provides students with a rigorous core curriculum equally balanced between computer science, applied math and statistics. This training, combined with the flexibility to explore elective topics, equips students to solve problems in whatever arena they choose to work in. Project-based courses complement the curriculum and provide students with practical experience in collaborative problem solving.


Politecnico di Milano

The master schools in computer engineering and communication design of Politecnico di Milano share the mission of creating qualified professionals, capable of understanding, monitoring and mastering the pressing needs of a continuously evolving society. Computer engineering and design offer master's programs that produce several hundred top-quality graduates each year.

The two schools are increasingly engaged in promoting interdisciplinary educational experiences with mixed teams of students. The interaction between engineers and designers produces very powerful forms of innovation, thanks to the mix of information technology and sound computational and mathematical foundations on one side, and creativity, design thinking and effective interaction design on the other. Both schools are engaged in new programs and courses centered on data science, which includes data-driven management, analysis, and visualization, with a problem-driven approach to training.

Context

We are flooded with data: many petabytes (PB) of public data are now electronically accessible, and this volume is growing. The availability of big data is bringing a paradigm shift in how we understand public opinion and how we plan and grow cities, where advances are becoming more and more data-driven. Government officials, city planners, statisticians, computer scientists, and engineers have begun collaborating to tackle these data science problems.

Scientists, in general, are increasingly recognizing the value of analyzing vast amounts of data to answer many interesting questions, such as:

  • What activities should be planned?
  • Can neighborhoods be classified and characterized automatically?
  • Can events, such as crime or increases in traffic, be predicted ahead of time so that additional resources can be allocated?
  • Can we use other data sources to identify substandard rental conditions?
  • Can we use contextual data to analyze the heartbeat of a city?
  • Can we create databases fast enough to keep up with the stream of data?

All of these challenges require deep knowledge in statistics, machine learning and computer science. To achieve this, it is necessary to a) work in parallel on multiple facets of the problem and b) combine scientists who are specialized in different areas.

While we have been working on multiple projects addressing the questions above, we also believe that this new scientific paradigm requires a different type of scientist, the data scientist, who is multidisciplinary and has multiple skills ready to tackle these types of problems.

IACS has a strong program in data science, including: multiple courses in data science, a master’s program in computational science and engineering, a summer research experience program for undergraduates, hands-on workshops and a symposium on data science topics each January, a seminar series, and more. Politecnico di Milano has recently started an interdisciplinary Master’s program on big data management, and hosts several courses in data science, with an emphasis on social analytics and interaction design (see http://urbanscope.polimi.it).

Our international program will involve top-quality students in engineering and design, and will be an exciting, small-scale experiment that will be very beneficial for the participating students. The program will explore new directions for education in data science, by capitalizing on the strengths of the two institutions.

The development of these new methods benefits other fields beyond the scope of this project. Fields such as biomedicine, molecular genetics, and business intelligence face similar challenges in dealing with the rapidly increasing amount of available data. All of these fields have to build new tools that enable a deeper analysis of the information and that achieve suitable solutions for their scientific problems.

Problem definition: Sharing Economy and Social Analytics

Within the general context of data science, our program will focus on a specific problem area and will use a specific approach.

As for the problem, students will focus on the sharing economy, an emerging phenomenon in which citizens create and control markets of resources. The concept of the sharing economy starts from the global availability of information about resources (such as houses, cars, bikes, books) or activities (such as commuting) which can be shared and redistributed via an online marketplace so that their value may increase for the individuals involved, for the marketplace owners, and for society in general.

As for the approach, students will study how to use social information, i.e. data extracted from social sources such as Facebook, Twitter, Instagram and other social networks, which provides powerful insights into the global sentiment of crowds concerning places, events, trends and "what is cool".

We will propose two problems to students:

Price negotiation tool for Airbnb

Airbnb is a global marketplace for apartment rentals that reaches 190 countries and 34,000 cities. On Airbnb, citizens post their rental offers and rent their own apartments to other citizens, thereby creating a market parallel to traditional hotel-based offers.

We propose to integrate data from Airbnb with data from other sources, including open data, census information, real estate, information about the district, about the house interiors, social sources such as Instagram and Twitter, etc., so as to develop a new scoring system for Airbnb offers, similar to the hotel star system.

This information could be used to create a new business model; we will give this idea as an initial seed, and expect that the idea will evolve in many directions, following the students’ creativity. Students will be given total freedom to pursue alternatives and try different scenarios. The study will initially be confined to specific areas, most likely in Boston/Cambridge and Milan.
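
As a purely illustrative sketch of what a star-like score could look like, the snippet below combines a listing's own review score with district-level indicators. Everything in it is an assumption made for illustration: the `Listing` fields, the `DISTRICT_INFO` values, the `WEIGHTS`, and the `star_rating` function are hypothetical and are not part of the program or of any Airbnb API. Student teams are free to design something entirely different.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    listing_id: str
    district: str
    review_score: float  # assumed to be on a 0-5 scale


# Hypothetical district-level indicators in [0, 1], e.g. derived from
# open data and social media activity; all values here are invented.
DISTRICT_INFO = {
    "centro": {"transit_access": 0.9, "social_buzz": 0.8},
    "periferia": {"transit_access": 0.4, "social_buzz": 0.2},
}

# Illustrative weights that sum to 1; a real study would fit or validate them.
WEIGHTS = {"review": 0.5, "transit_access": 0.3, "social_buzz": 0.2}


def star_rating(listing: Listing) -> int:
    """Combine listing and district features into a 1-5 star score."""
    district = DISTRICT_INFO[listing.district]
    score = (
        WEIGHTS["review"] * (listing.review_score / 5.0)
        + WEIGHTS["transit_access"] * district["transit_access"]
        + WEIGHTS["social_buzz"] * district["social_buzz"]
    )  # weighted average, lies in [0, 1]
    return 1 + round(score * 4)  # map [0, 1] onto the integers 1..5


if __name__ == "__main__":
    for item in (Listing("a", "centro", 4.5), Listing("b", "periferia", 3.0)):
        print(item.listing_id, star_rating(item))
```

A weighted average was chosen here only because it is easy to read and to explain to hosts and guests; learned models or rankings based on social data are equally valid directions.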

Kaggle for design

Kaggle is a platform that hosts machine learning competitions where companies and researchers post data and pose questions (challenges).

Data scientists from all over the world compete to answer the questions and to produce the best results, in effect, crowdsourcing the most efficient technique or solution to the questions. In most cases, there is more than one approach that can be taken to solve the problem. This is another kind of shared marketplace.

In the world of architecture, industrial design, and graphic design, it is common for designers to compete for projects. These competitions are posted on various web sites and the winner of the competition usually receives a prize, or sometimes, the winning design is actually implemented. Currently, there are web pages that consolidate design competitions but none of them host the competitions.

In this project, students will build a platform similar to Kaggle to host design competitions. Students will develop the platform, taking into consideration the following: how designs will be communicated and evaluated (possibly making use of social interaction), how privacy will be protected, and how designers and manufacturers operate.

Learning objectives

Our collaborative work and this proposal focus on two key areas:

  1. the development of methods for analyzing social media data
  2. the development of software design, web design and business development

Students will learn how to:

  1. Deal with data: choice, extraction, integration, visualization
  2. Choose the most appropriate data analysis method
  3. Build software and computational artifacts that are robust, reliable, and maintainable
  4. Communicate between different disciplines
  5. Work and collaborate in international teams

The application at Harvard will be open to all students in the Graduate School of Arts and Sciences who plan to take the AC 297r Capstone Course; the application at Polimi will be open to 2nd year Master’s students of Computer Science and Communication Design.

Timeline

  1. November 2015: Applications open simultaneously for students at Harvard and Polimi; students from both countries will apply to their home institution, and then all applications will be reviewed collaboratively between faculty at Harvard and Polimi.
  2. November 15: Application deadline.
  3. Late November: Students are notified of the decision.
  4. January 20-29, 2016: Polimi faculty and students travel to Cambridge to kick off the project.
  5. March 11-19, 2016: Harvard faculty and students travel to Milan.
  6. February - June 2016: Student teams interact to develop their projects within the framework of the Harvard capstone course, jointly managed with Politecnico faculty.
  7. June 2016: Final presentations of results take place simultaneously in Milan and Cambridge.

Application, benefits and obligations

Politecnico di Milano

This program will select four students currently attending the second year of the Master (Laurea Magistrale) of Politecnico di Milano, two from Information Engineering and two from Communication Design.

Applicants should have obtained at least 40 CFU during their first year of Master studies, with an average grade above 25/30.

Applications must be sent by email by November 15 to Laura Caldirola (laura.caldirola@polimi.it).

The application must include as attachments:

  • an up-to-date record of exams
  • the English proficiency certificate
  • a CV (including a description of programming experience and of group projects)
  • a motivation letter explaining the candidate's interest in this educational opportunity
  • the preference between the two problems.

Students may be interviewed by the selection committee.

Selected students must attend the two full-immersion periods of January 20-29 at Harvard and of March 11-19 at Politecnico.

They will arrange their own travel to Harvard and their local accommodations; Politecnico will reimburse their expenses based on receipts, up to a total of 2250€ each. In addition to the full-immersion periods, they are expected to work throughout the semester following the format of Harvard's Capstone Course, interacting with professors of Politecnico and Harvard University.

They are expected to deliver a public presentation of their results at the end of the program. Upon acceptance, they will sign a letter stating that they understand and agree to the obligations of the program; failure to comply will result in losing their right to benefits.

Harvard University

Harvard students must be in a Harvard master’s or PhD program.

The Data-Shack program is being offered as a part of the AC 297r Capstone course, so students signing up for this program must commit to taking this course during the spring semester.

Students are required to be on campus and participate in program activities with peers from Politecnico from Wednesday, January 20 through Friday, January 29. (Note that this is prior to the official start of the spring semester.)
Students must be available to travel to Milan during the week of spring break, Friday, March 11 – Sunday, March 20. Students must commit to collaborating with peers at Politecnico throughout the entire spring semester and will be expected to deliver a public presentation of their results at the end of the program.

The four students chosen to participate will be required to pay a $150 deposit to secure their spot in this program. IACS will cover the cost to travel to Milan.

A complete application includes submission of the form linked here and emailing your resume as a PDF to Sheila Coveney, IACS Program Manager, at coveney@seas.harvard.edu, before November 15.

Contacts

POLIMI Faculty


Harvard Faculty


  • Pavlos Protopapas

    Scientific Program Director and Lecturer, Institute for Applied Computational Science, Harvard John A. Paulson School of Engineering and Applied Sciences.
    pavlos@seas.harvard.edu
  • Hanspeter Pfister

    Professor of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences.
    pfister@seas.harvard.edu
  • Jameson Rogers

    PhD; Strategy Consultant, Boston Consulting Group; Research Fellow, Harvard Medical School.
    Jameson_Rogers@hms.harvard.edu
  • Viola Bentivogli

    Italian registered architect; Master's degree in Architecture from Università degli studi di Ferrara.