Blight Fight

I recently finished the Capstone Project of the Data Science at Scale Specialization offered by the University of Washington, and I’d like to share my solution here.

Course Certificate

Task

Work with real data collected in Detroit to help urban planners predict blight (the deterioration and decay of buildings and older areas of large cities, due to neglect, crime, or lack of economic support).

My Work

Please refer to the solution overview and final report on GitHub. All my code and data are in the repository, too.

Lessons Learned

  • Data cleaning may be the most important and most time-consuming step when working with real data.

  • Algorithms that scale well are powerful weapons.

  • Pay attention to how the training set is constructed, and choose a validation/test approach that matches it.

  • In real-world scenarios, it often makes sense to simplify the trained model, sacrificing some performance in exchange for better interpretability.

  • Color palette and line opacity have a huge impact on how we perceive a visualization.

  • This task involves learning a classifier from only positive labels and unknown labels. This is called PU (positive-unlabeled) learning, and I plan to spend some time researching it.
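
To make the last point a bit more concrete, here is a minimal sketch of one well-known PU learning idea, the calibration step from Elkan and Noto (2008). It is not what my solution does, just an illustration. It assumes the labeled positives are a random sample of all positives, and that some classifier has already been trained to separate labeled from unlabeled examples and outputs a score g(x) = P(labeled | x). The function names and all scores below are made up for the example.

```python
from statistics import mean


def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = P(labeled | positive).

    Under the "selected completely at random" assumption, c can be
    estimated as the average score g(x) over held-out labeled positives.
    """
    return mean(scores_on_labeled_positives)


def pu_adjusted_probability(score, c):
    """Turn g(x) = P(labeled | x) into an estimate of P(positive | x).

    The estimate is g(x) / c, clipped to [0, 1] because noisy scores
    can push the ratio above 1.
    """
    return min(1.0, max(0.0, score / c))


# Hypothetical scores from a labeled-vs-unlabeled classifier.
held_out_positive_scores = [0.5, 0.25, 0.75, 0.5]
unlabeled_scores = [0.1, 0.25, 0.6]

c = estimate_label_frequency(held_out_positive_scores)
adjusted = [pu_adjusted_probability(s, c) for s in unlabeled_scores]

for raw, adj in zip(unlabeled_scores, adjusted):
    print(f"g(x) = {raw:.2f}  ->  estimated P(positive | x) = {adj:.2f}")
```

In the blight setting, the labeled positives would be buildings known to be blighted and everything else would be unlabeled. The point of the correction is that an unlabeled building with a modest raw score may still be fairly likely to be blighted once the labeling rate is taken into account.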