Recently I’ve finished the Capstone Project of Data Science at Scale Specialization provided by University of Washington. I’d like to share my solution here.
Work with real data collected in Detroit to help urban planners predict blight (the deterioration and decay of buildings and older areas of large cities, due to neglect, crime, or lack of economic support).
Data cleaning may be the most important and most time-consuming step when working with real data.
Algorithms that scale well are powerful weapons.
Beware of how the training set is constructed and choose the right validation/test approach.
In real-world scenarios, it makes much sense to simplify the trained model, sacrificing some perforamnce in exchange for better interpretability.
Color palatte and line opacity have huge impacts on our perceptions of visualization.
This task involves learning a classifier from only positive labels and unknown labels. It is called PU Learning and I plan to spend some time on researching it.