Where to Place the Next Bookstore?
“You have had a long and impressive career, but now you are ready to retire. You are interested in opening up a bookstore in New York.” Where should it be?
The idea is to find a neighborhood whose demographics resemble those of neighborhoods that already have profitable bookstores.
A five-kilometer buffer was created around each bookstore and joined to New York State census tract demographic data, so that tracts could be classified as with or without bookstores for analysis.
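The buffer-and-classify step can be approximated in a few lines of Python. This is only a minimal sketch, not the workflow used in the original analysis: it swaps the polygon buffer and spatial join for a simpler test of whether a tract's centroid lies within five kilometers of any bookstore, using great-circle distance. The function names and all coordinates below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def classify_tracts(tracts, bookstores, radius_km=5.0):
    """Map each tract id to True if its centroid is within radius_km of any bookstore."""
    return {
        tract_id: any(
            haversine_km(lat, lon, b_lat, b_lon) <= radius_km
            for b_lat, b_lon in bookstores
        )
        for tract_id, (lat, lon) in tracts.items()
    }


# Hypothetical coordinates, for illustration only (not taken from the dataset).
bookstores = [(40.7336, -73.9909)]
tracts = {
    "tract_a": (40.7400, -73.9900),  # centroid less than 1 km from the bookstore
    "tract_b": (42.6526, -73.7562),  # centroid far outside the 5 km radius
}
labels = classify_tracts(tracts, bookstores)
```

A centroid test treats a large rural tract and a small urban tract alike, so it will not reproduce a polygon-based join exactly; it is meant only to show the shape of the classification.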
When overlaying the bookstores on the state's census tracts, we can immediately identify New York City, specifically Manhattan, as the area of the state most densely packed with bookstores. Thus, it might not be a good idea to start a bookstore business in the city. However, the data were analyzed in RStudio regardless of this presumption.
The total number of New York State tracts is 4894 (9 tracts were lost in the joining process; this is negligible and should not affect the analysis). Although 39.79% of tracts having bookstores seems like a high percentage, the number can be misleading because tract areas vary drastically from one part of the state to another.
Bookstores in New York City
Zoomed-out view of New York State
To explore the relationships between the variables of interest and how they change from one class to the other, we use the scatterplot matrix shown in Fig.3. The relationships between the variables look the same in tracts with and without bookstores, except for households and population, whose distributions seem to differ. This is also supported by the histograms, which I used to explore each variable separately.
Figure 3. Scatterplot matrix of the variables of interest. Red for tracts with bookstores. Blue for tracts without bookstores.
Five variables of interest (Fig.2) were chosen for exploratory analysis (population 2000, population 2009, median age, number of households, and employment).
Now let's see how the distribution changes in the following histograms.
Tracts with bookstores have a denser concentration of people in their late 30s. From Fig.5 we see how the population distribution has changed from 2000 to 2009 in each class. The same analysis was done for households and employment, shown in Fig.6 and Fig.7 respectively.
Overall, identifying a clear pattern for areas with bookstores from these variables is challenging, and a simple statistical summary is not enough. However, based on this preliminary analysis, I filtered the data to tracts that match the following criteria:
Median age (35 – 40)
Households (1000 – 1500)
Employed (1000 – 1500)
2009 population (4000 – 5000)
As a result, we get the highlighted tracts in Fig.8, which represent the best guess for a profitable business: excluding the tracts in New York City, they have median age, household, population, and employment figures similar to those of tracts with bookstores.
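The filter described above can be sketched in Python as follows. This is an illustrative sketch rather than the code used for the analysis (which was done in R): the field names, the `in_nyc` flag, and the sample records are all hypothetical, and treating the range bounds as inclusive is an assumption.

```python
# Ranges taken from the criteria listed above; bounds are assumed inclusive.
CRITERIA = {
    "median_age": (35, 40),
    "households": (1000, 1500),
    "employed": (1000, 1500),
    "pop_2009": (4000, 5000),
}


def matches(tract, criteria=CRITERIA):
    """True if every listed field of the tract falls inside its range."""
    return all(low <= tract[field] <= high for field, (low, high) in criteria.items())


def candidate_tracts(tracts, criteria=CRITERIA):
    """Keep tracts outside New York City that satisfy all criteria."""
    return [t for t in tracts if not t["in_nyc"] and matches(t, criteria)]


# Made-up records, for illustration only.
sample = [
    {"id": "A", "median_age": 37, "households": 1200, "employed": 1100, "pop_2009": 4500, "in_nyc": False},
    {"id": "B", "median_age": 37, "households": 1200, "employed": 1100, "pop_2009": 4500, "in_nyc": True},
    {"id": "C", "median_age": 52, "households": 400, "employed": 300, "pop_2009": 1500, "in_nyc": False},
]
selected_ids = [t["id"] for t in candidate_tracts(sample)]
```

Keeping the ranges in one dictionary makes it easy to loosen or tighten a single criterion and see how the set of candidate tracts changes.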
As a final note, a better approach to identifying neighborhoods whose demographics resemble those of neighborhoods with existing profitable bookstores would be to use a machine learning algorithm. This is a binary classification problem, where methods like Random Forest or KNN are likely to perform better than a guess based on summary statistics. Also, we should take into account the unique demographics of New York City in future analysis.
This exercise was part of the GIS & Spatial Analysis course by Michael Parrott at Columbia University.
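To make the classification idea concrete, here is a minimal sketch of k-nearest neighbors in plain Python. It was not part of the original analysis. It assumes each tract is described by four numeric features (median age, households, employed, 2009 population) and labeled 1 if it has a bookstore and 0 otherwise; the training rows are made up, and in practice a library implementation with proper validation would be preferable.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    scaled = [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]
    return scaled, mus, sigmas


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training rows."""
    scaled, mus, sigmas = standardize(train_x)
    q = [(v - m) / s for v, m, s in zip(query, mus, sigmas)]
    nearest = sorted(zip(scaled, train_y), key=lambda pair: math.dist(pair[0], q))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up training data: [median_age, households, employed, pop_2009]
train_x = [
    [37, 1200, 1100, 4500],
    [38, 1300, 1250, 4700],
    [36, 1100, 1050, 4200],
    [52, 400, 300, 1500],
    [25, 2500, 2600, 8000],
    [48, 600, 500, 2000],
]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = tract with bookstore, 0 = without

likely_match = knn_predict(train_x, train_y, [37, 1250, 1150, 4600])
unlikely_match = knn_predict(train_x, train_y, [50, 500, 400, 1800])
```

Standardizing first matters here because population counts are orders of magnitude larger than median age and would otherwise dominate the Euclidean distance.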
Figure 8. New York state tracts recommended for profitable bookstore business highlighted in yellow