Saturday, September 13, 2014

Crime Different as Night and Day

Among other datasets, Dallas Open Data publishes Dallas Police reports with narratives from October 2013 to May 2014, totaling over half a million reports. By focusing on the text fields containing crime descriptions and narratives, I will plot several (not completely unexpected) text-based visualizations.

The first approach associates each police report with one of several logical text corpora in order to display word clouds of the most frequent terms. The rules defining the text corpora may be based on time, location, gender, race, etc. To illustrate, I define 4 logical documents (Night, Morning, Day, and Evening) by dividing the 24-hour day into 4 segments: Night (reports with time of offense from midnight to 6 a.m.), Morning (from 6 a.m. to noon), Day (from noon to 6 p.m.), and Evening (from 6 p.m. to midnight). This results in 4 clouds, displayed in pairs of opposing times: Night vs. Day, and then Morning vs. Evening:

At night we see ASSAULT, VEHICLE, and FLED, while the day is about SUSPECT, THEFT, and NFI (No F...ing Idea). I'm not sure what CAUSING and CONSENT mean at the top of the Night cloud, but PAIN there makes sense. Evening and Morning seem to differ the most on ASSAULT and BURGLARY.
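The Night/Morning/Day/Evening grouping described above can be sketched in Python as follows. This is a minimal sketch that assumes each report has already been reduced to an hour of offense (0-23) and a narrative string. The actual column names in the Dallas Open Data export are not shown here, and the tokenizer (runs of letters, upper-cased) is an assumption, not necessarily what produced the clouds.

```python
import re
from collections import Counter

# Four 6-hour segments, indexed by hour // 6: 0-5, 6-11, 12-17, 18-23.
SEGMENTS = ["Night", "Morning", "Day", "Evening"]


def segment_for_hour(hour: int) -> str:
    """Map an hour of offense (0-23) to its logical document."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return SEGMENTS[hour // 6]


def top_terms_by_segment(reports, stopwords=frozenset(), k=10):
    """Return the k most frequent terms of each segment's logical document.

    `reports` is an iterable of (hour, narrative) pairs (hypothetical format).
    Stopwords must be upper case, because terms are upper-cased first.
    """
    counts = {name: Counter() for name in SEGMENTS}
    for hour, text in reports:
        terms = re.findall(r"[A-Z]+", text.upper())
        counts[segment_for_hour(hour)].update(t for t in terms if t not in stopwords)
    return {name: [t for t, _ in c.most_common(k)] for name, c in counts.items()}
```

To draw clouds like the ones above, you would pass each segment's full `Counter` to a word-cloud renderer instead of keeping only the top `k` terms.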

Another way to look at the same data is with a slopegraph (example): terms (rows) move up or down according to their frequency across the 4 logical documents (columns). But I am leaving this for my next post.

Unstructured data such as text can still make use of plot types usually associated with structured data. Bar graphs (histograms) by hour of day for reports containing a certain term are one way to achieve this. To illustrate, let's count police reports containing the term BURGLARY for each hour (from 0 to 23) across all the data (roughly 6 months). This results in the following plot:


Not exactly surprising, but it supports several conclusions. For example, most burglars' work day runs from 9 in the morning until 10 in the evening. They also take a lunch break at about 3 p.m., or possibly it's the police officers who do. Finally, the high peak at 5 is something everyone should be aware of, unless, again, it's police officers who skew the time of offense in reports toward the end of their shift.

A different trend can be observed for reports containing BMV (Burglary Motor Vehicle):

At first, it's rather surprising to see the peak in the morning hours, since we expect this type of crime to happen at night. But victims likely discover this kind of trouble only in the morning, and the police have no better time to record for the offense.

The last graph is for the term ASSAULT:

This crime is obviously reported by victims who know well when it took place. ASSAULT reports peak between 9 in the evening and 1 in the morning, cooling off significantly by the morning hours. Some day I would like to see a good night's sleep promoted as a crime prevention program.
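The hourly counts behind these bar graphs can be sketched in Python as follows. The (hour, narrative) report format is a hypothetical simplification. Matching the term as a whole word, case-insensitively, is an assumption about how reports "containing" BURGLARY, BMV, or ASSAULT were selected.

```python
import re
from collections import Counter


def hourly_counts(reports, term):
    """Number of reports per hour (0-23) that mention `term` as a whole word.

    Each report is counted once, however many times the term appears in it.
    `reports` is an iterable of (hour, narrative) pairs (hypothetical format).
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter(hour for hour, text in reports if pattern.search(text))
    return [counts[h] for h in range(24)]


def text_histogram(counts, width=40):
    """Print a quick text bar chart, one row per hour, scaled to the peak hour."""
    peak = max(counts) or 1
    for hour, n in enumerate(counts):
        print(f"{hour:02d} {'#' * round(width * n / peak):<{width}} {n}")
```

One caveat of plain term matching: a narrative that spells out "Burglary Motor Vehicle" would also match BURGLARY, so the BURGLARY and BMV plots may partly overlap.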

Friday, August 8, 2014

The Good, the Bad and the Ugly: Where Russia Stands on Banned Foods

Russia responded to sanctions by banning Western food imports. Supposedly, the government hand-picked products whose ban should not immediately harm the Russian economy and consumers.

The big question in Russia is whether to expand domestic production, replace importers, or both, at least for the one year while the ban is active. Below I plot trends of Domestic Consumption, Production, and Imports for Russia over the last 20 years or so for some of the banned food groups. These are totals across all importers. My goal was to see where Russia can make leaps in production and where it will likely look for new import partners. I also hope that economists may find this information helpful in assessing the impact on the Russian consumer and economy.
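The charts below are read by eye. A rough, hypothetical way to put numbers on the same comparison is to compute the import share of domestic consumption for each year, then fit a straight-line trend to that share and to production. The yearly lists would come from the USDA/Index Mundi tables. Nothing here reproduces the actual figures, and the trend test is an illustrative assumption, not the method behind these charts.

```python
def import_share(imports, consumption):
    """Share of domestic consumption covered by imports, year by year."""
    return [i / c for i, c in zip(imports, consumption)]


def ols_slope(values):
    """Least-squares slope of a yearly series (change per year)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two years")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def trend_summary(production, imports, consumption):
    """Slopes of production and of the import share over the same years."""
    return {
        "production_trend": ols_slope(production),
        "import_share_trend": ols_slope(import_share(imports, consumption)),
    }
```

With this summary, a positive production trend combined with a falling import share matches the "Good" pattern described for chicken, pork, and turkey. A falling production trend with a rising import share matches the dairy groups under "The Bad".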

The Good


Chicken, pork, and turkey meats are doing great. For all 3, growing demand is met by growing production at the expense of diminishing imports. The turkey industry in Russia is simply experiencing a renaissance; too bad turkey is relatively unpopular among consumers. All 3 are good candidates for an import ban to further convince domestic producers to invest.

Fish and seafood demand is fully met by production, but the picture is distorted by the variety of products within this food group. Imports of higher-end products such as Norwegian salmon will probably not be easily replaced.
 
I suspect that milk (non-dry) is not an import-heavy product, so there is nothing to worry about (on either end of the food ban vs. sanctions chain). We'll see quite a different picture with dry milk later.

The Bad

Everyone in Russia talks about beef from Brazil and Argentina nowadays, and this chart clearly shows why. Both imports and production trend lower with decreasing demand, with imports not very far below production. Russia will be looking for replacements for banned beef elsewhere, probably in Latin America.




All 4 dairy food groups above show clear signs of imports replacing production over the last 10-20 years, with the 2 Dry Milk groups faring worst. Import trends go up despite stagnating demand. Russian producers of milk products and retailers will be looking for new importers unless the government reverses the trend of falling domestic production.

The Ugly




Fresh fruits will come from Turkey or China or somewhere else, but not from Russia. The only group that shows production growth is grapes, but its level is clearly too low.

I picked pistachios because almost everyone loves them and they were on the list of banned foods. Good luck finding them in Russian stores in a few weeks, and my sincere sympathies to the high-end chefs who will miss them in everything from appetizers to desserts.

The Cherry Orchard 


Fresh cherries (both sweet and sour) are a food group that brings to mind "The Cherry Orchard", the last play by Anton Chekhov. You can read the play or look at the chart... and then read the play.

For more food groups and countries involved in Russia's food ban, see my interactive infographics. The conclusions here are my subjective opinion. I welcome any comments and/or corrections.

Sources: Index Mundi and United States Department of Agriculture.

Friday, March 28, 2014

Word clouds of Putin's Address

Yet another turn of events took place today, with Putin phoning Obama to seek a diplomatic solution to the international standoff over Ukraine. Neither side has expressed much excitement so far, but a dialogue during a crisis is better than a couple of monologues.

Meanwhile, what drove Putin to reach out to Obama? Maybe he feels that now is the time when he holds all the cards? His cards are easy to guess: Crimea, the military buildup on the border, and continuing instability in Ukraine. But what would the bargaining be about?

I will try using simple text analysis to give another perspective on Putin's campaign in Crimea. The Russian president doesn't give speeches or press conferences often, but he is always exceptionally prepared. There were 3 relevant appearances by Putin in the last couple of months before and during the Ukrainian crisis (all are official translations from his site):

  1. News conference following EU Summit on January 24th.
  2. Press conference with media representatives to answer questions with regard to the situation in Ukraine on March 4th.
  3. Address by President to State Duma on Crimea on March 18th.
So what is the Address on Crimea about:
Not surprisingly, it refers to Ukraine, Crimea, and Russia the most. These words can be excluded without losing any insight:
Now the cloud becomes all about will and people (presumably applied to Russian/Ukrainian/Crimean). Has anything changed between the EU Summit and the Address? One way to answer this is to place both transcripts into a text corpus and compute the TF-IDF statistic for the terms. This time the cloud is based on the TF-IDF scores for the Address (minimum frequency of a term per document is 3), so it reflects both a term's frequency in the Address and its importance compared to the EU Summit (that is, all other documents in the corpus):
The words above stand out when compared to the EU Summit text. It's no surprise that Sevastopol wasn't mentioned in January, but neither were Kosovo, residents, or law. To make the comparison more convincing, let's add to the mix Putin's press conference on March 4th, when he broke his silence on Crimea. Now the text corpus includes 3 documents, and this is the cloud of the highest TF-IDF scores for the Address document:

Again there are Sevastopol and city, but, importantly, also Russians, NATO, millions, ethnic, reality, Tatars, borders, and USSR: the words that stand out compared to what Putin said before. It is clearly a mix of his concerns, goals, and, well, realities. But could it also be about the symbol of Russian glory, Sevastopol, at least to some degree? After checking his speech, it is clear that he referred to Sevastopol each time he mentioned Crimea, but there was one place where the city was mentioned alone:
"I simply cannot imagine that we would travel to Sevastopol to visit NATO sailors."
Would Putin roll back and yield to international condemnation? Very unlikely, but I cannot imagine at all that he would give Sevastopol back.
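The TF-IDF scoring behind the Address clouds can be sketched in Python as below. This uses the textbook weighting tf × log(N/df), which is not necessarily the exact variant used for the clouds above. The transcripts are passed in as plain strings. Reading "minimal frequency of term per document is 3" as a cutoff on the target document's counts is an assumption.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-cased word counts for one transcript."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def tfidf_for_doc(docs, target, min_freq=3):
    """TF-IDF scores of the terms in docs[target], highest first.

    tf  = raw count of the term in the target document
    idf = log(N / df), where N = number of documents and
          df = number of documents containing the term
    Terms seen fewer than `min_freq` times in the target are dropped.
    """
    counts = [term_counts(d) for d in docs]
    n_docs = len(docs)
    scores = {}
    for term, tf in counts[target].items():
        if tf < min_freq:
            continue
        df = sum(1 for c in counts if term in c)
        scores[term] = tf * math.log(n_docs / df)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

With only two documents, idf is log 2 for terms unique to the Address and 0 for terms shared with the EU Summit transcript, so shared words drop out regardless of how often they occur. Adding the March 4th press conference as a third document gives idf values of log 3, log 1.5, or 0, so terms shared with only one of the other two documents still score. Libraries often smooth idf instead. For example, scikit-learn's `TfidfVectorizer` defaults to `smooth_idf=True`, which would change the exact scores.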

Since Sevastopol was used along with Crimea, which was removed from the analysis, the cloud below is a version of the last one with Sevastopol excluded. Word clouds are always open to interpretation, so I leave it here for readers to draw their own conclusions:

Wednesday, February 26, 2014

Deconstructing The French Laundry Wine List, Part II

Having more refined data than last time, I focus on prices in this post.

Price Word Clouds 

The word clouds below use prices instead of frequencies: a term's size corresponds to the average price of the bottles whose names contain it (hence, an expression that occurs only once, but in a very expensive wine, appears on top):


Note that in this cloud Domaine de la Romanée-Conti comes out less expensive than Château Petrus. Next, let's zoom in by splitting this into two clouds, one for red and one for white wines (some names will disappear from both because they belong to neither reds nor whites, e.g. Scion, which is listed among the fortified wines).

Red wine price cloud (1287 bottles total):


Now both Domaine de la Romanée-Conti and Château Petrus are first among equals. A hint at why the former moved up lies in the white wine price cloud (539 bottles total):
The Burgundian estate is present here, but the one from Bordeaux is not. As I'll show momentarily, whites are consistently priced below reds, so an estate's average tends lower when computed across both colors. This effect doesn't apply to Château Petrus, since it doesn't feature whites at all.
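The price clouds and the averaging effect just described can be sketched in Python as follows. This assumes the list has been parsed into hypothetical (name, color, price) tuples, and it splits names into single words, whereas the clouds above may use longer expressions. The optional color filter corresponds to the red and white clouds.

```python
import re
from collections import defaultdict


def average_price_by_term(wines, color=None):
    """Map each term in the wine names to the mean price of the bottles containing it.

    `wines` is a list of (name, color, price) tuples (hypothetical format).
    Pass color="red" or color="white" to average over one color only.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, wine_color, price in wines:
        if color is not None and wine_color != color:
            continue
        # set(): a term repeated within one name still counts that bottle once
        for term in set(re.findall(r"[\w'-]+", name)):
            totals[term] += price
            counts[term] += 1
    return {term: totals[term] / counts[term] for term in totals}
```

For example, with a made-up red "Domaine Example Rouge" at $3000 and white "Domaine Example Blanc" at $1000, "Domaine" averages $2000 over all wines but $3000 over reds only. A producer with no whites, like Château Petrus, would show no such gap.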

The last cloud for today is the white wine price cloud without the outlier Domaine de la Romanée-Conti. Removing it makes viewing the prestigious whites on The French Laundry list almost as pleasant as drinking them (just kidding):
    

Gender Wine Inequality between Reds and Whites

Are whites cheaper than reds? Using a population-pyramid style of histogram, we can compare them by price (think of white as female, red as male (or vice versa if you wish), and wine price as salary). And just like a population pyramid, we have plots for each country (France, Italy, and US):

The US makes the best case for inequality, while France fares best for equality (a longer history of wine democracy?). All 3 show consistent trends, though: red prices are right-skewed with fat tails, while white prices are more symmetric, with lower centers of distribution. Of course, all results are subject to The French Laundry sommelier's bias in wine selection (which may also be why Spain, heavily under-represented among the whites, didn't make this chart).

Compare the median prices (dashed horizontal lines) across the 3 countries: contrary to popular belief, American wines are a better value than their European counterparts (assuming that all wines on The French Laundry list are outstanding). American wines really represent the "budget" section of the list (prices under $200), while European wines peak above $200. I will follow up on that in future posts.
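The data behind one country's pyramid, plus its two median lines, can be sketched in Python as follows. This assumes plain lists of bottle prices per color. The bin width and the $2000 cap are illustrative defaults, with the cap echoing the one used for the price histogram in Part I. Putting reds on the left is an arbitrary choice.

```python
from statistics import median


def pyramid_bins(red_prices, white_prices, bin_width=100, max_price=2000):
    """Bottle counts per price band for a back-to-back (population pyramid) chart.

    Reds come back as negative counts (left of the axis) and whites as
    positive counts (right). Prices outside [0, max_price) are dropped.
    """
    if max_price % bin_width:
        raise ValueError("max_price must be a multiple of bin_width")
    n_bins = max_price // bin_width
    red, white = [0] * n_bins, [0] * n_bins
    for prices, out, sign in ((red_prices, red, -1), (white_prices, white, 1)):
        for p in prices:
            if 0 <= p < max_price:
                out[int(p // bin_width)] += sign
    bands = [(i * bin_width, (i + 1) * bin_width) for i in range(n_bins)]
    return bands, red, white


def median_prices(red_prices, white_prices):
    """Per-color medians, i.e. the dashed reference lines."""
    return median(red_prices), median(white_prices)
```

Plotting `red` and `white` as horizontal bars against `bands` gives the pyramid shape. The medians are computed on all prices, including any above the cap.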

Sunday, February 23, 2014

The French Laundry Wine List Deconstructed

It may not surprise you, but my little exercise in deconstructing The French Laundry wine list may help you plan your trip there. This word cloud is built from the wine list offered by the restaurant (available here):
It includes expressions parsed from the wine names. Terms that occur fewer than 5 times were excluded.

Wine names usually do not include varietal, country, appellation, and other standard designations. But some exceptions do occur, so below is the same cloud with top exceptions removed:
Apparently, Domaine de la Romanee-Conti is a big winner for Head Sommelier Dennis Kelly. This estate in Burgundy produces some of the world's most expensive bottles of wine.

What vintages are in favor at The French Laundry today?
I did hear that 2010 was a great vintage in California. Is it really American wines that account for the success of 2010? The next 6 vintage clouds are by country:
Indeed, 2010 is popular in the USA today, but so it is in France, Germany, and Austria. Notably, the sommelier favors 1996, 2004, and 2006 in Italy; 1994, 2004, and 2005 in Spain; and 2005 and 2009 in France.

And, finally, let's look at the wine price histogram:
The selection peaks around $150 to $200, with still plenty of choices in the $250 to $500 range. In case you started to worry, I removed wines priced above $2000 to improve this chart: there are plenty to choose from above $2000, especially the wines from Domaine de la Romanee-Conti.

P.S. Found nothing worth trying, or are the prices too high? Feel free to bring your own bottle of wine, keeping in mind the restaurant's published corkage fee policy: "Guests are welcome to bring wines that are not represented on our wine list; however there will be a fee of $150 for each 750ml bottle with a limit of one bottle for every two guests at the table."

Thursday, September 1, 2011

Good Luck, President Medvedev!

Finally, President Medvedev said it. The wait is over. No 2nd term for him. No more questions: skip the election press conference, relax, do crazy things, vacation all you want, rename the Red Army to White, move permanently to Sochi or Miami. Nothing can hurt you anymore, because a president of Russia never says things like vodka being evil and wine being as good as water or juice.

There are way too many problems with this.

Firstly, beer is vodka's main rival in Russia. Never mind that beer has become an appetizer for a lot of vodka drinkers: it will be a hard sell for wine to overtake beer.

Secondly, dry wine demands some sophistication from its consumers, because wine is a foodie drink. Anywhere you go, wine goes side by side with the local food: Chèvre and Sancerre, Schnitzel with sauerkraut and German Riesling, or BBQ and Zinfandel. Pairing wine with typical Russian food is a challenge (if you have a good idea for pelmeni with sour cream, please let me know).

Next comes the simple price/reward ratio: choosing between wine with an 11-14% alcohol level and vodka with a 40% alcohol level at half the price or less. It’s a no-brainer.
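As rough arithmetic, take the figures above at face value and assume both prices are per liter (an assumption, since bottle sizes differ). With wine at price $P_w$ and alcohol fraction $a_w$, and 40% vodka at price $P_v \le P_w/2$, the alcohol bought per ruble compares as:

```latex
\[
\frac{0.40 / P_v}{a_w / P_w}
  \;=\; \frac{0.40}{a_w}\cdot\frac{P_w}{P_v}
  \;\ge\; \frac{0.80}{a_w},
\qquad
a_w \in [0.11,\, 0.14] \;\Rightarrow\; \frac{0.80}{a_w} \in [5.7,\, 7.3].
\]
```

So each ruble spent on vodka buys roughly 5.7 to 7.3 times as much alcohol as a ruble spent on wine, or more if vodka is cheaper than half the price.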

But don’t stop there. How about selection: there are literally hundreds of types of wine (not even brands), which leaves the uneducated consumer daunted and lost. On top of that, the two decent, affordable, and well-known sources of wine for Russian consumers, Moldova and Georgia, are both banned from import for purely political reasons.

With all these problems, I wish President Medvedev good luck, because some day wine might turn out to be a cure for vodka-loving Russia. But there are 2 things I just don’t see today: how Medvedev can be president after this, and how Putin can jump on the wine-loving bandwagon when he becomes president again. Even the fantastically overinflated credit Putin has won’t be enough to fight vodka in Russia. But maybe, just maybe, it is the one idea that can turn a lot of things around in this country…

Tuesday, August 30, 2011

The chicken or the egg dilemma

It happens all the time. We procrastinate by ignoring it or solve it effortlessly by not thinking about it. Higher education. Planning a vacation. Dumping a girl(boy)friend. Changing jobs. Paying off credit cards. Each time there is a choice of which decision comes first. Money for education or a degree to earn it. Plane tickets or hotel reservations. A new girl(boy)friend or breaking up. A job offer or a letter of resignation. Buying more stuff or paying off a balance. And whatever choice we make will affect the one that follows.

Here is another chicken-or-egg dilemma that wine lovers shouldn’t ignore: aging wine. It’s no secret that the top 5-10% of wines (of particular white and red varietals) develop into superior versions of themselves with age. While a young wine may exhibit some of its qualities, aging makes wine more complex and balanced, enhances its bouquet, and lengthens its finish. Without a climate-controlled cellar (the chicken), properly aged wine is unlikely to happen. And without experiencing the effect of aging (the egg), one is unlikely to commit to the moderate to significant investment a cellar demands.

Without a cellar you would
  • have to drink your wines almost immediately
  • never know how amazing your wine could have become after 2-5 years or more.

Let me guess that if you
  • are still reading this blog and
  • don’t have a cellar and
  • didn’t go to online shopping to buy one yet
then you are ignoring this dilemma.

How about solving it by having the egg without hatching it (it’s only fair, because the egg indeed came first)? I can offer at least four ways, ordered from more to less expensive. You are free to stick with them for as long as you can both afford and enjoy them, or you can decide to start your own cellar.

The simplest way to try a good aged wine is to spend an extra $100-200 at a restaurant on older vintages of Pinot, Syrah, etc. (Cabernet will likely command a larger premium). Share your plan with the sommelier so that she can offer the best choices and you won’t end up with an overpriced label instead. Of course, you would have to find a restaurant with a respectable wine list, but the advantage is that such places usually offer excellent food as well. Remember that even the latest vintage always lags behind the current year: e.g., today we are seeing new releases of 2007 Syrah and 2009 Pinot Noir. This means that you need to go back to 2003-2007 vintages or older.

The next option is a library wine tasting at a winery, or buying a bottle of library wine from them. A library wine tasting commands a higher fee than regular tastings, but far less than a bottle of wine at a restaurant. Make sure it is a library wine and not a premium or reserve wine. The latter are current releases of select, limited wines produced at the winery. If you buy a bottle and have it shipped, you don’t have to travel to a wine country near (or not so near) you.

A similar option is finding good aged wine at a wine store or wine bar where you live. You’ll get sound advice (I hope) that way too.

2000 Twomey Merlot and 2005 Rubicon Estate Cask can age for 10 years or more in a cellar.
And lastly, if you have a friend (or a friend of a friend, and so on) who has a cellar (or, even better, a winery), then go for it: tell him or her how interested you are in tasting aged wine before starting your own cellar. People don’t drink their wines alone, at least not people who keep wine cellars. Just remember to bring a nice food pairing when invited.

Chances are you will discover a whole new world in aged wines, and 5 years will never taste the same.


P.S. There is no need for a big investment in a wine cellar that occupies half of your place or makes you move into a cave. Small kitchen-appliance or built-in wine coolers will suffice. You can even rent wine storage instead. It has yet to be proven that 100% of people who start that way end up with wine cellars occupying the better part of their house, but I wouldn’t bet against it.