Tuesday, January 15, 2013

Part 5 of Exploring Google Analytics data using Clojure, Incanter and MongoDB

Early Explorers used tools like the quadrant or a compass in their navigation. We use Clojure, Incanter and MongoDB to Explore Google Analytics Traffic Sources data. Today we build a simple interactive GUI application that works similar to the autofilter feature of any popular spreadsheet application.

Util.clj

We introduce a new function that requires a weight column produced by weighted-sort-row, and 2 metrics produced by classify-row. As described in Part 2 we use a simple classification method: below average:0, average:1, above average:2.
(defn view-traffic-sortable-sliders
  "View weighted-sorted table from col2 using SourceMedium names from col1 weighted-sort by wsortby, Categories for X, Categories for Y"
  [col1 col2 wsortby xcat ycat]
  (let [dat1 (fdb col2)
        ;; Human readable SourceMedium values                                                                                                                                    
        soum (fdr col1 :SourceMedium)
        ;; Merge Human readable col1 :SourceMedium column to col2 and sort by wsortby                                                                                            
        data ($order wsortby :asc (conj-cols dat1 (map (fn [y] (nth soum y)) ($ :SourceMedium dat1))))
        table (data-table data)]
    (view table)
    (sliders [slider_x (range 3)
              slider_y (range 3)]
             (set-data table ($where {xcat slider_x ycat slider_y} data)))))

Desktop.clj

We use :BounceRateW weights and 2 sliders :BounceRateC and :PagesVisitC. By changing the slider values from 0 to 2 the table shows only those traffic sources that have the same value as sliders show.
(view-traffic-sortable-sliders :m1 :m2 :BounceRateW :BounceRateC :PagesVisitC)
The screenshot above shows traffic sources that have below average Bounce Rate and above average Pages per Visit values weighted-sorted by ascending Bounce Rate. You may use view-traffic-sortable-sliders function for other metrics or using other weight.
(view-traffic-sortable-sliders :m1 :m2 :PagesVisitW :AvgVisitDurationC :NewVisitsC)
This example filters traffic sources table according to :AvgVisitDurationC and :NewVisitsC weighted-sorted by :PagesVisitW.

Distribution of clusters

A matrix view of the distribution of 2 clusters can be useful. In util.clj we introduce a new helper function:
(defn class-dist "Show distribution of clustered metrics" [x y]                                                                                                                  
  (matrix (vec (for [i (range 3)                                                                                                                                                 
                     j (range 3)]                                                                                                                                                
                 (nrow (fdb :m2 :where {x i y j})))) 3))
Let's see the distribution of :BounceRateC and :PagesVisitC.

(class-dist :BounceRateC :PagesVisitC)

;; Result matrix
[85.0000  4.0000  1.0000
37.0000 75.0000 16.0000
26.0000 67.0000 43.0000]
We may use this matrix similar to how early explorers used compass. This example shows that there are 85 worst traffic sources that have above average Bounce Rate and below average Pages per Visit (North West of the matrix). 43 in "South East" shows number of the best performing traffic sources. We have 75 average traffic sources in the center. Happy exploring of Google Analytics data using Clojure, Incanter and MongoDB!

Friday, January 4, 2013

Part 4 of Exploring Google Analytics data using Clojure, Incanter and MongoDB

Making data-driven decision on media selection helps increasing Return on Investment. Weighted sort feature of Google Analytics enables you to sort traffic sources while eliminating long-tail problem. In this post we will use an updated weighted sort function that produces the same sort order as Google Analytics' built-in feature.

Weighted Sort table

util.clj

There is a new wavg (weighted average) function and updated etv-ws and weighted-sort-row functions.
(defn wavg
  "Calculate weighted average of a row"
  [weights row]
  (cond (= (count row) 0) 0
        (= (count row) 1) (first row)
        :else  (/ (reduce +' (map * row weights))
                  (reduce +' weights))))

(defn etv-ws
  "Estimated weighted sort value for y based on x where mx maximum of Xs, ay weighted average of Ys."
  [x y mx ay]
  (let [etv ($= (ay + (x / mx * (y - ay))))
        ]
    etv))

(defn weighted-sort-row
  "Calculate and store weighted sort value in coll using wrow as weight.                                                                                                         
   Store new weigted values in rowW name as new column"
  [coll wrow row]

  (let [oids (fdr coll :_id)
        xdata (fdr coll wrow)
        ydata (fdr coll row)
        ay (wavg xdata ydata)
        mx (reduce max xdata)
        ]
    (pmap (fn [y]
            (let [oid y
                  rec (fetch-one coll :where {:_id oid})
                  x (wrow rec)
                  y (row rec)
                  w (etv-ws x y mx ay)
                  log (update! coll rec (merge rec
                                               {(keyword (str (name row) "W")) w}))
                  ]))
          oids)))


desktop.clj

Let's use weighted sort on BounceRate using Visits as weight. weighted-sort-row functions stores calculated weights in BounceRateW.
;; Calc weighted sort for BounceRate                                                                                                                                             
(weighted-sort-row :m2 :Visits :BounceRate)
;; Check collection                                                                                                                                                              
(fdb :m2 :limit 1)
;; Result                                                                                                                                                                        
[:BounceRateW :BounceRate :NewVisits :Visits :SourceMedium :AvgVisitDuration :PagesVisitC :PagesVisit :_id :AvgVisitDurationC :VisitsC :NewVisitsC :BounceRateC]
[0.39453393212385945 0.6667000000000001 0.3333 3 47 6 0 1.33 # 0 1 1 0]
;; new :BounceRateW appeared in collection

;; You can compare results with Google Analytics' weighted sort                                                                                                                  
(view ($order :BounceRateW :asc (fdb :m2 :only [:BounceRateW :SourceMedium :Visits])))
;; Name of each SourceMedium                                                                                                                                                     
(def SourceMedium (fdd :m1 :SourceMedium))
;; SourceMedium indexes sort by weighted BounceRate in ascending order                                                                                                           
(def SourceMediumBounceRateW ($ :SourceMedium ($order :BounceRateW :asc (fdb :m2 :only [:BounceRateW :SourceMedium ]))))
;; View sorted traffic sources in human readable form                                                                                                                            
(view (map (fn [y] (nth SourceMedium y)) SourceMediumBounceRateW))


You may use weighted sort for other metrics too. If you reallocate the daily budget of the worst performing traffic sources to the above average performing ones, you have a simple adaptive model for increasing the Return of Investment of marketing budget.