In stat_summary_hex, why do hexagons overlap if z is a factor?

+8 votes
asked Jun 28, 2013 by jflournoy

In the data set below, thing1 is numeric, and thing2 is a factor (but otherwise identical to thing1). For simplicity, the summary function is just the max value in the bin. When the z element is a factor, the hexagons overlap. Anyone know why?

library(ggplot2)
DF=data.frame(xpos=rnorm(1000), ypos=rnorm(1000), thing1=rep(1:9,length.out=100), thing2=as.factor(rep(1:9,length.out=100)))
ggplot(DF, aes(x=xpos, y=ypos, z=thing1)) + stat_summary_hex(fun=function(x){x[which.max(x)]})
ggplot(DF, aes(x=xpos, y=ypos, z=thing2)) + stat_summary_hex(fun=function(x){x[which.max(x)]})


1 Answer

0 votes
answered Nov 29, 2018 by datanalytics-com

To my knowledge, there are two hexagonal-binning functions in R: hexBinning in the fMultivar package and geom_hex in ggplot2. Both position the centers of the hexagons relative to the coordinates of the lower-leftmost point in the sample.

This means that if you split your sample (by a factor or, in my case, inside a MapReduce job), each subset gets its own grid, so the hexagons from different subsets are off-center relative to one another.
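
To illustrate, suppose the grid really is anchored at the lower-left corner of whatever data the binning routine receives, as described above. The following is a minimal base R sketch under that assumption; it reuses the question's column names, and the seed is arbitrary. It shows that each factor level then ends up with a different anchor point:

```r
set.seed(42)  # arbitrary seed, only for reproducibility

DF <- data.frame(
  xpos   = rnorm(1000),
  ypos   = rnorm(1000),
  thing2 = factor(rep(1:9, length.out = 1000))
)

# Lower-left corner (min x, min y) of the whole sample
full_anchor <- c(xpos = min(DF$xpos), ypos = min(DF$ypos))

# Lower-left corner of each subset, one row per factor level
anchors <- aggregate(cbind(xpos, ypos) ~ thing2, data = DF, FUN = min)

print(full_anchor)
print(anchors)
```

If the grids are built from these per-level anchors, they are shifted against each other, which would explain hexagons that overlap when drawn together.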

So I implemented my own hexbin function that assumes (0,0) as the center of the grid (i.e., if there were points around (0,0), the corresponding hexagon would be centered there) and requires just r (the radius of the hexagon) as a parameter.

The implementation is here (sorry, text is in Spanish!). Moreover, my implementation has no explicit loops: it is fully vectorized.
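
For readers who cannot follow the link, here is a rough sketch of the idea. It is not the linked implementation. The function name hexbin_origin is hypothetical, and it assumes pointy-top hexagons with circumradius r. It uses the common trick of treating the hexagon centers as two interleaved rectangular lattices and picking the nearer candidate for each point, with no explicit loops:

```r
# Hypothetical helper: assign each point to the center of its hexagon on a
# grid fixed at the origin, so the result does not depend on the sample.
hexbin_origin <- function(x, y, r) {
  w <- sqrt(3) * r  # horizontal distance between neighboring centers
  h <- 3 * r        # vertical distance between rows of the same lattice

  # Lattice A: centers at (i * w, j * h); contains (0, 0)
  a_x <- round(x / w) * w
  a_y <- round(y / h) * h

  # Lattice B: lattice A shifted by half a cell in both directions
  b_x <- (round(x / w - 0.5) + 0.5) * w
  b_y <- (round(y / h - 0.5) + 0.5) * h

  # Keep whichever candidate center is closer to the point
  use_a <- (x - a_x)^2 + (y - a_y)^2 <= (x - b_x)^2 + (y - b_y)^2
  data.frame(cx = ifelse(use_a, a_x, b_x),
             cy = ifelse(use_a, a_y, b_y))
}

# Usage: count points per hexagon; the seed and r = 0.5 are arbitrary
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
centers <- hexbin_origin(x, y, r = 0.5)
counts <- aggregate(n ~ cx + cy, data = cbind(centers, n = 1), FUN = sum)
```

Because each point's hexagon depends only on its own coordinates and r, binning a subset gives the same centers as binning the full sample, so hexagons computed separately per factor level or per MapReduce task line up.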
