Sensitivity to temporal contingencies appears early in life and plays a key role in the ontogeny of socio-cognitive abilities in humans (Nadel et al., 1999; Gratier & Apter-Danon, 2009). The tendency toward rhythmic coordination, sometimes referred to as “entrainment”, requires sensory-motor coupling (Phillips-Silver et al., 2010). In most fields of cognitive science, action-perception and agent-world coupling views are replacing the classical stimulus-response dichotomy (Novembre & Keller, 2014; Silberstein & Chemero, 2012; Schilbach et al., 2013; Marsh et al., 2009). Such conceptual frameworks are well suited to studying coordination phenomena because they emphasize the dynamical nature of cognition (Buzsáki & Draguhn, 2004; Lehmann & Schönwiesner, 2014; Varela et al., 1993; Kelso, 1995). Moreover, they leave room for the balance between autonomy, a central feature of complex biological systems, and interactive coupling, through which such systems relate to, and make sense of, their environment (Di Paolo, 2005; Barandiaran et al., 2009; Buhrmann et al., 2013). A naturalistic study of autonomy and coupling requires both embracing ecological situations and considering the first-person perspective. Furthermore, many social coordination phenomena cannot be observed in the laboratory without the interaction of at least two subjects. We therefore propose linking first- and third-person measures, and even relating them across multiple interacting individuals. We discuss how these concepts are intertwined in coordination phenomena and outline existing methods for addressing these issues.