Adapting football to the child: an application of the logistic regression model in observational methodology

Logistic regression is one of the analysis techniques that are valid for observational methodology. However, it is rarely used within this methodology, and in particular in physical activity and sports studies. To highlight the possibilities this technique offers within observational methodology applied to physical activity and sports, an application of the logistic regression model is presented. The model is applied in the context of an observational design which aims to determine, from the analysis of the use of the playing area, which football discipline (7-a-side, 9-a-side or 11-a-side football) is best adapted to the child's possibilities. A multiple logistic regression model can provide an effective prognosis of the probability of a move being successful (reaching the opposing goal area) depending on the sector in which the move commenced and the football discipline being played.

relation between variables can be carried out (Silva and Barroso 2004) with estimative purposes (to what extent can the criterion variable be explained by the predictor variable(s), taking into account the presence of other factors) or with predictive purposes (forecasting the value of the criterion variable depending on the different values adopted by the predictor variable(s)). Anguera et al. (2011) present logistic regression as a possible analysis technique in observational methodology for the type of data corresponding to the four quadrants of the observational designs: sequential and event-based; concurrent and event-based; sequential and time-based; concurrent and time-based.
Despite this, studies that apply logistic regression within observational methodology are still scarce (Garganta 1997; Casal 2009). This work aims to emphasise the possibilities this technique has in the scope of observational methodology applied to physical activity and sports. More specifically, it studies which football discipline, 9-a-side football (F-9) or 11-a-side football (F-11), is better adapted to provide continuity to the child's practice after concluding the 11-12 year-old category, in which they play 7-a-side football (F-7).
We start from the importance given to the space factor in the development of the game (Castelo 2009), which seems to be key to the capacity to carry out a technical-tactical task efficiently and effectively in Gréhaigne's proposals (1998, 2001). We therefore elevate the space factor to the level of a functional indicator of the child's level of expertise in the game; that is, of the adaptation of football to the child.
In this research work we have used a multiple logistic regression model to forecast the probability of a move being successful (reaching the opposing goal area) depending on the depth of the move (determined by the start sector) and on the discipline played.

Method
This research follows an observational methodology (Bakeman and Gottman 1986). The observational design used is, according to Anguera et al. (2001): point (three matches in each discipline: F-7, F-9 and F-11), with monitoring between sessions (studying the behaviour continuously during the entire recording session, the match); nomothetic (three teams from the under-13 category); and multidimensional (levels of response: proxemic, the spatial performance of the observed team while playing the game, and gestural, the way in which the actions that lead to the start and conclusion of the move are specified). The level of participation is that of non-participant observation; the observation carried out is active; and the degree of perceptivity is complete (direct observation).

Participants
The data supporting this work comprise the total number of moves (n = 1,142) played within a triangular tournament, in three football disciplines. The players who make up the teams (boys who turn 12 years old during this year) have never taken part in federated competitions in the F-11 discipline (which is not applicable to them due to their age) or in the F-9 discipline (an intermediate discipline which, on the day of the tournament, was not yet established). Each match lasted 30 min. Table 1 shows the measurements of the playing field in each of the disciplines played.

Way in which the move starts: recovery (R); goal kick (GK); throw-in (TI); corner kick (CK); free kick (FK); penalty (P); kick-off (KO); drop-ball (BDB).
Way in which the move concludes: interception (I); out in penalty area (OP); out over the far line (OF); foul (F); out over a touchline (OT); goal (G); drop-ball (CDB).

Observational tool
An observational tool has been designed which allows analysis of the spatial path of the ball on the playing field. Table 2 gives a schematic description of the tool's criteria. Image 1 shows the zoning of the field used for the matches. In addition, in order to analyse the depth of the game, the zones of the playing field have been merged crosswise into four sectors: safety, creation own area, creation opposing area and definition.

Recording tool
For recording and coding the data we have used the software programme SDIS-GSEQ (Bakeman and Quera 1995). More specifically, we have designed a template which has been adapted to the SDS encoding syntax (Bakeman and Quera 2011), including all of the constituting criteria of the observational tool.
In accordance with Anguera et al. (2011), and taking into account Bakeman's classical proposal (1978), these data are of type II, concurrent and event-based. In terms of the data types described by Bakeman and Quera (1995) for computerised encoding, the data are multi-event, as they arise from a multidimensional design and the observational tool combines field formats and category systems, built on the dimensions established in the observational design.

Image 1 Zoning of the playing field

Quality of the data
A procedure has been developed for training observers in accordance with the approach set forth in Anguera (2003). Two recordings have been carried out: (a) the first by consensus agreement (Anguera 1990) among a group of three observers; (b) the second by a single observer.
Using the SDIS-GSEQ software package, we obtained the degree of reliability, in the form of agreement, by means of Cohen's kappa coefficient (1960): 0.84 for the F-7 discipline; 0.73 for the F-9 discipline; and 0.79 for the F-11 discipline. According to the reference values of Landis and Koch (1977), the agreement is almost perfect for F-7 and substantial for F-9 and F-11.
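The agreement figures above can be illustrated with a minimal sketch of Cohen's kappa and the Landis and Koch bands. This is not the SDIS-GSEQ computation itself; the function names and the short example recordings are hypothetical and serve only to show how the coefficient is obtained from two codings of the same events.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of category codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both recordings must be non-empty and equally long")
    n = len(rater_a)
    # Observed agreement: share of events coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each recording's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    """Verbal label following the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical codings of six move starts by two recordings.
recording_1 = ["R", "R", "GK", "TI", "R", "GK"]
recording_2 = ["R", "R", "GK", "TI", "GK", "GK"]
print(round(cohen_kappa(recording_1, recording_2), 3))
print(landis_koch_label(0.84), "|", landis_koch_label(0.73))
```

Applied to the coefficients reported in the text, the labelling function reproduces the stated interpretation: 0.84 falls in the "almost perfect" band, while 0.73 and 0.79 fall in the "substantial" band.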
Additionally, to guarantee the quality of the data, we have made use of Generalizability theory (Cronbach et al. 1972). Following Blanco et al. (2000) and Castellano et al. (2009), and using the Generalizability Theory software (Ysewijn 1996), we have run two different designs within the Generalised Linear Model (GLM), for which we have selected the type II data, as the data were not gathered in a random manner. For the categories and matches design, C/M, the analysis of the generalizability coefficients yields a reliability of e² = 0.99 for the precision of generalization, while the categories and teams design, C/T, yields e² = 0.97. These results allow us to assess the constancy of the matches played by the teams considered in this research work.

Logistic regression
As the criterion variable is dichotomous (the move does or does not conclude in zone 80) and there is more than one predictor variable (sector in which the move starts, and discipline), we have used the multiple logistic regression model presented below:

$$P(Y) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k)}}$$

where $P(Y)$ is the probability of a specific event occurring; $e$ is Euler's number (the base of the natural logarithm); $X_i$ ($i = 1, 2, \ldots, k$) are the predictor variables; $\beta_0$ is the model's constant; and $\beta_i$ ($i = 1, 2, \ldots, k$) are the logistic regression coefficients.

Given that the two predictor variables are categorical with more than two categories (Sector: Safety, Creation Own Area, Creation Opposing Area and Definition; and Discipline: F-7, F-9 and F-11), they have been transformed into dichotomous dummy variables. The exponential of the coefficient of each dummy variable estimates the magnitude by which the odds of the event occurring vary when this category is compared with the reference category. For our logistic regression model, the Definition Sector has been used as the reference category for the variable Sector in which the Move Starts, while the F-7 discipline has been the reference category for the Discipline variable.
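As an illustration of the dummy coding and the logistic function described in this section, the following minimal sketch evaluates P(Y) for one combination of start sector and discipline. The helper names and, above all, the coefficient values are hypothetical: they are not the estimates of Table 3 and are used only to show the mechanics, with Definition and F-7 as reference categories.

```python
import math

SECTORS = ["Safety", "Creation Own Area", "Creation Opposing Area", "Definition"]
DISCIPLINES = ["F-7", "F-9", "F-11"]


def dummy_code(value, categories, reference):
    """Return one 0/1 indicator per non-reference category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return {c: int(c == value) for c in categories if c != reference}


def success_probability(sector, discipline, intercept, coefs):
    """P(Y) = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    x = {**dummy_code(sector, SECTORS, "Definition"),
         **dummy_code(discipline, DISCIPLINES, "F-7")}
    linear = intercept + sum(coefs[name] * x[name] for name in x)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for illustration only (NOT the values in Table 3).
EXAMPLE_INTERCEPT = 0.5
EXAMPLE_COEFS = {
    "Safety": -2.0,
    "Creation Own Area": -1.5,
    "Creation Opposing Area": -0.5,
    "F-9": 0.0,
    "F-11": -0.6,
}

p = success_probability("Safety", "F-11", EXAMPLE_INTERCEPT, EXAMPLE_COEFS)
print(f"P(success | Safety, F-11) = {p:.3f}")
```

For the reference combination (Definition, F-7) every dummy variable is 0, so the probability depends on the constant alone.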

Results
To determine the possible relation between the criterion variable (Success of the move) and the predictor variables (Sector in which the Move Starts and Discipline), given that the variables are categorical (Nevill et al. 2002), we have used Pearson's chi-squared test (χ²). The χ² values obtained indicate that the predictor variables Sector in which the Move Starts (p ≤ 0.001) and Discipline (p = 0.020) maintain a significant relation with the criterion variable, Success.
Subsequently, multicollinearity was ruled out on the basis of the contingency coefficient (0.122) obtained between the predictor variables Sector in which the Move Starts and Discipline. Confounding and interaction between variables were also ruled out.
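A minimal sketch of the two quantities used in this step, Pearson's chi-squared statistic and the contingency coefficient C = sqrt(χ² / (χ² + n)), is given below. The function names and the 2 x 2 table of counts are hypothetical; the study's own cross-tabulations are not reproduced here, and the p-value (which requires the chi-squared distribution) is deliberately left out.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic and total n for a table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    stat, n = chi_squared(table)
    return math.sqrt(stat / (stat + n))


# Hypothetical counts: rows are categories of one variable, columns of another.
example = [[10, 20], [20, 10]]
stat, n = chi_squared(example)
print(f"chi2 = {stat:.3f}, n = {n}, C = {contingency_coefficient(example):.3f}")
```

A coefficient close to 0, such as the 0.122 reported in the text, indicates a weak association between the two predictors, which is the basis for ruling out multicollinearity.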
In order to detect which variables may be included in the logistic regression model with a predictive purpose, and to estimate the degree of relation between the criterion variable and the predictor variables, the following procedures have been used in the analysis: enter, forward and backward selection, with maximum likelihood estimation. In the three cases, the results obtained are the same and include the following variables in the logistic regression model: Safety Sector, Own Area Creation Sector, Opposing Area Creation Sector; and the F-11 discipline. Table 3 shows the estimated coefficients of the model (B and Exp(B)) along with their significance levels and confidence intervals.
In relation to its capacity for discrimination, the logistic regression model correctly classifies 75 % of the analysed cases. It obtains a sensitivity of 10 % (the probability of the model predicting that a move is successful when it has actually been a success) and a specificity of 96.2 % (the probability of the model predicting that a move is unsuccessful when it has actually been unsuccessful).
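The three discrimination measures mentioned above can be derived from a classification table, as in the following minimal sketch. The function name and the counts are hypothetical and are not the study's classification table; they only show how each percentage is formed.

```python
def classification_summary(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from classification counts.

    tp: successful moves predicted as successful
    fn: successful moves predicted as unsuccessful
    tn: unsuccessful moves predicted as unsuccessful
    fp: unsuccessful moves predicted as successful
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts chosen for illustration only.
summary = classification_summary(tp=1, fn=9, tn=24, fp=1)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

A model can combine a high overall percentage of correct classifications with low sensitivity when unsuccessful moves are much more frequent than successful ones, since most correct predictions then come from the majority class.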
In order to assess the goodness of fit of the logistic regression model (Hagquist and Stenbeck 1998), we have used the Hosmer and Lemeshow (1989) goodness-of-fit test. The results show no significant differences (p = 0.893) between the expected values and the observed values.
Next, we move on to the estimation process based on the logistic regression model we have built. If the sign of B is positive, this variable favours the occurrence of the event (in our case, the move concluding successfully) and Exp(B) is higher than 1. If the sign of B is negative, we interpret the opposite: the variable makes the event less likely, and Exp(B) is lower than 1.
In relation to the Discipline variable, the probability of a move concluding in Zone 80 is 44.6 % lower in the F-11 discipline than in the F-7 discipline. There is no statistically significant difference between the F-7 and F-9 disciplines.
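The reading of B and Exp(B) described above can be sketched as follows. Exp(B) is an odds ratio, so the percentage obtained is, strictly, a change in the odds relative to the reference category. The function name is hypothetical, and the value Exp(B) = 0.554 is an assumption back-calculated from the 44.6 % figure on the supposition that it was obtained as (1 − Exp(B)) × 100; it is not read from Table 3.

```python
import math


def percent_change_in_odds(b):
    """Percentage change in the odds relative to the reference category."""
    return (math.exp(b) - 1.0) * 100.0


# Assumed value for illustration: Exp(B) = 0.554, i.e. B = ln(0.554) < 0.
b_f11 = math.log(0.554)
print(f"B = {b_f11:.3f}, change in odds = {percent_change_in_odds(b_f11):.1f} %")
```

A negative B gives Exp(B) below 1 and therefore a negative percentage change, while a positive B gives Exp(B) above 1 and a positive change.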
Lastly, with regard to prediction, Table 4 shows the success probability of the move, depending on the different values adopted by the predictor variables.

Discussion and conclusions
A multiple logistic regression model has been built to determine the probability of a move being successful (reaching the area which houses the opposing goal) depending on the sector in which the move commenced and the football discipline being played.
The results indicate that the F-7 discipline is the one in which players of the studied age (children turning 12 years old during this year) show the most spatial competence (the capacity to generate progression in the game). This finding is in line with the studies by Carvalho and Pacheco (1990) and Pacheco (2007), for the 8-12 age group; and by Costa and Garganta (1996), Ardá (1998), Ardá and Anguera (2000) and Vegas (2006), who compare F-7 and F-11 for 11-12 year-olds without taking other football disciplines into account.
Additionally, the logistic regression model built indicates the increased quality of the F-9 discipline, in relation to the F-11 discipline. This improved adaptation of the F-9 discipline for the studied age coincides with the conclusions set forth by Arana et al. (2004) and Lapresa et al. (2006).
In conclusion, maintaining the F-7 discipline and introducing the F-9 discipline are better adapted proposals than the F-11 discipline, which is the discipline currently played by children in the 12-13 age group. Playing the F-7 and F-9 disciplines, with a smaller playing field and fewer players than the F-11 discipline, must therefore be seen as an opportunity to adapt the game to the child's characteristics and not as a distortion of the adults' game (Malina 2001; Federazione 2008).
With this work, we have effectively applied logistic regression within observational methodology, highlighting the possibilities this analysis technique offers for dichotomies that are especially relevant in sport, such as: success/failure; adaptation/non-adaptation; correct/incorrect; timely/untimely.