Fifth Through Eighth Grade Students’ Difficulties in Constructing Bar Graphs: Data Organization, Data Aggregation, and Integration of a Second Variable

Studies that consider the displays that students create to organize data are not common in the literature. This article compares fifth through eighth graders’ difficulties with the creation of bar graphs using either raw data (Study 1, n = 155) or a provided table (Study 2, n = 152). Data in Study 1 showed statistical differences for the type of data organization but not for grade level. Students’ primary problem was choosing a format that integrated a second variable and aggregating data. In contrast, in Study 2, we observed that seventh and eighth graders outperformed fifth and sixth graders. We interpret these results in terms of older students’ better data interpretation competence. We conclude that students’ difficulties in bar graphing can be traced to their tabulation processes. Data organization is essential for understanding and representing data, and educators should devote to it the attention it deserves.

avoidance of literality, and the removal of redundant information. Grouping data appropriately to create a graph was a source of difficulty prior to creating the bar graphs. In their analysis, the authors explained that children must do the following: . . . shift their attention beyond particular cases to consider aggregates (i.e., distributions) of objects, while continuing to maintain a sense of the relation between individual cases and the aggregate. A coordination that can be a quite conceptual challenge that ends up by developing attributes and conventions of data displays. (Lehrer & Schauble, 2000, p. 113) Similarly, Wu and Krajcik (2006) examined the progression of graphing competence when students were asked to create a graph using raw, self-collected data during a case study conducted in a science class that allowed students to utilize their own inscriptional practices. These authors also demonstrated that the greatest difficulty in constructing a bar graph lies in the organization of the data into a form that can be represented in the coordinate system and properly aggregated and organized. This organization requires (1) crossing of the variables and (2) aggregation of the resulting values after the variables have been crossed; these steps also correspond to the two basic steps of constructing a frequency table. The authors mention the crucial importance of supporting these particular steps in the graphing process. The studies of Lehrer and Schauble (2000) and Wu and Krajcik (2006) are thus relevant to ours because they describe the difficulties that students experience when organizing data for proper graph creation (in Kaput's model terminology, the construction of the referential relationship between data and its graphic representation). 
However, the reported studies did not use an experimental design to test the hypothesized role of expertise in organizing data into tables when constructing bar graphs; the current study aims to do precisely this. In the next section, we present theoretical work on the role of tables in graphing.

TABLES AND THEIR ROLE IN GRAPHING
Bertin's matrix theory (2000/2001) presents a semiotic analysis of the properties of data tabulation and graphing and their role in describing relationships among data. He clearly established that tables are prerequisites for graphing. A table allows certain reordering and classification operations to be performed and a double-entry structure to be applied to the data; essentially, tables cross-classify items (Marti, 2008). In a double-entry structure, the values of one variable are shared with the values of another variable, and these values are expressed in the cells that are generated during the cross-tabulation (Duval, 2003; Novick & Hurley, 2001; Piaget & Inhelder, 1967; Tversky, Kugelmass, & Winter, 1991). These features are essential to the process of building a table and, consequently, the graphing process. Consistent with Bertin (2000/2001), Friel and colleagues (2001) made the following claim:
Tables appear to be used in two ways. One way is as a type of data display. Tables may also be used for organizing information as an intermediate step to creating graphical representations. The graph maker may need to organize data in tables (e.g., frequency tables) before graphs can be made. (p. 127)
Bertin's theoretical analysis (2000/2001) demonstrated that tables are essential for bar graphing. However, very few researchers have addressed the difficulties that emerge when students must organize data in tabular forms (Brizuela & Lara-Roth, 2002; Marti, 2008; Novick & Hurley, 2001; Marti, Garcia-Mila, Gabucio, & Konstantinidou, 2011). These authors have obtained results that indicate that table construction is a cognitively demanding process, especially for novices.
On the other hand, some research in mathematics education shows that cognitive abilities such as abstract reasoning are significantly related to graphing ability among seventh, ninth, and eleventh graders (Berg & Phillips, 1994; Dillashaw & Okey, 1980; Padilla, McKenzie, & Shaw, 1986). In contrast, Roth and McGinn (1997), studying graphing as practice, suggested that the lack of competence be explained in terms of "experience and degree of participation rather than exclusively in terms of cognitive ability" (p. 92). This leads us to think that providing opportunities to iteratively transform particular representational displays into other types of displays (lists, tables, graphs, etc.) (Duval, 1995) and to progressively organize raw data graphically (English, 2012) may elicit students' competence.
The aims of the present research are pursued by addressing three research goals:
1. Trace the main difficulties faced by middle school students in creating bar graphs to the students' level of expertise in formatting data into tables (Study 1).
2. Analyze how these difficulties change across grade levels (Study 1). We chose to compare upper primary school students (fifth and sixth grades) with lower secondary school students (seventh and eighth grades) with the assumption that the change in the difficulties in graphing performance might be explained by general expertise in data manipulation and data formatting acquired through general instruction.
3. Compare the prior graphing difficulties with those encountered by the students when they are provided with data in table form (Study 2).

STUDY 1. THE CONSTRUCTION OF A GRAPH BASED ON THE CONSTRUCTION OF A TABLE
As cognitive tools, tables assist us in organizing and grouping data effectively to ensure that the data are better understood and represented. By asking students to construct a table using a list of data that they will subsequently use to create a bar graph, we were able to analyze the relationship between the types of tables constructed and the resulting graphs.

Design and Participants
A cross-sectional design including one between-subject factor was used in the analysis. The dependent variable was the students' performance in constructing a graph. The factor was the student's educational level, which was one of two levels: (1) primary education (fifth- and sixth-grade students) and (2) secondary education (seventh- and eighth-grade students). We chose to compare upper primary school students with lower secondary school students because, according to general patterns in educational systems (Watson & Fitzallen, 2010), the transfer from primary to secondary education implies the implementation of curricular changes in the manner in which tables and graphing are addressed. At both levels, a basic competence is the management of various forms of data organization, although the perspective differs. In the Spanish educational system, in primary education, this general aim appears to be broadly explicated in the mathematics curriculum, specifically in the area of "managing information: probability and randomness." In contrast, in secondary education, two blocks of mathematics content ("functions and graphs" and "statistics and probability") clearly refer to activities concerning the interpretation (but not the construction) of tables and bar graphs.
Beyond general education regulation, to learn whether and how the participants' teachers taught tables and graphs content, we gave them a written questionnaire with the following questions: (1) Have you explicitly taught your students how to construct a double entry table/a bar graph? (2) Before we handed out the task to your students did you work on any activity that specifically referred to learning how to construct bar graphs/tables? (3) How much time did you approximately spend teaching these activities? (4) Did you hand out activities that involved the use of tables or graphs already constructed? (5) Do you use the textbook as a reference for teaching content? What textbook do you use? Do you use complementary material? The questionnaire was answered by 11 teachers (five primary and six secondary). We conclude that most of the teachers addressed the teaching of tables through examples and exercises for which the table was a means to reach another goal. None of them explicitly explained how to construct tables. Answers for graphs were similar to those for tables. The time devoted to activities related to both graphs and tables ranged in total from three to six class sessions. Finally, all teachers relied on their textbooks to draw examples and exercises from, and two of them mentioned the use of complementary material mostly taken from newspapers (one primary school and one secondary school). We reviewed the textbooks used, and they all included tables and graphs, but we did not observe any explicit teaching on how to construct or interpret them. They were treated as complementary data or as exercises whose solution required the construction or the interpretation of a table or a graph.
The study sample included 155 students drawn from five public schools located in the same middle-class neighborhood in Barcelona (Spain). Students were located at one of two educational levels: there were 71 primary school students (mean age = 10.8, range = 10.0-11.2) and 84 secondary school students (mean age = 13.7, range = 12.3-14.4). Since the task was presented as a regular activity related to the curriculum, all students in the mathematics class participated in the study.

Procedure and Materials
The data were gathered in a classroom setting. Students worked individually. The time devoted to the task was one class session (55 minutes) during the second semester of the school year. They were provided with an answer booklet that outlined the specific instructions (see next), along with a general explanation to contextualize the task. The answer booklet consisted of three pages containing descriptive headings: a page for the table, a page for the graph, and a third page to be used as scratch paper. The data sheet with a list of 25 names, the ordinal number of students in the class, and the ages and heights of these students was added as a separate sheet of paper (see the list in Appendix A). The content of the data used in the assignment was sufficiently simple to limit the confounding effect of students' lack of understanding of the data on their ability to plot them (Gerber, Boulton-Lewis, & Bruce, 1995). Students were told that they could ask the experimenter any question they might have on the material. Their own teachers were in the class with the experimenter. The specific instructions in the material provided were as follows: Students from another school have gathered the following data (see the list). With these data, you should (1) create a TABLE and (2) create a GRAPH to show how many boys and how many girls are shorter than 130 cm, how many are between 130 and 149 cm, how many are between 150 and 169 cm, and how many are taller than 169 cm.
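The two operations that the task demands, crossing gender with height interval and then aggregating the crossed cases into frequencies, can be illustrated with a minimal Python sketch. The records below are hypothetical and are not the Appendix A data; the names `RAW`, `height_interval`, and `frequency_table` are introduced here only for illustration, and heights are assumed to be whole centimeters.

```python
from collections import Counter

# Hypothetical raw records (name, gender, height in cm); NOT the Appendix A list.
RAW = [
    ("Anna", "girl", 128),
    ("Pau", "boy", 131),
    ("Laia", "girl", 149),
    ("Marc", "boy", 150),
    ("Nuria", "girl", 169),
    ("Jordi", "boy", 172),
    ("Marta", "girl", 140),
]

INTERVALS = ["<130", "130-149", "150-169", ">169"]
GENDERS = ["boy", "girl"]


def height_interval(height_cm):
    """Map a whole-centimeter height to one of the four intervals in the task."""
    if height_cm < 130:
        return "<130"
    if height_cm <= 149:
        return "130-149"
    if height_cm <= 169:
        return "150-169"
    return ">169"


def frequency_table(records):
    """Cross gender with height interval, then aggregate the cases into counts."""
    counts = Counter((gender, height_interval(h)) for _, gender, h in records)
    return {g: {i: counts[(g, i)] for i in INTERVALS} for g in GENDERS}


table = frequency_table(RAW)

# Print the double-entry table: one row per gender, one column per interval.
print("gender  " + "  ".join(f"{i:>7}" for i in INTERVALS))
for g in GENDERS:
    print(f"{g:<6}  " + "  ".join(f"{table[g][i]:>7}" for i in INTERVALS))
```

Each cell of the resulting table holds a count rather than a list of names, which is the form that can be transferred directly to a bar graph with one bar per gender within each height interval.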

Scoring
According to several authors (Friel et al., 2001; Klass, 2012), the most appropriate format for the type of data (a list of individuals who indicated gender and height) and the type of demand (how many boys and girls are taller than . . .) was the bar graph. According to Klass (2012), pie charts are used to represent the distribution of the categorical components of a single variable. As a general rule, pie charts should be used rarely, if at all, for comparing two or more variables. Given that each variable is represented by a pie, the reader is forced to draw comparisons across the two pie charts, and it is more difficult for the eye to discern the relative size of pie slices than it is to assess relative bar length.
Graphing performance was coded according to the criteria that were determined to be relevant in the literature review. We integrated Kosslyn's (2006) proposal that there are three main components of graphing (framework, specifier, and labels) with the definition of bar graphs by Friel and colleagues (2001) and the analysis of Cartesian graphs by Leinhardt and associates (1990). The first criterion, format, corresponds to Kosslyn's framework concept (whether students represented height and frequencies on the y- and x-axes, respectively). The second and third criteria (gender differentiation and gender integration, respectively) are based on Kosslyn's concept of the specifier. In addition, the analysis of bar graphs by Friel and associates (2001) suggests using bar graphs to represent the relationship between two categorical variables. Again, the analyses of Cartesian graphs by Kosslyn (2006) and Leinhardt and others (1990) inspired the fourth criterion (labeling the second variable with the legend) and the seventh and eighth criteria for axes (x and y) labeling. Finally, the fifth (coherence between what the participants represented and the labels that they used) and the sixth criteria (the scaling of the measures on the axes) were based on the need to measure the relationship between the components described previously (see the detailed coding rubrics for the bar graph in Appendix B).

Results and Discussion
Because participants were not specifically asked to construct a bar graph (see the previous task), the students could choose the format that they believed was the best way to represent the data. This situation allows us to analyze the degree to which the students understood the correspondence between the nature of the data and the method of representing these data (Klass, 2012).
Thus, prior to analyzing the students' specific difficulties with bar graphs, we examined the formats that the students selected to represent data involving frequencies that cross two variables. As indicated in Table 1, half the students chose to create a bar graph (50.7% of primary school students and 51.2% of secondary school students). The distribution of the other half of the students was as follows. Among the primary school students, 25.3% (18 of 71) submitted blank answer sheets and 23.9% (17 of 71) chose a format other than a bar graph. Among the secondary school students, 40.4% (34 of 84) submitted blank answer sheets, and 8.3% (7 of 84) chose a format other than the bar graph. We will return to the students who left the answer sheet blank in the next section. The formats chosen were pie charts (6 primary students and 4 secondary students), tables (4 primary students and 2 secondary students), and lists, text, or drawings (7 primary students and 1 secondary student).
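As a quick consistency check on this breakdown, the following Python sketch recomputes the shares from the counts. The blank and other-format counts are those reported above; the bar-graph counts (36 primary, 43 secondary) are the ones the text gives for the final sample, and the group sizes are 71 and 84. The names `FORMAT_COUNTS`, `GROUP_SIZES`, and `shares` are illustrative only, and recomputed percentages may differ from the reported ones in the last digit depending on rounding.

```python
# Counts per chosen format in Study 1, as reported in the text (Table 1).
FORMAT_COUNTS = {
    "primary": {"bar graph": 36, "blank": 18, "other format": 17},
    "secondary": {"bar graph": 43, "blank": 34, "other format": 7},
}
GROUP_SIZES = {"primary": 71, "secondary": 84}


def shares(counts):
    """Return each category's percentage of the group total."""
    total = sum(counts.values())
    return {fmt: 100 * n / total for fmt, n in counts.items()}


for level, counts in FORMAT_COUNTS.items():
    # The three categories should exhaust each educational level.
    assert sum(counts.values()) == GROUP_SIZES[level]
    for fmt, pct in shares(counts).items():
        print(f"{level:<9} {fmt:<12} {counts[fmt]:>2} of {GROUP_SIZES[level]}  {pct:.1f}%")
```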
To examine the first research objective, which is the analysis of the students' difficulties in constructing bar graphs, we included in the analysis only the students who constructed a Cartesian graph in applying the scoring rubrics. Thus, the final sample consisted of 79 students (36 primary school students and 43 secondary school students; see Table 1). All of the student productions were double coded, and the inter-coder reliability was 90%. Discrepancies were resolved through discussion.
For the present task, the specific goals were to create (1) a TABLE and (2) a GRAPH to show how many boys and how many girls are shorter than. . . . Thus, an understanding of the goal implies a solid understanding of the data according to the two variables gender and height intervals and the ability to choose the framework and the specifier for easily answering the question.
By examining the frequencies with which the students' graphs satisfied each of the listed criteria we can identify the most common difficulties that the students encountered in creating bar graphs. The results in Table 2 indicate the frequency distribution and the statistical comparisons of the two educational levels. We found that none of the frequency comparisons yielded significant differences between the two groups (primary and secondary students). We also observed that gender integration and legend were the least satisfied criteria (less than 40% in both groups). These criteria were followed by the format, gender differentiation, and scaling criteria, which were satisfied in approximately 70% of the graphs. Finally, the criterion of coherence between the representation and the labeling of the two axes was satisfied by approximately 90% of the students (see Table 2).
These results provide an answer to the differences by grade level. Nearly all students at both educational levels correctly labeled the axes on their graphs, and more than three-quarters of the students designed graphs with the correct scaling and represented the data coherently in terms of labeling. Unexpectedly, we did not observe significant differences between educational levels for any criteria, regardless of the difficulty associated. We found that general education did not appear to affect the students' competency in graphing.

Analysis of Graphs Created According to the Tables Constructed
To explain this lack of a difference between educational levels, we hypothesize that the ability to format the data properly prior to the creation of a bar graph might be a stronger factor that interfered with the potential effect of general education. Thus we argue that the students who understood and formatted the data correctly in table form exhibited no difficulties with the representation of data in bar graphs.
The design of the study allowed us to trace the students' graphing competence to the nature of the tables that they created prior to making their graphs. To this end, this section defines the types of tables used. Based on prior theoretical analyses (Duval, 2003; Novick & Hurley, 2001; Tversky et al., 1991), we categorized the tables produced by the students according to whether they showed a double-entry structure with frequencies (for a thorough analysis of students' tables, see Marti et al., 2011). Tables were coded as having (1) a double-entry structure when one of the variables was displayed along the horizontal dimension and the other along the vertical dimension, and (2) frequencies when the cells were filled with the frequencies corresponding to the values resulting from the cross-tabulation (see Figure 1). For the sake of simplicity, from now on we refer to tables that satisfy both conditions as canonical and to those that fail one or both conditions as noncanonical. Examples of noncanonical tables that do not satisfy the first condition include single or double lists that are either unorganized or organized according to height, gender ("nens" for boys and "nenes" for girls), or both, and single lists organized according to both height and gender but lacking frequencies. Other examples of noncanonical tables that do not satisfy the second condition include double-entry cell structures that contain the names or heights of students or that simply display crossed cells rather than displaying frequencies [see Figure 2(a)].
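The coding rule for tables reduces to two yes/no judgments, which a short Python sketch can make explicit. The function name `code_table` and its parameters are hypothetical and only restate the two conditions described above; the judgments themselves were made by human coders.

```python
def code_table(double_entry: bool, cells_hold_frequencies: bool) -> str:
    """Return 'canonical' only when both coding conditions are satisfied."""
    if double_entry and cells_hold_frequencies:
        return "canonical"
    return "noncanonical"


# A double-entry grid whose cells list student names instead of counts.
print(code_table(double_entry=True, cells_hold_frequencies=False))

# A single list that reports counts per height interval without crossing gender.
print(code_table(double_entry=False, cells_hold_frequencies=True))

# Gender crossed with height interval, with a count in every cell.
print(code_table(double_entry=True, cells_hold_frequencies=True))
```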
It was hypothesized that canonical tables would enable students to organize data effectively and allow for the easy transfer of the data into a graph. Of the 36 primary school students, 18 created noncanonical tables and 18 created canonical tables. Among the 43 secondary school students, 29 created noncanonical tables and 14 created canonical tables. Here, we want to add a note regarding those students who might partially understand the data but do not succeed in organizing them according to conventional practices. To consider the students with partial understanding, we include a qualitative analysis of two subcategories within the noncanonical group, those whose tables satisfied only the first condition (double-entry structure but no frequencies) and those whose tables satisfied only the second condition (frequencies indicated but data not organized in a double-entry display), and we further look at the type of graph they created. In the first subgroup, there were 19 cases (see Figures 2a and 4a for examples; the heading of the table in Figure 4a shows Catalan labels: "nom" for name, "cognom" for last name, "sexe" for sex, and "mida" for height measure) in which students wrote all the names of the cases for each height interval in the cell that resulted from the cross-tabulation of data; 8 cases in which students left the answer sheet blank; 6 cases with a format that did not respond to the task; and 5 cases with a format in which frequencies and the variable gender were integrated in the bar graph, for which we assume the students had to count the cases in their table cells. The second subgroup, those who made a table indicating frequencies without a double-entry structure, was smaller. It comprised six participants: two did not make any graph, two made a graph indicating only heights but not gender, and two made two separate bar graphs, one for each gender.
From these results we can conclude that understanding the double-entry structure is far more essential for bar graphing than aggregating data into frequencies. In the former case, students can count the cases in the cells and represent them accordingly, while the reverse does not apply: displays with frequencies but without the double-entry structure show a much poorer understanding.
Finally, we want to add a comment on the blank sheet answers. The analysis of the tables that the students constructed for later data plotting allowed us to trace the types of tables that the students who left their answer sheets blank constructed. With the exception of two students, all of the primary school students who left the answer sheet blank departed from a noncanonical table. Similarly, of the 34 secondary students who left the answer sheet blank, 29 departed from a noncanonical table and only 5 constructed a canonical one.
In the next analysis, we examine the percentages of student graphs that satisfied the criteria according to the type of table constructed. The frequencies and percentages are presented in Table 3 and Figure 3. Table 4 shows the statistical analyses comparing the frequencies (and percentages) between educational levels for each table group (canonical and noncanonical), and Table 5 shows the statistical analysis comparing the frequencies (and percentages) according to the type of table constructed. We observe in Figure 3 that the students who experienced the greatest difficulties were those who created a noncanonical table prior to creating their graphs. Although, as shown in Table 4 and Figure 3, none of the criteria comparisons between educational levels for each table group (noncanonical and canonical) was statistically significant, when we compared frequencies according to the type of table constructed (Table 5), we observed that, in contrast to those who constructed a noncanonical table, the corresponding frequencies of those who created canonical tables were significantly higher for both educational levels: format [primary education: χ²(1) = 18, p = 0.001 and secondary education: χ²(1) = 11.1, p = 0.001]; gender differentiation [primary education: χ²(1) = 12, p = 0.001 and secondary education: χ²(1) = 12.9, p = 0.001]; gender integration [primary education: χ²(1) = 7.5, p = 0.006 and secondary education: χ²(1) = 6.5, p = 0.011]; legend [primary education: χ²(1) = 16.8, p = 0.001 and secondary education: χ²(1) = 6.0, p = 0.014]; and scaling [primary education: χ²(1) = 6.4, p = 0.01 and secondary education: χ²(1) = 4.7, p = 0.031]. In contrast, the coherence criterion showed significant differences only for primary education because the canonical group showed a percentage of 100% (18/18), χ²(1) = 4.8, p = 0.030. Finally, the type of table constructed did not show any effect on axes labeling; percentages were above 90% in both groups.
The frequencies of students' graphs in the noncanonical group that satisfied each particular criterion were low. The lowest frequencies were for gender integration and legend. For gender integration, the frequency was 3 of 18 (17%) for primary school students and 4 of 29 (14%) for secondary school students and, for legend, the frequency was 1 of 18 (5%) for primary school students and 3 of 29 (10%) for secondary school students. In contrast, the frequencies for format, gender differentiation, and scaling were slightly higher; for primary school students, they were 6 of 18 (33%) for format, 7 of 18 (39%) for gender differentiation, and 9 of 18 (50%) for scaling. Similarly, for secondary education, the frequency for format was 14 of 29 (48%), that for gender differentiation was 10 of 29 (34%), and that for scaling was 15 of 29 (52%) (see Table 3 and Figure 3).
We find a great contrast when we compare these frequencies with the frequencies of those who created canonical tables. The frequencies for format reached 100% at both educational levels (18 of 18 for primary school students and 14 of 14 for secondary school students), and they were higher than 90% for gender differentiation (17 of 18 for primary school students and 13 of 14 for secondary school students). Additionally, following the same difficulty pattern as in the noncanonical group, frequencies for gender integration and legend were slightly lower. For the former, the frequency was 11 of 18 (61%) for primary school students and 7 of 14 (50%) for secondary school students; for the latter, the frequency was 13 of 18 (71%) for primary school students and 6 of 14 (43%) for secondary school students. Finally, the frequency for scaling was a little higher, 16 of 18 (89%) for primary students and 12 of 14 (86%) for secondary students. Therefore, we again confirm that difficulty is associated with the integration of the second variable in the graph (see Table 3), illustrated by the difficulty with the criterion gender integration and, consequently, the legend criterion. Next, we illustrate these figures using some of the students' graphs and tables.
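The comparison by table type can be illustrated for the format criterion with a minimal Python sketch of a 2 x 2 Pearson chi-square statistic. The source does not state which variant of the test was used; the sketch assumes no continuity correction, and under that assumption the counts reported in the text (6 of 18 versus 18 of 18 for primary, 14 of 29 versus 14 of 14 for secondary) yield the values reported for format. The function name `pearson_chi2_2x2` is hypothetical, and this is not presented as the authors' procedure.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Assumes every row and column total is nonzero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: noncanonical table, canonical table.
# Columns: format criterion satisfied, not satisfied.
primary = pearson_chi2_2x2(6, 12, 18, 0)
secondary = pearson_chi2_2x2(14, 15, 14, 0)

print(round(primary, 1), round(secondary, 1))  # 18.0 11.1
```

With one degree of freedom, larger values of the statistic indicate a stronger association between the type of table constructed and whether the graph satisfied the criterion.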

Qualitative Analysis of the Students' Graphs and Tables
Figures 2(a) and 2(b) show the table and graph, respectively, constructed by Marta (a secondary school student). Marta's table contains the double-entry structure, but it does not display frequencies and was thus coded as noncanonical. The table has one row for each name on the data list, with the names being structured by gender, and the columns include the four height intervals. This organization is a common practice that demonstrates students' resistance to the avoidance of literality and progression toward abstraction (Lehrer & Schauble, 2007): the graph demonstrates a lack of abstraction that prevents the use of frequencies; it only superficially resembles a bar graph and shows no data aggregation. Instead, the x-axis contains all of the names of the students on the data list structured into two gender categories, and the y-axis contains the four height intervals. Gender is differentiated but not integrated; thus, there is no need for a legend.
Similarly, Xenia's table and graph are shown in Figures 4(a) and 4(b), respectively. Xenia's table is a double list of names structured by height, and gender is marked as an added label in the third column. The higher literality of this table deviates from the double-entry structure.
The students know how tables should appear and create formats that resemble them, but they fail to use the double entry and frequencies. Xenia's and Marta's tables demonstrate the same problem (the lack of data aggregation into frequencies) in different manners. Xenia's graph is structured according to the names of the students on the data list (x-axis); thus, her graph includes no data aggregation and, subsequently, includes no frequencies. Consistent with their respective tables, Xenia's graph is slightly more literal than Marta's because Xenia's graph indicates specific heights for each name and resembles a scatter plot.
As explained in the discussion of coding criteria, the graph format must include frequencies on either the y-axis or the x-axis and height intervals on the x-axis or the y-axis. As indicated in Figures 2(b) and 4(b), a common error among students who constructed graphs using noncanonical tables was the representation of the heights of the students on the y-axis by drawing one bar for each student and placing the names of the students on the x-axis, defined as the case-value plots, precursors of bar-graphs that require neither data aggregation nor abstraction into frequencies (Watson & Fitzallen, 2010). Eight primary and 11 secondary school students drew graphs using this structure; all of these graphs belonged to the noncanonical group. Thus, students demonstrated problems with the representation of frequencies on the y-axis and exhibited a bias for concreteness by associating the variable of height on the y-axis with the name of a student on the x-axis.
Difficulties differentiating gender were observed in the graphs created by students who began with a noncanonical table. We observed different degrees of gender differentiation. For instance, the graphs in Figure 2(b) and Figure 5 clearly differentiate gender; however, they do not represent the integration of gender and height. In both figures, the x-axis provides the names of the students, and the y-axis shows the height intervals. Figure 5 presents the heights for each student and labels the y-axis using height intervals, whereas Figure 2(b) presents the students at each height interval, which is closer to satisfying the format criterion.
These results provide answers to the two research questions that were posed in this article. Concerning the first question, our data confirm the hypothesis that the primary difficulties encountered by the students in creating bar graphs were related to difficulties in organizing the data in double-entry tables including frequencies. With regard to the second research question, we have observed the lack of differences among middle school educational levels in constructing bar graphs. The nature of the tables that were constructed (canonical or noncanonical) was a much stronger factor than educational level and led to graphs that differed clearly in quality. However, although the students' graphs may have satisfied the format criterion, even when they chose not to create a canonical table, gender integration and legend appeared to be difficult for many students.
Do these results confirm that students' difficulties lie in the process of constructing a table? Would the students' difficulties in creating bar graphs change if they were provided with data that were already in table form? In the next section, we present a study in which we analyze whether the provision of data that are already in table form affects students' graphing performance.

STUDY 2. THE CONSTRUCTION OF A GRAPH BASED ON DATA PROVIDED IN A TABLE
The third research question addresses the issue of whether the difficulties faced by middle school students in creating bar graphs change when these students are provided with data that are already in table form. This question draws on studies conducted in the fields of language development (Clark, 1993) or literacy (Ferreiro & Teberosky, 1983), which indicate that production is more difficult than interpretation.

Design and Participants
The design for this study was identical to that of Study 1. The dependent variable was the performance demonstrated by students in constructing bar graphs, but in this study, the data were already organized in a table that was provided to the students. A total of 152 students were selected from the same schools used in Study 1. Students were again drawn from two educational levels: 68 primary school students (fifth and sixth grades; mean age = 10.9, range = 10.1-11.4) and 84 secondary school students (seventh and eighth grades; mean age = 13.6, range = 12.5-14.3). As in Study 1, the graphs produced by 23 of these 152 students (14 primary school students and nine secondary school students) were not included in the analysis because they were in the form of a pie chart, a table, or text. Additionally, five students in the primary education group and two in the secondary education group left the answer sheet blank. Therefore, data obtained from a total of 122 students were included in the analysis. The final sample consisted of 49 primary school students and 73 secondary school students.
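The sample accounting above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the counts stated in this section; the variable names are our own and are purely illustrative.

```python
# Sample accounting for Study 2, using only the counts stated in the text.
recruited = {"primary": 68, "secondary": 84}
not_bar_graph = {"primary": 14, "secondary": 9}  # pie chart, table, or text
blank_sheet = {"primary": 5, "secondary": 2}     # answer sheet left blank

# Students whose graphs entered the analysis, per educational level.
analyzed = {
    level: recruited[level] - not_bar_graph[level] - blank_sheet[level]
    for level in recruited
}

print(analyzed)                # {'primary': 49, 'secondary': 73}
print(sum(analyzed.values()))  # 122
```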

Procedure and Materials
The data were gathered in the classroom. We provided the students with an answer booklet outlining the specific instructions (see Table 6).

Results and Discussion
As in Study 1, we compared the frequencies with which the criteria were satisfied by the graphs of the primary and secondary school students. In contrast to the results of Study 1, we observed that the graphs of the secondary school students outperformed those of the primary school students in terms of format [χ²(1) = 6.2, p = 0.014], gender integration [χ²(1) = 4.9, p = 0.027], legend [χ²(1) = 4.7, p = 0.029], and coherence [χ²(1) = 9.2, p = 0.002]. Almost all of the secondary school students made a bar graph correctly in terms of format (97%), gender differentiation (92%), and coherence (94%). Only half of them (53%) integrated gender and only one third (33%) provided a legend. In contrast, 88% of primary school students created a graph that satisfied the format criterion, only 33% integrated gender, very few (16%) used a legend, and 73% represented and labeled coherently. It is interesting to note that almost all differentiated gender (94%) (see frequencies and percentages in Table 7).
This lack of differences between educational levels regarding the gender differentiation criterion is easily explained by the provided table's explicit indication of gender. Consequently, both primary and secondary students were able to use this information to represent the data in the bar graph. However, in contrast with secondary students, primary students did not appear to be able to use other information implicit in the table, such as cross-tabulated frequencies, resulting in significant differences in the chi-square analysis of format, gender integration, and legend criteria. Thus, we could hypothesize that the secondary school students' expertise in interpreting tables was higher. Because the present study is part of a larger project that focuses on both graph and table construction and interpretation, this hypothesis can be confirmed by means of another set of data from the wider project. These data on table interpretation were gathered simultaneously with the data on graph construction by randomly splitting the classes into three subgroups: the first group of participants was assigned to Study 1, the second to Study 2, and the third to a table interpretation task. Thus, with this extra set of data (Gabucio, Marti, Enfedaque, Gilabert, & Konstantinidou, 2010), we were able to respond to the issue of whether secondary school students had greater expertise in interpreting tables.
Two hundred students were given a test in which they were asked to interpret a table that was very similar to the table used in Study 2; heights were replaced by weights and the totals were reduced to 50, while the rest was kept identical. We found significant differences between the test means of students at the two educational levels; younger students demonstrated clear difficulties with the items in the test that pertained to the double-entry structure. For instance, younger students had more difficulty than older students with questions such as "The numbers that appear inside the cells correspond to: (a) weights; (b) number of people; (c) age; and (d) height" (see Appendix in Gabucio et al., 2010, for the interpretation test).
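The comparisons between educational levels reported in this section rest on chi-square tests of association with one degree of freedom. As an illustration of how such a statistic is obtained from a 2 × 2 table (educational level by criterion satisfied or not), the following Python sketch computes a Pearson chi-square without continuity correction. The function name and the counts are hypothetical and are not the study's data, and we do not assume that this is the exact procedure or software used for the reported values.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]]; no continuity correction is applied.
    Returns the statistic and its p-value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# HYPOTHETICAL counts (not taken from the study):
# rows = two groups, columns = criterion satisfied / not satisfied.
hypothetical_counts = [[30, 10], [20, 20]]
chi2, p = chi_square_2x2(hypothetical_counts)
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

A statistics package would normally be used in practice; the explicit loop is shown only to make the observed-versus-expected logic visible.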

Qualitative Analysis on the Students' Graphs From a Provided Table
In the following paragraphs, we qualitatively illustrate these results using examples. Jon (Figure 6) drew eight bars, one for each interval value and gender. He marked the values of the interval limits on the y-axis (0, 130, 149, 150, 169). He did not label the x-axis, but he indicated gender ("nens" for boys or "nenes" for girls) inside the bar at each height interval. The sizes of the bars appear to correspond to the height interval limits rather than to the frequencies in the table or to the students' heights (as in Figure 4(b) or Figure 5), although there are three bars for girls at height 149 but none at height 130.
Pat's graph (Figure 7) is interesting because it illustrates a secondary school student's understanding of the frequencies in the table that resulted in an unsuccessful attempt to represent those frequencies in the graph. The graph shows heights on the y-axis and frequencies on the x-axis. The values of the y-axis match the size of the bars, which represent approximate values for height intervals; for instance, to represent the interval "more than 169 cm," Pat marked 170 on the y-axis.
Regarding the x-axis, she split it into two halves, the left one for boys and the right one for girls, and scaled each half from 0 to 50 (the maximum frequency in the provided table is 51). In addition, she placed the bars at the corresponding x-value (except for the last column, which should be at x = 6), and she decided to write the frequencies with the specific gender inside the bars (the first bar reads "23 boys of more than 169" at the y-value of 170; the second, "40 boys between 130 and 149" at the y-value of 135; the third bar repeats "23 boys of more than 169" at the y-value of 150; the fourth, "16 girls" at the y-value of 30; the fifth, "20" at the y-value of 5; the sixth includes an error in the interval limits, "51 girls from 159 to 159" at the y-value of 157; and the eighth bar reads "6 girls of more than 169" at the y-value of 170). We observe several inconsistencies in Pat's bar graph. She did not represent the frequencies in the height interval of 150-169 cm for either boys or girls. She also did not represent the frequency value "0" for boys shorter than 130 cm. In contrast, Pat repeated the data in bar 1 and bar 3. She misplaced the value of 6 for the frequency of girls "taller than 169 cm." She indicated a frequency of 16 at the y-value of 30 instead of 130. And finally, there is a value of 20 in the fifth bar that was difficult to interpret. These results reveal Pat's struggle with the graphical representation of the data provided in the table despite her signs of proper table interpretation.
Finally, Ferran's graph (Figure 8) indicates clear gender integration, with a bar graph that appears correct. However, when we examine the axes, although the columns represent the values of the frequencies of the table, the y-axis is labeled "height in cm" and the x-axis is labeled "students," segmenting the axis with the correct height intervals. Thus, the graph exhibits a clear incoherence between what is represented and what is labeled.
This graph is in contrast to Figures 2(b) and 5 in Study 1 in terms of the attempts to integrate gender into the graphs. Figures 7 and 8 indicate that although secondary school students have a better understanding of tables than primary school students, even the graphs of secondary students are not straightforward. These examples show that although the students may understand tables, their graphs still indicate difficulties regarding the representation of gender integration and the legend.

Study 1 and Study 2 Comparison
In addition to the differences between the primary and secondary school students in graph performance based on the provided table, we believe that it is worthwhile to compare the results in Study 1 with the results in Study 2. Our final analysis compares the percentages of students' graphs that satisfied the criteria when the graphs were created using their own tables or using a table provided. Although the two task prompts were not identical, given that students in each classroom were randomly split into two groups and assigned to simultaneously perform the task for each study, the comparison is reasonable. We did not analyze the data with a between-group design with one condition (to make a table or read a table) because as previously mentioned, the task prompts were necessarily different to address the task's goal.
In primary education, the comparison of graphing performance (percentage of criteria satisfied) yielded significant differences between the graphs that were created based on a constructed table and the graphs that were created based on a provided table for the criteria gender differentiation and legend. Gender differentiation was better satisfied by graphs from a provided table, and legend was better satisfied by graphs from one's own table (see the respective data for graphs that were based on a constructed table in Table 2 and based on a provided table in Table 7 and see statistical results in Table 8; for comparison, Appendix C presents data from Table 2 and Table 7 in a single figure).

TABLE 8. Chi-squared analysis of the association between the criteria of constructed graphs according to condition for each educational level (columns: primary education, secondary education).
In secondary education, all frequencies were higher for the graphs made from a provided table for format, gender differentiation, gender integration, and scaling, as if having the table and being able to interpret it thoroughly had helped the students construct a better graph (statistical analyses are provided in Table 8). Format, gender differentiation, gender integration, and legend are criteria closely related to the understanding of the structure of the data and its consequent representational management. The first three seemed to be influenced by the understanding of the prior organization of the data in a provided table, whereas the legend was difficult even with data already organized in a table. Nonetheless, we are aware of the fact that differences for gender differentiation could be explained by a slight ambiguity in the wording of the task prompt in the make-table condition. If it had explicitly repeated, "How many boys are shorter than 130 cm and how many girls are shorter than 130 cm" instead of "How many boys and how many girls are shorter than 130 cm," frequencies for gender differentiation in the make-table condition might have been higher.
Contrary to the belief that tables that are similar to the table presented in our study are straightforward, students' lack of understanding of such tables may have implications for their ability to graph depicted data. In contrast to the expertise of secondary school students, the lower expertise of primary school students is hypothesized to be the cause of their low graphing performance when they were asked to create a graph using a provided table.

GENERAL DISCUSSION
Although the construction and comprehension of tables have been neglected topics in the literature, the results of this research suggest that these topics deserve further study in the field of learning and instruction. As Ainley (2000) stated, the widespread use of graphs in advertising and in the news assumes that graphs clearly communicate their meaning, but this assumption conflicts with the results of the research on pupils' difficulties with graphing in mathematics and science. In response to the first goal of the study, our data provide evidence that students have significant difficulties with bar graphs. In analyzing how students construct a table before constructing a bar graph, we have shown that their main difficulties are related to structuring the data from two variables and understanding that a set of items can satisfy two criteria after the variables have been integrated and the numbers have been aggregated accordingly. We think that these difficulties indicate that students, when constructing a graph, have to engage in a complex process that relates data with graphic aspects. As Kaput's model points out (1998), this process involves the construction of a referential relationship between two different entities: the data and the graph that represents them. Thus, we do not believe that tables are transparent cognitive tools that can be used properly by all primary and secondary students during the problem-solving process; instead, an accurate understanding of certain implicit information in tables is required for graphing. Lehrer and Schauble (2007) sought to determine what is transparent and what is obscure in the modeling practices of primary school students. Our results show that what makes graphs obscure and their appropriation difficult and neglected may be the lack of understanding of the double-entry structure of the data that underlie tables and bar graphs. This lack of understanding was apparent at both educational levels.

The Relationship Between Graphing and Tabulating: Cross-Classification and Data Aggregation
To interpret the relationship between the students' difficulties with graphing and their difficulties with tabulating, we discuss the students' difficulties in constructing a canonical table. Based on the analysis of the students' constructions, we identified two cognitive requirements of table construction: (a) comprehension of the cross-classification of variables and (b) comprehension that frequencies are abstract numbers that result from the aggregation of individual cases.
The cross-classification of variables is the essence of tables: any data located in the cells (in our case, frequencies) are related both to one condition of one variable and to another condition of the other variable. From a logical point of view, cross-classification requires the coordination of two dimensions (which occurs, for example, when one must classify objects that are red and square and distinguish them from objects that are red and round, blue and square, or blue and round). Unlike simple or additive classification, multiplicative classification requires the coordination of two (or more) dimensions. According to Piaget, these multiplicative classifications are possible during the concrete operational stage (approximately 7-8 years) (Piaget & Inhelder, 1967). Why, then, did some of the participants in our Study 1 have difficulties in the cross-classification of variables, considering that, according to their ages, they most likely do not have any difficulty with multiplicative classifications?
We believe that the difficulties in table construction are related to graphic constraints rather than to the students' general competence with logic. First, to construct a table, the students must anticipate and draw the number of rows and columns according to the values of the two dimensions (gender and height). Thus, both variables must be explicitly considered and represented in the table margins, and the students must understand the difference between margins and cells.
Second, the students must also understand that cells in the same row (or column) are related to the value of the variable indicated in the margin. Thus, they must understand that the meaning of the data included in the cells depends on spatial conventions (Brizuela & Lara-Roth, 2002; Marti, 2008; Marti et al., 2011; Novick & Hurley, 2001).
The second cognitive requirement (the computation of frequencies) is related to the specificity of the table that the students must construct (i.e., a table of frequencies). Although logically correct, it is not sufficient for students to place all of the cases that fulfill the conditions indicated in the margins in each cell. To fulfill the goal of the task, the total number of cases that fulfill both conditions must be computed. With respect to the need to provide frequencies, the students appeared very reluctant to remove information from the display, even at the cost of redundancy within their tables or graphs. Lehrer and Schauble (2000, 2007) and Wu and Krajcik (2006) also reported these difficulties in observational studies in which students had to manipulate raw data. These studies specifically reported the students' resistance to avoiding literality. Lehrer and Schauble's analysis of the modeling performed by middle school students revealed that the heuristic employed by the students was "more stuff is better" (2000, p. 60). This tendency to include all information in a display reveals a resistance to removing redundant information, which was also a typical characteristic of the tables and graphs constructed by the students in the current study, which controlled for data manipulation.
To explain the students' resistance to removing redundant information, Lehrer and Schauble (2007, p. 158) considered "objectifying data as a mental step away from the cases that the data represent to treat the data as objects in their own right." Rather than taking a case view of the data, objectifying the data entails taking an aggregate view in which the data themselves are the object of manipulation, inspection, and conjecture. For instance, in their study about plant growth, the authors observed how contentious third graders became when graph midpoints did not correspond to any particular case-value. Just as children in Lehrer and Schauble's study (2000) found it counterintuitive to accept that a value deemed to be typical did not appear in the original distribution, they also tended to find the elimination of redundant information counterintuitive. Also from a data modeling approach, English (2012) showed how first grade children engage in basic activities that are crucial to graphic representation: structuring of data, detection of redundant information, or awareness of the need to eliminate unnecessary features when they are asked to modify their initial representations.
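The two requirements discussed in this section, cross-classification of the variables and aggregation of individual cases into frequencies, can be made concrete with a short Python sketch. It assumes heights recorded in whole centimeters and uses the four height intervals named in the task; the raw records, function names, and interval labels are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Height intervals as named in the task; the string labels are illustrative.
INTERVALS = ["<130", "130-149", "150-169", ">169"]

def height_interval(height_cm):
    """Map a height in whole centimeters to its interval label."""
    if height_cm < 130:
        return "<130"
    if height_cm <= 149:
        return "130-149"
    if height_cm <= 169:
        return "150-169"
    return ">169"

def frequency_table(records):
    """Build a double-entry frequency table from (gender, height) cases.

    Step 1 (cross-classification): each case is assigned to one cell,
    defined jointly by gender and height interval.
    Step 2 (aggregation): the cases in each cell are counted, so the
    individual cases no longer appear in the table.
    """
    counts = Counter((gender, height_interval(h)) for gender, h in records)
    genders = sorted({gender for gender, _ in records})
    return {g: {iv: counts[(g, iv)] for iv in INTERVALS} for g in genders}

# HYPOTHETICAL raw data: one (gender, height in cm) pair per student.
raw = [("girl", 128), ("boy", 131), ("girl", 145), ("boy", 152),
       ("girl", 171), ("boy", 149), ("girl", 150)]

table = frequency_table(raw)
for gender, row in table.items():
    print(gender, row)
```

A noncanonical table of the kind described in Study 1 would correspond to stopping after the first step, that is, listing the cases in each cell without counting them.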

Progression Between Primary and Secondary Education
Based on the data related to our second goal, cross-classification and its representation in a graphic display, as well as the aggregation of data into frequencies, appeared to be equally demanding for primary and secondary school students. How can we explain these difficulties and the lack of progression between the primary and secondary school levels? We have observed that logic development related to competence in handling multiplicative structures cannot be an explanation. Some students may have been able to overcome these challenges because of their experience with tables during school instruction (Roth & McGinn, 1997). However, unexpectedly, we found no significant improvements in the table construction performance of secondary students compared with primary students. We argue that the reason for this lack of improvement is that, according to our review of the students' textbooks and their teachers' practices (see the questionnaire), table construction is not an explicit object of instruction among these students' school practices. This points to the importance of explicitly teaching the graphic and cognitive requirements necessary for constructing a table.
These results contrast with those obtained by Åberg-Bengtsson (2006) and Ainley and colleagues (2000) in their studies of first- and second-grade primary school students. These authors used Microsoft Excel to explicitly teach how to create a graph. As they acknowledge, using Microsoft Excel to create a graph involves a linear procedure that requires one to follow a particular sequence of steps. Any explicit teaching of the steps required to construct graphs and tables is beneficial regardless of whether students learn by hand or using a computer. Students should have the option to learn about creating graphs and tables so that they are eventually able to explain exhaustively what they have created in both instances, with technology or without it.
According to Ainley and colleagues (2000), the use of computers in education has the potential to revolutionize the ways in which children learn graphing skills. But, as they claim, the ability to produce graphs with a computer does not simply remove the need for traditional pencil-and-paper skills. The authors suggest that the mastery of graphing requires, among other things, the practical skills required to produce graphs by hand. For this reason, it is important for students to struggle with the decisions involved in the manual construction of a graph using raw data because, in doing so, they appropriate the data and develop a better understanding of them. With our results we do not aim to conclude that by-hand graphing tasks should always precede the construction of graphs using computer spreadsheets. However, we think that by-hand activities should not be neglected. We observed how they help novice students make explicit all the steps involved in the process, and they can also help teachers identify their students' main difficulties. The question is whether our students would have created the graphs and tables that illustrate their struggle if they had been asked to represent the data using an Excel spreadsheet. How would they have represented Figure 6 or Figure 7 using a spreadsheet? The first step in graph construction is not only different but also essential in the process. When students are asked to construct a graph on paper from a set of raw data, they must not only organize the blank sheet of paper to draw the format with the axes but must also "sketch" something "inside" the frame resulting from setting the axes. As we saw in the Results section (Figure 7), from Pat's graph we can tell that she is able to identify the frequencies for each gender in each interval. She chooses to represent height on the y-axis and frequencies on the x-axis. She draws vertical bars rather than following the convention of horizontal bars when frequencies are represented on the x-axis.
Pat then has to choose a single value within the given interval to mark it on the y-axis. Along the way, Pat probably realizes that she needs to represent frequencies more clearly, and she decides to add them inside the bars. In addition, she has to decide how to represent gender, and for that, she splits the x-axis frame into two halves, one for each gender value. She does not use the convention of the legend to integrate a second variable (gender) in the graph either. In contrast, when students like Pat are asked to do the same task using a spreadsheet, they choose the axes, label them, set the scaling, and the "sketch" is drawn automatically from the table provided. The steps illustrated in Pat's example are essential decisions that are not made explicit when students use a spreadsheet; they are automatically "drawn." The fact that novice students are free to "sketch" the data within the frame established by the axes opens a window for teachers to observe their mental struggle and, consequently, allows a more precise intervention addressing the specific graphing difficulties.

Differences in Graphing From a Provided Table or From a Self-Made Table
On the other hand, our results also indicated that creating a graph from an existing table appeared to be easier than creating a graph from a self-made table, but only for secondary school students. Thus, contrary to the belief that tables similar to the table presented in our study are straightforward, the students' lack of expertise in understanding such tables may have implications for their ability to graph the depicted data. In contrast to the expertise of secondary school students, the lower expertise of primary school students is hypothesized to be the cause of their low graphing performance when they were asked to create a graph using a provided table.
Another difficulty that appeared across educational levels for those students who departed from the canonical table involved gender integration. Creating a single graph that integrated heights for both genders (boys and girls) appeared to be a highly demanding task. As previously mentioned, the students tended to overcome this obstacle by constructing two separate graphs. Fewer than 50% of the students constructed graphs that integrated gender. Our claim is that this difficulty may stem from the graphical constraint of condensing all of the information into a two-dimensional structure. That is, having all levels of one of the variables on one axis (i.e., the x-axis) and the frequencies on the other axis (i.e., the y-axis) leaves no dimension for the second variable that must eventually show the values of all subcategories. In contrast, in Study 1, there were no differences in table construction between the primary and secondary school students. The secondary school students outperformed the primary school students in terms of table interpretation but not in terms of the construction of canonical tables because, as mentioned, their school coursework tends to emphasize the use and interpretation of printed tables rather than the manipulation and proper formatting of raw data.
We conclude that students' difficulties in bar graphing can be traced to their tabulation processes. Data organization is an essential bridging tool for understanding the essence of the data and representing them graphically. One-variable bar graphs are introduced very early in school practices because they appear to be an easy starting point for representing data. However, the step to a two-variable bar graph is taken quite automatically, without any explicit consideration of the new processes that this step involves. We show in the present article that data aggregation and cross-classification to integrate the second variable are two of these difficult processes, and we show the essential role of tables in this step. Students are often taught, more or less explicitly, traditional representational systems as isolated topics (English, 2012) or as means to solve other problems. As English put it: "The need for classroom experiences that provide opportunities to structure and display data in ways that students choose and to analyze and revise their creations are important in addressing these early difficulties" (2012, p. 17). We agree with English (2012) that modeling opportunities are to be provided, but we argue that these opportunities must provide specific scaffolds that address the difficult steps involved in graphing.
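The graphical constraint described above, in which one axis carries the height intervals and the other the frequencies so that no axis remains for gender, is conventionally resolved by placing one bar per gender side by side within each interval and identifying the bars through a legend. The following Python sketch computes only the horizontal positions of such grouped bars; the function name, the default group width, and the labels are our own assumptions, and a plotting library would normally be used to draw the bars and the legend.

```python
def grouped_bar_layout(categories, series, group_width=0.8):
    """Compute x positions for side-by-side bars.

    Each category (e.g., a height interval) is centered on an integer
    x position; the bars of the different series (e.g., boys and girls)
    share the group width and are offset symmetrically around that center.
    Returns ({series_name: [x positions]}, bar_width).
    """
    bar_width = group_width / len(series)
    layout = {}
    for k, name in enumerate(series):
        offset = (k - (len(series) - 1) / 2) * bar_width
        layout[name] = [i + offset for i in range(len(categories))]
    return layout, bar_width

# Illustrative labels: the four height intervals and the two genders.
intervals = ["<130", "130-149", "150-169", ">169"]
layout, bar_width = grouped_bar_layout(intervals, ["boys", "girls"])

for name, positions in layout.items():
    # The series name is what a legend entry would display.
    print(name, [round(x, 2) for x in positions])
```

In this layout the second variable is carried by position within each group together with the legend, rather than by a third axis, which is the convention that many students in both studies did not use.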