Big and small data: Watching and discussing television series on streaming

This paper analyzes the communication processes in online media about four TV series, all of them on Netflix (La Casa de Papel, Peaky Blinders, Élite, and Sex Education). We are interested in how these streaming series are reconstructed by specific audiences on social networks. We look at the conversations in the digital fan communities organized around these series on social networks such as Twitter, Facebook, and YouTube, and in other digital contexts (e.g., digital magazines). The corpus comprised a total of 408,536 mentions and 189,040 users. The analysis process requires channeling the data flow and defining a coherent system of themes and categories. The paper shows the elements that fans use to reconstruct the stories, for example, mentioning actors or characters. Very similar dimensions appear both in relation to the stories and the way they are constructed during a pandemic situation.

Keywords: Big data; small data; TV series; discourse; Netflix; opacity; fan community.

Martínez, R.; Lacasa, P. & del Castillo, H. Big and small data: Watching and discussing television series

Introduction
On-demand audiovisual content platforms have revolutionized the TV series market worldwide. Strong competition, along with the need to offer new formats and alternatives, has enhanced the expansion of fictional narratives among different broadcasting platforms. Their audiences also directly impact the construction and development process of these audiovisual productions, through comments and discussion. Followers produce texts for social networks or other digital scenarios, such as commentaries, forums, or amateur and professional reviews (Heredia-Ruiz et al., 2021). This paper analyzes this participation through a big data analysis of the fan communities' conversations on the Internet; in addition, a qualitative analysis is conducted on a selection of them. Thus, it contributes to the coexistence of the researcher with big data opacity. Big data analysis has inevitably led to changes in traditional research methods, on many occasions related to content analysis (Lewis et al., 2013; Adams, 2019), or the focus provided by discourse analysis (Gee, 2014). Our starting point in this study is that both approaches may help reduce big data opacity.

Theoretical framework
Multiplatform and fictional series
The conceptual limits of television, previously restricted by its technical distribution traits (Jongbloed, 2016) or confined by the related legal definitions (Gauntlett, 2009), have now shifted with the development of entertainment content through the digital platforms of conventional television channels or streaming sites, such as Netflix or HBO (Arrojo & Martin, 2019). This framework has led to new platforms that seek to resolve problems detected in traditional television. New distribution opportunities have arisen through a new digital interconnection system, which coexists with traditional television, generating audience participation (Halpern et al., 2016). The difficulties we refer to include aspects such as interactivity, immediacy, globalization, the use of multiple platforms, and the search for new targets. For example, at the end of 2018, 160 original series had been produced for viewing on different digital platforms, almost half of which were by Netflix (Corta, 2019). In 2020, Netflix and Amazon Prime competed for audiences, with Amazon ranking first by far (Iqbal, 2020). These changes contribute to generating a large volume of data in fan communities, which poses challenges both to producers and researchers.
This new creative form and content distribution through the net created an opportunity to offer new formats and to strengthen the expansion of fictional narratives among several different platforms and their audiences. A crucial point here is the role played by the audience, where the spectator does not wish to be confined to a strict timetable of contents, nor to feel removed from what is happening on the screen. A new culture of binge-watching episodes has arisen, which disregards the concept of the series that held until now, where the cut-off points between episodes and the duration of each one were clearly determined (Iqbal, 2020; Martín, 2016). The new audiences, organized in fan communities, are part of a participatory culture (Jenkins, 2015, 2017), generating comments from their Netflix viewings.

The case of Netflix
In this study, we focus on four Netflix¹ series, an example of a creative audiovisual content and distribution model that breaks away from classical television models and evolves from Internet and streaming possibilities (Heredia-Ruiz et al., 2021; Jenner, 2018).
A major part of Netflix's programming and differentiation strategy involves its own original contents. To develop its extensive catalogue, it optimizes the user's consumption experience thanks to algorithms, big data application, and recommendation systems. Netflix also assures a permanent televised flow, offering a varied catalogue which allows the user to browse through the Internet with total freedom and control. The distributor invests a large budget to create exclusive series, positioning itself in the intersection between the Internet and narrative creativity (Fernández-Manzano et al., 2016; Gomez-Uribe & Hunt, 2016). The launch of these series in a binge-watching context allows fans to enjoy watching their favorite series in full in a single afternoon. The platform is known for adapting to the needs of a generation which prioritizes immediacy above all, and it is in line with its target audience, which seeks an immediate, innovative, and customized service. Netflix manages to offer this thanks to algorithms, which facilitate browsing through pages of recommended or popular series, or those which match one's personal preference history (Gomez-Uribe & Hunt, 2016).
Following this perspective, we focus on the Spanish situation. In this country, the film and series catalogue is smaller (3,522 titles) than in the USA (5,879); however, a study by the VPN company Surfshark² revealed that viewers gave it a score of 7.12 out of 10, thus ranking it among the 10 best countries for its catalogue³. In Spain, Netflix has become the quintessential pay-TV.
1. Official site of the platform in Spain. https://www.netflix.com/es
2. Study conducted by Surfshark, a company specialising in VPNs. https://bit.ly/3g1dTwt
3. Spain, among the countries with the fewest products in the Netflix catalogue but best ranking, according to a study published 03/06/2020 in Europapress. https://bit.ly/3wFVlI

Users and algorithms
Netflix was one of the first companies to decide that knowledge regarding its customers' profiles was crucial to have a detailed idea of each of their needs, demands, and tastes. Social networks became a key tool for getting to know audience interests, and mathematical algorithms, along with artificial intelligence, a tool for exploring them. Netflix maximized this information and developed a tool which ignored the direct opinion of the user responding to questions such as "What do you like the most?", concentrating on behavioral data and implicit data to obtain information from behavior. "What do they watch and when?", "On which device?" This is how big data began to work for the entertainment service, through the creation and development of audiovisual contents of interest to its subscribers (Gomez-Uribe & Hunt, 2016).
Subscribers' interests mold the offer of on-demand contents of a country and its users. Although undoubtedly Netflix's major aim is to win the loyalty of its subscribers with big data analysis strategies, the data obtained also serve as quality control for the contents whose rights it has acquired (Govind, 2014), and for the quality and interest of its catalogue for the user (Netflix, 2013). Data analysis that enables Netflix to gain insight into its customers is, therefore, one of the strategic keystones to ensure the service's success.
Against this backdrop, we asked ourselves how we can approach big data from our perspective as social science researchers, drawing on contributions from social science studies of so-called big data (Gitelman, 2013). We sought to channel and highlight the great volume of data obtained, exploring information that appears on social media regarding four series broadcast on Netflix. Our aim was to avoid as much as possible the opacity that characterizes the algorithms hidden behind the software used to access the data offered by the Internet. To achieve this goal, we combined quantitative and qualitative data, supported by discourse analysis.

Big data and social sciences
People's participation patterns in social media and the advances in computational tools make it possible to process mass quantities of data (Manovich, 2011; Dourish & Gómez Cruz, 2018; Kitchin, 2014a, 2014b), which allows communication to be studied from new perspectives. The use of big data in social sciences relates to a new definition of knowledge, which changes the objective, and is a radical twist on how to conduct social research, modifying the methods and frameworks related to information (Boyd & Crawford, 2012). Given the rapid development of digital information systems, it has become easier to automatically collect the behavioral data produced through the overwhelming number of routine social activities that now take place online (Zuboff, 2015).
In this framework, defining big data is difficult. This study accepts that there is an enormous volume of data, generated and treated by computational algorithms at great velocity, practically in real time. The data may be structured in different ways, or even unstructured. They are also exhaustive, i.e., they represent practically entire populations (Kitchin, 2014b). Bearing in mind these characteristics, mass data management needs to be put into a context which will help to interpret the impact of those data on society (Bell et al., 2015; Crawford et al., 2014; Gitelman, 2013). Netflix seeks an exhaustive knowledge of its users through big data to retain them and foster their loyalty.
Many researchers from the social sciences and humanities criticize the power of decision afforded by this type of analysis (Bollier, 2010; Burrell, 2016; Kitchin, 2014a, 2014b; Markham, 2013). Also, the predominance of visualization when presenting the data may obscure other interpretation possibilities. One risk of including big data in social analysis is the unconditional acceptance of automated decision-making, which is related to machine-thinking (Harrigan et al., 2016; Kirschenbaum, 2008; Pink et al., 2018).

The opacity of data
A common theme in studies which explore algorithms in social sciences is that they are profoundly opaque (Pasquale, 2015; Burrell, 2016). Most scholars judge this opacity to be inherently problematic (Eubanks, 2018; Zuboff, 2019).
The algorithms generated by artificial intelligence lead to listening techniques which involve the parameters learned by the machine, the filters introduced by the programmer, and the concepts which the researcher has defined. The results may affect the assessment of the expectations generated regarding a greater comprehension of social behavior, which would in turn bring into play more efficient and effective decision-making about the social and natural environment. This is what some theorists have called opacity (Barocas & Selbst, 2016; Burrell, 2016). These authors propose three levels of risk and failed transparency regarding so-called big data:
• Intentional opacity: the algorithm is not transparent for intellectual property reasons. Privacy is affected and this is related to data protection.
• Illiterate opacity: this is linked to people's lack of the technical competences needed to understand how algorithms and automatic-learning models function.
• Intrinsic opacity: in cases of automated learning, an algorithmic system can make such a quantity of calculations that even its creators are unable to explain them.
Considering these dimensions, algorithms could be viewed as black boxes (Pasquale, 2015; Burrell, 2016; Morgan, 2018), or devices that can only be understood in terms of their inputs and outputs, with no knowledge of their internal processes (Christin, 2020a). Therefore, since these types of opacity will be present when the researcher tackles big data analysis, the pending task is to progressively clarify it.

Beyond opacity
Several authors speak of transparency as a process to overcome the opacity of algorithms in the machine-learning processes of artificial intelligence. Algorithms must also become progressively transparent because on many occasions they determine life in society and, in this regard, transparency policies are necessary (Diakopoulos, 2015).
However, although researchers agree that the black boxes, the algorithms, must be opened, they do not all agree on the best methods of doing so. The concept of transparency is an ideal, with its origins in philosophy, but which cannot be applied to any context (Ananny & Crawford, 2016). Siles and collaborators (2019) allude to the relationships between users and algorithms when they analyze the Netflix phenomenon. They prefer to use the term domestication of opacity instead of transparency. Transparency is not just a state but a process of observation and knowledge which promises a form of control and, therefore, suggests openness to provide security.
From a cultural viewpoint, Christin (2020b, 2020c) refers to how ethnographers complement big data, emphasizing the data that they may contribute, and even probing into the use of the technology by those who program it. This focus is particularly relevant in this study, in which we search for the meaning that people attribute to the stories they watch on streaming and reconstruct through social media. From a focus complementary to ethnography, discourse analysis perspectives and their content help to explore this meaning. In the area of content analysis, old limits are breaking down, opening up towards mass quantities of data (Adams, 2019; Karlsson & Strömbäck, 2010; Karpf, 2012). According to Boyd and Crawford (2012), big data analysis may be enriched by traditional content analysis (McMillan, 2000; Riffe et al., 2014).
We are also supported by models based on discourse analysis in specific sociocultural environments, closer to qualitative analysis (Bednarek & Caple, 2017; Creeber, 2004; Jenner, 2015; Sjøvaag & Stavelin, 2012). From this point of view, the researcher analyzes the conversations, organizing this volume of data into hierarchical levels of classification, which allow information to continue being received in an organized manner in accordance with the objectives. The meaning of the messages is considered once they have been contextualized regarding the person expressing them, the moment they do so, and their social and cultural space. Thus, micro- and macro-analysis are combined.

Methodology and data context
Research question and objectives
The general aim of the paper is to provide an understanding of young people's discourse regarding certain TV series, which appears in digital texts, specifically in social media. To do so, we require a progressive clarification of big data opacity. Specific objectives are as follows:
1. Analyze digital texts present on social networks or other media, as an expression of the ideas that the followers of certain television series construct.
2. Examine the role of the resources offered by software, to support interpretations which progressively clarify big data opacity.
3. Channel data flow, through processes that clarify raw data contents by means of software.
4. Implement interpretation processes which enable relationships to be established between the phenomena under study, in this case, the different television series.
To view and interpret the areas of conversations, understood as mentions linked to a certain product, in this case, series, we used Sentisis Analytics. A twofold process was involved: 1) deductive, supported by theoretical studies which have already been mentioned, both focusing on the TV series presented through platforms (Siles et al., 2019), and the strategies for making visible the meaning of the data (Christin, 2020b, 2020c); 2) inductive, since the strategies of clarification and interpretation of the data are generated and modified through their collection process, and in interaction with them (Gitelman, 2013).
This software contains elements for conversation analysis of the series, which appear on the interface to facilitate user interaction (table 1). All of them interact with and complement one another. Access to data includes both numerical and visual representation. Table 1 presents the key elements guiding the analysis. The first part includes elements related to raw data, which provide neutral information: for example, the number of mentions, their authors, or the geographical context. The second part shows those elements controlled by the analyst that will guide the interpretation, progressively contributing to clarifying big data opacity. These three elements are organized according to their degree of generality, that is, the views adopt the most general perspective, the themes are included in the views, and the themes include the categories.

The selected series
To select the monitored series, we designed an online questionnaire sent to 170 students aged between 17 and 24, to find out which titles they prefer and why. The convenience sample comprised first-year university students studying degrees in Communication and Education at the Universidad de Alcalá, in Spain.

Table 1. Key elements guiding the analysis

Conversation content
Trends: Independent blocks of information where a type of conversation is monitored. For example, each of the four series is a trend, so four trends were analyzed.
Mentions: Includes access to all messages received in each of the sources (Twitter, Facebook, YouTube, other digital texts).
Summary: Contains an amalgamation of the data obtained relating to a certain trend or to the total. Includes the evolution over time of the volume of mentions.
Key concepts: The outstanding terms which appear within the messages. They are conceptual nuclei which appear inductively, and which are reflected in word clouds.
Outstanding users and viral content: Message authorship is identified. Shows the mentions and the users with the most mentions, along with their number of followers.
Location and demographics: Spatially locates the origin of the messages (their geographic location). Includes the user profile and the data which objectively identify them (e.g., age).
Level 1, Views: Through them the information of each trend may be organized, depending on the interest of the researcher or the study aim. In this study three similar views (objective, narrative, and evaluative) were organized for each trend (for the TV series analyzed).
Level 2, Themes: Nuclei of meaning grouped hierarchically and introduced by the analyst. They enable the grouping of categories and help to define the coherence of the category system.
Level 3, Categories: They merge words together when these have the same meaning, semantically considered. These are the minimum classification units, of two types: a) defined by the machine through natural language analysis of semantic meanings; b) defined by the analyst and researcher according to the presence or absence of the mentioned terms.

• Sex Education (2019, UK). The story involves an insecure youth who has the answers to any issues regarding sex because his mother is a sexologist. A classmate encourages him to open a sex advisory service. The second season was released on January 17, 2020, and the third has been advertised.

Data
Digital conversations about those series were analyzed between March 13 and April 28, 2020 (46 days, six and a half weeks). These dates coincided with the toughest period of confinement in Spain, due to COVID-19, from March 14 to the beginning of the de-escalation plan, on April 24.
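The length of the observation window can be checked with a short calculation. The following is a minimal sketch in Python, assuming that the 46 days are counted as the difference between the two dates given above; the variable names are illustrative and are not part of the study's tooling.

```python
from datetime import date

# Start and end of the analyzed period, as stated in the text.
start = date(2020, 3, 13)
end = date(2020, 4, 28)

# Difference between the two dates, in days and in weeks.
span_days = (end - start).days
span_weeks = span_days / 7

print(span_days, round(span_weeks, 1))
```

Under this assumption the span is 46 days, which corresponds to roughly six and a half weeks.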
Data was collected using artificial intelligence software and expert systems (Mokhtar & Eltoweissy, 2017; Kitchin, 2014b). The idea was for the machine to automatically learn to recognize complex patterns and construct models, which captured these patterns and optimized outcomes. In the case of this study, the patterns were related to categories defined to channel data flow.
When big data analysis techniques are used, the total participant population is considered, not just a representative sample.

Mentions and users
A total of 408,536 mentions and 189,040 users were generated, although the distribution shows notable differences between the series. Figure 1 includes the mentions and users analyzed. Data are expressed in thousands.
La Casa de Papel stands out, with mentions representing 70% of the total (285,071 mentions), compared with 19% for Élite, and far fewer for Sex Education (3%) and Peaky Blinders (8%). A relatively similar distribution was observed regarding the percentage of users: 55% mentioned La Casa de Papel, 26% Élite, 6% Sex Education, and 13% Peaky Blinders. In general terms, La Casa de Papel and Élite, both Spanish series, generated higher audience participation.
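As a quick check of the reported share, the percentage for La Casa de Papel can be recomputed from the two absolute figures given in the text. This is an illustrative sketch only; the function name `share` and the constant names are hypothetical.

```python
# Figures taken from the text: total mentions and La Casa de Papel mentions.
TOTAL_MENTIONS = 408_536
LA_CASA_DE_PAPEL_MENTIONS = 285_071


def share(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    return 100 * part / total


pct = share(LA_CASA_DE_PAPEL_MENTIONS, TOTAL_MENTIONS)
print(f"{pct:.1f}%")
```

The result rounds to the 70% reported above. The same function could be applied to the other series if their absolute counts were available.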
We should also consider the unique type of users who send messages. We can distinguish a three-fold profile among those appearing on Twitter. Firstly, the channels managed by the producers, for example, STB System Brasileiro de Televisão⁴, with 4.1 million followers, or Netflix España⁵, with 1.1 million; also, the official figures [...]
Observation of the sources which generate the data reveals that there are also differences regarding the role of each of the social networks in the four series, as shown in figure 2.
The relevance of Twitter is explained by the software being designed for written texts. The fact that YouTube comments in La Casa de Papel are also relevant relates to the presence of comments on the videoclips, entered by the followers.
Focusing on age, the group between 18 and 23 years old is the one with the highest percentages, mostly present in all the series and distributed within a range from 63% in La Casa de Papel to 51% in Élite. These data coincide with those obtained by the Association for Research into Communication Media, although the study was conducted in February 2020, one month prior to the strict lockdown¹⁰.
To sum up, it should be noted that, although there are important differences in the number of mentions in each of the series, their pattern of variation over time is relatively similar.
6. La Casa de Papel, official Netflix site. https://bit.ly/3dqawxH
7. Journalist, specialist in public opinion and political marketing. This is his Twitter account: https://bit.ly/3rYFlNZ
8. ABC and Antena 3 Noticias are two traditional communication channels, well known in Spain. ABC, part of the Vocento Group (https://www.vocento.com), is a newspaper; the second belongs to the Antena 3 television network of the Atresmedia Group (https://www.atresmedia.com).
9. https://www.antena3.com/noticias/ TV network in Spain.
10. The study carried out by Statista (https://es.statista.com) shows that over half of internet users access television contents through streaming; published before the pandemic, on 4 February 2020. https://bit.ly/2OvBmup

Results
We will now show how a clarification process of opaque data was undertaken. This is summarized in diagram 1.
First, the software provides tools for clarifying the opacity of the raw data, which are presented as a whole as disconnected linguistic terms, together with their frequency of appearance. Second, the interface provides tools to enhance clarification: visualization strategies (views), the channeling of mentions through thematic nuclei (themes), and classifications (categories). Finally, from the analysis of the information obtained, it is possible to interpret the data, establishing comparisons between the phenomena under study, in this case the television series.

Clarifying data: the views, themes, and categories
The big data obtained from social networks through the Sentisis Analytics software must be organized into conceptual nuclei that can be arranged in several dimensions, named views. In our case, three views were created for each of the series, to facilitate interpretations supported by comparisons among them. These views were the following:
• Nuclei: they are the key elements on which the series is rebuilt, with consideration of the platforms from which the original is viewed and the social networks on which it is reconstructed.
• Process: elements that contribute to the construction of the series, such as the plot or the actions of the characters.
• Evaluation & context: refers to value judgements on the series or elements of the context in which it is viewed and, in turn, includes different themes and categories to progressively clarify data opacity, as discussed below.
These are the cores around which the mentions, messages or posts collected in any of the sources (tweets, retweets, posts...) are organized. They regulate the stream of information and can be modified throughout the data collection process. Categories are defined deductively, considering the theoretical models, and inductively, regarding the flux of the oncoming data (Brown et al., 2017). The categories make sense with one another, forming a system. The themes allow us to organize the conversation considering those categories concerning a particular subject. They are containers to cluster and/or to compare categories relevant to the research objectives, facilitating interpretations.
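The hierarchy of views, themes, and categories described above can be pictured as a nested lookup through which each mention is channeled. The following is a minimal sketch in Python and not the Sentisis Analytics implementation, whose internals are not described here. It illustrates only categories defined by the presence or absence of terms; the dictionary `HIERARCHY`, its term lists, and the function names are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical hierarchy: view -> theme -> category -> trigger terms.
# The names and term lists are illustrative assumptions, not the study's own.
HIERARCHY = {
    "core": {"characters": {"nairobi": ["nairobi"]}},
    "process": {"plot": {"season": ["season"],
                         "spoiler": ["spoiler", "spoilers"]}},
    "evaluation_context": {"coronavirus": {"lockdown": ["lockdown", "quarantine"]}},
}


def classify(mention):
    """Return the (view, theme, category) triples whose terms occur in the mention."""
    words = set(re.findall(r"\w+", mention.lower()))
    hits = []
    for view, themes in HIERARCHY.items():
        for theme, categories in themes.items():
            for category, terms in categories.items():
                # A category matches when at least one of its terms is present.
                if words.intersection(terms):
                    hits.append((view, theme, category))
    return hits


def mentions_per_view(mentions):
    """Count how many mentions fall under each view (each mention counted once per view)."""
    counts = Counter()
    for mention in mentions:
        counts.update({view for view, _, _ in classify(mention)})
    return counts


print(classify("Nairobi forever, no spoilers please"))
print(mentions_per_view(["Season 4 during lockdown", "Nairobi!"]))
```

In a sketch like this, refining the system during data collection amounts to editing the term lists or adding themes, which mirrors the combination of deductive and inductive category definition described in the text.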
The structure of the first filtration system, which progressively became more specific during the whole data collection period, is contained in table 2. It includes the main dimensions of the analysis.

Going deep into the data: interpreting the views
The results combine interpretations of big and small data, according to the mentioned structure of views, themes, and categories.

The core-view
Looking to clarify the opacity of the data, this view includes the nuclear elements that support the representation of the series that the followers build, as it appears in social networks and other digital texts. Figure 3 shows relationships among the series focusing on the themes selected in this view. The data are expressed in percentages, for easier comparison, but it should again be noted that the overall number of mentions is not homogeneous between the series.

Table 2 (excerpt). Themes and their categories
Context situation: culture, consumption moment.
Coronavirus: references related to the pandemic situation.
Reactions: question, doubt, surprise, fear/courage, consumption desire, rage.

Relevant data from the core-view show that mentions of the platforms, that is, the context in which people watch the series, are more frequent for the foreign series (Sex Education, 34%, and Peaky Blinders, 32%) than for the Spanish ones. Regarding the social networks, the same pattern appears in all the series.
Focusing on the characters mentioned by name, La Casa de Papel is the series with most of them, with 62% of mentions related to a character. These differences are less obvious in the other two series. One possible explanation could be that people prefer to escape from reality and even identify with the main stars of the series, and this distances them from the actor or actress playing the role.
Transcription 2 includes several mentions which may help in the understanding of the relationships between characters and actors and actresses.
The transcription shows how the role of the actress is intertwined with her real self. The comments are about the character's death, which occurs in the fourth season of the series. The discussion refers to the reasons justifying her withdrawal from the series. It looks like a decision made by the director rather than the actress's choice. Nairobi was the character who appealed to audiences the most and received the highest number of mentions compared to the others (27%, 16,587 mentions). The transcription shows that the scenography linked to Nairobi's disappearance is an essential element in the series and the director's choice. Her coffin is carried out with every honor through the main doors of the Bank of Spain, in a highly emotional scene. The reasons justifying this disappearance are then mixed with mentions of the actress.
The process-view
Observing the mentions relating to this view (figure 4), allusions to the plot in La Casa de Papel stand out at 23%. We may recall that among the reasons given by the young people as to why they preferred this series was that it is something you latch onto. La Casa de Papel was the series that most engaged audiences, presumably through processes of deeper immersion.

Transcript 2. The plot in La Casa de Papel
Source: YouTube link https://bit.ly/3g1ydxI
The figure above contains a comment about La Casa de Papel by one of its followers. The person is wondering why the series has become such a popular symbol, which to some extent is connected to a certain plausibility of contents that leads to identification with the characters. The responses given by the author of the analysis to comments such as the one above highlight that he considers the series a window through which to make social criticism.
Plot presence is also relevant in other series. The sharpest contrast in its importance is observed between Peaky Blinders (38%, 3,741 mentions) and Élite (22%).

The evaluation-view
Up until now, it has been shown how the process-view is related to dimensions which link fiction to reality. It includes a construction process around elements supporting the series' reconstruction by those who discuss them on the Internet. The process is common to the different series that have been explored. Moving on and focusing on the evaluation-view, we may note that this involves people's emotional reactions, present in value judgements, and references to the coronavirus context. Figure 5 synthesizes the themes included in this view.
Overall, we could say that half of the mentions in this view are linked to these emotional reactions. Transcription 3 shows the role of personal reactions to Sex Education. The tweet is markedly emotional and directly related to the sexual content of the series, which also leads to audience identification with the characters. Other similar messages may be analyzed. For example, some messages allude to the harassment that girls suffer in certain situations, and even to the sexual abuse of minors. They refer to brutal messages, although at the same time it is acknowledged that they should be obligatory in all secondary schools.

Discussion and conclusions
Two themes which are usually presented separately, with few exceptions (Siles et al., 2019), have been interlinked in this paper. The first relates to the social and cultural context generated by the presence of television series in streaming, in our case through the Netflix platform (Fernández-Manzano et al., 2016; Jenner, 2018). We approached these series through conversations appearing in digital texts created by their followers, which challenge social science research. Becoming familiar with the interpretations young people make of these series on the Internet leads to a better understanding of these audiences, stretching beyond their tastes and preferences. Their conversations reveal what the essential elements in their narrative reconstructions consist of, and what their expectations towards the plots or characters involve. The second theme is directly related to methodology. We have been supported by our big data analysis, which in its first stages contains raw, opaque data that are complex to interpret (Kitchin, 2014b; Pasquale, 2015; Burrell, 2016). These analyses were complemented (Christin, 2020b, 2020c) with microanalyses carried out on their contents (Adams, 2019), using the discourse analysis perspective (Gee, 2014). Our methodological focus in this context helps to generate strategies to clarify big data, which the researcher must handle despite its opacity.

Series reconstruction
First, we considered comments related to the reconstructions of the series, distributed among social networks, grouped together around the productions offered by Netflix (Govind, 2014). The knowledge obtained on the elements present in the reconstruction of the narratives results from big data analysis, and it is clarified through a process of interaction with conversational discourse analyses (Andersen & Linkis, 2019), seeking to discover the meaning audiences attribute to series content (Gee, 2014; Christin, 2020a).
Three thematic nuclei were discovered and are present in that reconstruction of the stories viewed and commented upon. First, there are the raw data, specified through the frequency with which certain themes appear. At this level, there are no relationships between them. Terms such as season, series, spoiler, or lockdown are markers that suggest to the researcher a need to probe further into new levels of interpretation (Dourish & Gómez Cruz, 2018). Second, it is necessary to define the nuclei which channel the input of these terms, forming a coherent system of relevant elements in the construction of the story (Gauntlett, 2009). The nuclei are as follows: a) allusions to the core of the narration, b) allusions to the construction and creation process, and c) references to context and series evaluation. These data, which already imply an interpretation by the researcher, may be useful for the creators of the Netflix series.
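The two levels described above can be illustrated with a minimal sketch. It is not the software used in the study: the term-to-nucleus mapping, the example mentions, and the function names are hypothetical and chosen only to show how raw term frequencies might be channeled into the three nuclei.

```python
from collections import Counter
import re

# Hypothetical mapping from marker terms to the three nuclei named in the text.
# The assignments are illustrative assumptions, not the authors' coding scheme.
TERM_TO_NUCLEUS = {
    "character": "core",
    "actor": "core",
    "series": "core",
    "plot": "process",
    "season": "process",
    "spoiler": "process",
    "lockdown": "evaluation",
}

def term_frequencies(mentions):
    """Level 1: raw counts of marker terms, with no relationships between them."""
    counts = Counter()
    for text in mentions:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in TERM_TO_NUCLEUS:
                counts[token] += 1
    return counts

def channel_into_nuclei(term_counts):
    """Level 2: group the raw term counts under the nuclei that channel them."""
    nuclei = Counter()
    for term, n in term_counts.items():
        nuclei[TERM_TO_NUCLEUS[term]] += n
    return nuclei

# Invented example mentions, for illustration only.
mentions = [
    "No spoiler please, I just started season 2",
    "This series saved my lockdown",
    "The plot of this season is wild",
]
raw = term_frequencies(mentions)
nuclei = channel_into_nuclei(raw)
```

In this sketch the interpretive step lies entirely in the mapping: changing which nucleus a term belongs to changes the picture that emerges from the same raw counts.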
At the first level, what has been called the Core view stands out. This concerns the relevance of the fictitious characters, as opposed to the actors playing them, and the importance attached in that reconstruction to the fact that the series are viewed in streaming, or to the social networks where they are commented on (Dourish & Gómez Cruz, 2018). Regarding the elements which impact the reconstruction process of the series, the role the followers attach to the plot and the action of the players is important (Jenner, 2015). Finally, references are made to the evaluation of the series, where their creativity is highly appreciated, as is the role they play in entertaining, specifically in the context of a pandemic and lockdown. Comparisons between the series are made possible by these three nuclei (Zuboff, 2019). The most relevant conclusion is that, although there are several specific differences between the series in the elements which appear in their reconstruction, relatively homogeneous patterns exist among them all. For example, in the four analyzed series, the characters, referred to by their specific fictitious names, are an essential element. There are also similar patterns in the variations regarding the importance acquired by each element, both in the creation of the series and in its evaluation.

Domestication and clarification of big data
The results obtained concerning how the followers of series reconstruct the stories, presented through Netflix, as summarized above, are an example of how big data may be progressively domesticated (Siles et al., 2019), and how this leads to interpretations that may be relevant for series' creators or producers.
In this regard, this paper shows that the use of big data is conditioned by the elements of the interface required to approach the data (Kitchin, 2014b). Behind this interface is the software permitting the formulation and use of the algorithms. The role of the interface is relevant since data channeling can be viewed through it, generating nuclei that direct the big data flow. This visualization materializes in what have been called views, which present coherent conceptual systems. Each view includes themes, which in turn group the system of categories and make them coherent. To sum up, the interface may be an element involved in the domestication of big data.
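The nesting of views, themes, and categories can be pictured as a simple hierarchy in which mention counts are summed upwards. The following is a sketch under stated assumptions: the theme and category names and all counts are invented for illustration and do not reproduce the study's data or the interface of the software used.

```python
# Hypothetical hierarchy: view -> theme -> category -> number of mentions.
# Names and counts are invented for illustration; they are not the study's data.
VIEWS = {
    "core": {
        "characters": {"fictional_name": 40, "actor_name": 10},
        "platform": {"streaming": 20},
    },
    "process": {
        "plot": {"storyline": 20},
    },
    "evaluation": {
        "emotional_reaction": {"value_judgement": 5, "pandemic_context": 5},
    },
}

def theme_totals(views):
    """Sum category counts up to the theme level within each view."""
    return {
        view: {theme: sum(cats.values()) for theme, cats in themes.items()}
        for view, themes in views.items()
    }

def view_shares(views):
    """Share of all mentions that is channeled into each view."""
    totals = {v: sum(t.values()) for v, t in theme_totals(views).items()}
    grand_total = sum(totals.values())
    return {v: n / grand_total for v, n in totals.items()}

shares = view_shares(VIEWS)
```

Percentages such as those reported for plot presence in each series would correspond, in a structure of this kind, to the share of a theme or view within the total mentions of that series.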
Another relevant issue is that the software and interface facilitate analysis of conversational content, undertaken through microanalysis inspired by discourse analysis (Lewis et al., 2013). Both help in the selection of categorized mentions, and of others in which certain terms are present. This all leads to further analysis (Christin, 2020a).
We show that big data analyses need to be complemented with other analyses on a micro level, as well as supported by discourse analysis, through deductive processes linked to the researcher's interpretation processes.

Study limitations
One limitation of this study is related to the researchers' knowledge of the ideas that audiences constructed around the television series presented by Netflix. A more in-depth study is required when selecting series, through further specification of criteria and tools. Also, although we selected both Spanish and foreign series, along with youth-associated or generalist themes, the fact that there are no clear differences in outcomes obtained between the series suggests further study is required.
The existing relationship between the software and the interface when clarifying data opacity is also an unresolved matter. An interdisciplinary study could compare several interfaces to see to what extent they orient researcher interpretation and in what directions. Also, when focusing more specifically on the software and algorithms that generate big data, a distinction should be made between categories generated by machine learning, supported by artificial intelligence, and categories which the researchers generate through the grouping of certain terms. Finally, transcending the descriptive level in search of relations between categories is also needed, from an interdisciplinary approach.

references