Principles and Practices for Networked Personal Data
Contribution by Pierre Bellanger for the 2014 State Council Report on: Digital Technology and Fundamental Rights and Freedoms
Personal data directly or indirectly provides information about an identified individual. This definition establishes a singular personal right of the individual concerned to his data, a right which aims to protect his private life in particular. Current legal and institutional analysis tends to strengthen and confirm the exclusive right of each individual to the data relating to them. The solutions envisaged range from enshrining an autonomous right to determine how such data is collected and used, through an independent ability to administer it (whether by copying, modification, transfer or partial deletion), to property rights over each piece of personal data.
All these approaches have advantages and disadvantages. But do they correspond to the reality of personal data today?
These mechanisms are based on the supposition that personal data is autonomous and only concerns one particular individual; essentially, that personal data is granular, independent and forms an entity in itself, subject to the rights of a single individual.
This concept, which was relevant back in the 20th century when we used files, no longer corresponds to reality. Today, personal data is no longer isolated; it is networked. It forms a network of data in which each piece does indeed remain personal but all of them are now organised into an inseparable whole.
There are six reasons for this:
- Personal data cannot be isolated in practice: providing access to your list of contacts, photos, diary, emails and position automatically reveals the personal data of others, over which you have no rights;
- Personal data provides information about other people: correlation algorithms (computer programmes that infer information probabilistically by processing a mass of data with no direct relation to the inferred information) mean that all personal data indirectly provides information about someone else. For example: the personal data of bank clients, cross-referenced with their missed payments, will be used to assess the risk of non-payment by new clients whose behaviour is compared with theirs. Another example: correlations between colon cancer and the supermarket purchases of a group of individuals will help predict the cancer risk of another individual, unrelated to the test group, from his till receipts alone;
- Personal data is an extension of the individual: like blood, it is the self outside the self. Consequently, enabling the transfer or assignment of someone else's personal data, which is inseparable or deducible from your own, in exchange for “free” access to a service is similar to organ trafficking;
- Individual control by mutual agreement is becoming impossible: the number of collectors of personal data is set to increase, in the form of sensors embedded in most objects wherever we are; meanwhile, each individual is becoming a collector of data on himself and others. Considered individual authorisation for every data capture, already uncertain, is no longer possible in practice. In the future, every object will incorporate capture and communication intelligence with which it can process and transmit a permanent flow of data. Toothbrushes, coffee machines, cars, fridges, watches, glasses, clothes and shoes will capture data and be connected. Mute objects will become the exception and the blind environment will disappear;
- The establishment of a monopoly of personal data: the network effect applies to personal data, in that the value of a piece of data is proportional to the square of the number of pieces of data to which it is connected. In effect, the value of a piece of data comes from the context provided by additional data.
For example: the purchase of a pushchair provides information about the future consumption of a household. This single piece of data is valuable for any advertisers of products for young children. But a second piece of data could show that the purchase of the pushchair was a present for the neighbours. The data is checked, takes on meaning and therefore becomes knowledge through its smart aggregation to others. Consequently, there is no absolute value in unitary data.
The biggest holder of data can therefore outbid everyone else to acquire new data, in money or in free services, because the data has most value for it. Each new acquisition then increases the value of the whole already collected until, logically, that holder has a monopoly;
- The legal framework for IT modelling of reality: in the end, such global and extensive collecting of data (including personal data) constitutes a quantitative dithering of reality itself. The appropriation by a few companies of a computer-based reconstitution of reality is a source of devastating competition asymmetries and cannot be prevented by a sum of individual rights.
For example: direct or predictive knowledge by a single player of the driving styles of every individual driver gives it a decisive advantage in establishing bespoke driving insurance rates at the best price, choosing its clients and leaving the competition with drivers it has detected would not be profitable. Thus, a sum of personal acceptances without immediate consequences for the individuals concerned could end this insurance sector as we know it, thereby creating a new monopoly which would quickly make insurance more expensive for everyone.
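The monopoly argument in the fifth reason above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the square law is taken literally from the text, while the function names `piece_value` and `simulate_bidding`, the constant `c` and the starting holdings are hypothetical and chosen for illustration.

```python
def piece_value(connected: int, c: float = 1.0) -> float:
    """Value of one new piece of data to a holder able to connect it to
    `connected` existing pieces (the square law stated in the text)."""
    return c * connected ** 2


def simulate_bidding(holdings: list[int], rounds: int) -> list[int]:
    """Each round, one new piece of data goes to the holder who values it most."""
    holdings = list(holdings)  # work on a copy, leave the input untouched
    for _ in range(rounds):
        bids = [piece_value(n) for n in holdings]
        winner = bids.index(max(bids))  # the highest bidder acquires the piece
        holdings[winner] += 1
    return holdings


# Hypothetical starting point: three collectors of unequal size.
start = [100, 90, 10]
end = simulate_bidding(start, rounds=50)
shares = [round(n / sum(end), 2) for n in end]
print(end, shares)
```

In this simplified model the largest holder wins every round, so its share can only grow, which is the dynamic the text describes.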
Personal data is therefore no longer granular but reticular, i.e. organised into a network. Personal data is no longer separate but connected. Such intertwining forms a network of personal data which is, in fact, a substitute for the isolated personal data of the past.
What does this network of personal data look like? A hologram is a good analogy: it comes from an illuminated photographic plate which produces a three-dimensional image. Each fragment of the plate contains the entire image at lower definition. The same is true of the network of data: the data as a whole reproduces reality and each piece of data provides information about the whole.
Let's take an example: a single grain of sand provides information about the beach because there is a strong resemblance between most grains. However, an item from the basement of a department store (a costume bracelet, for example) provides little information about the decorating or gardening items which are upstairs. The beach is holonomic: we can determine the global information (the beach) from the local information (the grain of sand). The department store is autonomous: each item only determines itself.
In terms of human beings, we share more than 99% of our genome with other members of our species and, according to research published in the journal Science, our behaviour is 93% predictable. Personal data is de facto holonomic.
Of course, such similarities and homogeneous behaviour do not in any way negate the unique character of every human (which is expressed via infinite variations and surprising margins) or their freedom as their free will preserves their improbability at any one moment. Nonetheless, this singularity is expressed in relation to strong conformity to the average, a sort of behavioural barycentre.
Thus, a vision of pieces of data as independent and fundamentally separate from one other is an abstraction which is no longer relevant. Personal data is determined mutually and forms an organic network.
Further, this network is dynamic. The volume of data collected doubles every 18 to 24 months. Data which was previously discrete and therefore isolable is becoming continuous flows of information captured and quantified every moment, connecting data from multiple individual sources in real time. Finally, the logical links connecting pieces of data to one another are multiplying exponentially. The network of data now forms an animated totality which is permanently growing.
For convenience, we will call the network of personal data PDN.
What is the legal nature of the PDN? It is something over which everybody whose data is integrated therein has rights but which cannot be materially divided between them. It cannot be separated or made individualised by its very nature because each piece of personal data provides information about others. It is therefore a form of joint ownership which involves the entire population. Further, the information stemming from the PDN is of major general interest for the country as a whole, particularly in terms of health, transport, consumption, the environment and economic competitiveness.
Due to its multi-individual origins, the impossibility of splitting it up and its collective usefulness, the PDN is therefore in the public domain (res communis): something which belongs to everyone but not to any one individual. Its status is defined in French law in Article 714 of the Civil Code.
It also means that every individual has specific rights (of withdrawal, opposition or to be forgotten) over his or her own contribution, provided that exercising them does not affect the rights of others.
The PDN therefore responds to collective rights and individual rights. Managing and exercising these rights must be the responsibility of a public body, guaranteeing democratic and sovereign control, and alone able to authorise access and use.
Such an organised and referent institution would create the necessary procedures, bodies and dialogue. It must therefore manage both the public domain and the related individual rights. Its ability to take legal action will be essential from this point of view.
A Data Agency could therefore be established. Would the best base for this not be the current National Commission for IT and Freedoms (Commission nationale de l'informatique et des libertés, CNIL)?
Individual Rights
The new technological ability to capture, conserve and process the actions of each individual, and the parallel increase in the share of our lives placed on digital networks and systems, mean that, in this context, the nature and rights of human beings must be defined.
A human being is considered to be a permanent becoming. It is this ability and freedom to become which characterises us and therefore this must be preserved or even increased.
As it has many contradictory and limitless forms and because it only means something in a profound and secret context, the process of becoming is deformed if it is observed by others and therefore judged and normalised. Intimate and solitary alchemy only belongs to one's self. One of the foundations of the human is therefore the right to mystery.
From this personal process stems a character one has chosen; this representation is a social variant of one's self which defines us vis-à-vis others. The integrity of this social person must be preserved. Thus, accessible information about an individual, our personal story, must in principle, and unless there is a justifiable exception, answer to the will of the person concerned. This is the right to choose one's self.
Human beings, for their fulfilment and their freedom of development, must rediscover themselves in an environment which maximises their choices. Any reduction in the field of possibilities, connected to their real or supposed characters, can only be exceptional, known about and justified. With every reduction in possible choices, a common alternative must be proposed. This is the right to the neutrality of the world.
For example: a commercial website adapts its range of products without warning based on its supposition of the purchasing power of its online clients. In so doing, this site limits the freedom of choice of its potential clients to direct their decisions and therefore leads them to make a particular choice which they would not normally have chosen if they had had access to the entire range. Such restricted choice is an attack on individual freedom.
Personal data is an extension of the individual and must therefore be under his or her control. Subject to legal prerogatives, the individual sovereignty of each individual over his or her personal data is guaranteed.
For example: someone has committed a traffic offence in the past. This information is available to future employers and compromises their chances of employment. This person must have the ability to reduce access to this information. The past must not be a prison, excluding temporary and justified exceptions.
Finally, access to data is a formidable means of development of the self and others, the equivalent of access to knowledge. Such access, if it is subject to the rights indicated above, must be free and open to all. This is the right to access data.
For example: someone suffers from a rare illness. In order for them to exercise their judgement and determine their choices, access to the anonymous health data of other people suffering from the same illness could be very useful.
It should be noted that these individual rights are socially useful. What would become of creativity, innovation, enterprise and imagination, and therefore collective progress, without the guarantee for each individual of their IT integrity and protection of their freedom of thought?
Finally, what would become of democracy without these rights which are to the Internet what the polling booth is to the Republic?
Collective Rights
When it is used by appropriate IT programmes, data (including personal data) constitutes the best means of reducing the waste, failures, accidents and losses found in most human systems and structures. Data is at the heart of the resolution of our current problems, the positive progress of our societies, the flourishing of individuals, the reviving of our economy, employment, health and the environment. In this sense, like scientific knowledge it is part of the public domain, not only because of its origins with regard to personal data but because of its destination which makes it a cause of public usefulness.
For example: half of all food is wasted, particularly for lack of information that would allow distribution channels to be readjusted quickly. A third of the petrol consumed is wasted looking for a parking space, and therefore because of the absence of up-to-date information about available spaces. More seriously, according to IBM, use of data would reduce the mortality of hospitalised patients by 20%.
The current rival privatisations of data damage general progress in the sense that they definitively distort competition: firstly through the network effect (the first player will only be strengthened to the detriment of the others) and secondly because they force a resource of general interest to become one of private interest alone. This is why supporters of unregulated data “to encourage innovation and competition” are, deliberately or not, in error. Their thinking will lead to a monopoly which extinguishes both.
Similarly, the legal individualisation of data means atomising a potential collective right into a sum of private rights which are more easily dissolved: click of acceptance by click of acceptance.
Further, it is not surprising that the companies which devour the most data separately or jointly defend these two theses: they open wide the doors of absolute domination. The first thesis is a crude extension of mercenary reign. The second, more subtle and consistent with our legal tradition, usually gives off the allure of progress.
In reality, competition must be based not on the appropriation of data but on its use.
It is up to each company to design the best IT programmes (the most efficient algorithms) to get meaning and value from the data. That is where real, fair and productive competition is to be found. The obligation to mutualise data must therefore be established under the aegis and management of the Data Agency in order to provide regulated access which is open to all.
For example: the data collected by smart thermostats at home can help public bodies, the building industry, insulation installers, architects, energy suppliers, individuals and all the IT service providers which will design the operating software for this data for their clients. Leaving this data in the hands of a single player, or a few, devitalises entire sectors.
Recognition of the nature of networked personal data means that an individual no longer has the ability to consent alone to the transfer of or access to his personal data. Any capturing or processing of personal data must involve authorisation from the Data Agency prior to any individual agreement.
For example: someone wants to join a social network and gives it access to their personal data. They can only do this if the social network has already been approved by the Data Agency, which will guarantee both their data and that of the people they will inevitably reveal.
In comparison, a citizen buys a food product or a toy from a shop for personal use. The fact that these products are on the market means that administrative authorisation has been provided previously.
Further, we have become used to this kind of security for most of our purchases and we naturally extend it to network services which do not benefit from it.
Transferring personal data to an unauthorised service will constitute an offence, including with regard to the people whose personal data might be involved.
For example: someone grants access to their address book to an unauthorised mapping service. In so doing, they provide the contact details of a third party without authorisation and this person could take legal action.
Therefore, the Agency must approve all data capturing on French soil. Its public body status creates a symmetrical relationship with the large network companies, more balanced than contracts taken out with the click of a mouse by busy individuals.
The Agency will also approve the mechanisms and software enabling individuals to gather other people's personal data. It will also perform practical mediation and arbitration between citizens who are both capturers and captured.
What are the conditions for approval by the Data Agency?
- Personal data must be captured, conserved, processed and transferred according to the protocols and modalities set out by the Data Agency;
- Capturing, conserving, processing and transferring the personal data of a European citizen falls under European jurisdiction alone, which implies that the IT servers are located in the EU, de facto or in law;
- Exporting the personal data of European citizens outside EU territory is limited and taxed;
- Regularisation of the tax situation of the collector with regard to the actual activity generated by use of the personal data captured on French soil;
- Acceptance of the mutualisation of data under the supervision of the Data Agency.
What are the practical modalities for processing personal data? Information about an individual is a source of value for the nation as a whole. Information about an identified individual is a risk which could deprive the latter of his freedom. Therefore, the person (his identity) must be separated from the profile (the information collected).
To achieve this, personal data must not be captured, processed or transferred without being protected. All digital information must be considered public as soon as it is no longer encoded. Such de facto alienation is a violation of the aforementioned individual rights.
Such cryptographic encoding must therefore guarantee the individual and collective rights relating to the data without compromising the best use of it. The proposed encoding has three keys. Each key only reveals part of the elements of the piece of data.
So, a piece of data is divided into three parts:
- The identity: what defines the individual uniquely: his name, his face, any biological signature (retina, DNA, voice, fingerprint, etc.);
- The user profile: all the data relating to a user; the profile is unique to each service or service network;
- The information: information involving at least one person or one profile.
This is presented as follows:
Example of personal data captured and conserved by a museum:
- Level I: XXX-XXX-ACTION: a visit is recorded by the Museum.
- Level II: XXX-PROFILE-ACTION: DR589 visited the Museum again.
- Level III: IDENTITY-PROFILE-ACTION: Karima Dubois visited the Museum again.
The first level is accessible as public data. Search and processing conditions for unidentified personal data are restricted by granularity and combination thresholds which rule out any precision that could reveal an identity. It is a question of using sample size to guarantee uncertainty about the individuals concerned, thus maintaining a level of vagueness.
For example: “How many people have a dog in this area?” maintains uncertainty while “How many people have a dog in this building?” could divulge identities.
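The granularity threshold behind the dog example can be sketched as follows. This is an illustration only, not the Agency's rule: the threshold value `MIN_SCOPE_SIZE`, the `Resident` fields and the function `count_dog_owners` are all assumptions introduced here, and the sketch covers granularity alone, not the combination thresholds the text also mentions.

```python
from dataclasses import dataclass

# Hypothetical threshold: the text leaves the actual value to the Data Agency.
MIN_SCOPE_SIZE = 50


@dataclass(frozen=True)
class Resident:
    area: str
    building: str
    has_dog: bool


def count_dog_owners(residents, *, area=None, building=None):
    """Return the number of dog owners in the requested scope, or None when
    the scope holds too few people for the answer to stay anonymous."""
    scope = [
        r for r in residents
        if (area is None or r.area == area)
        and (building is None or r.building == building)
    ]
    if len(scope) < MIN_SCOPE_SIZE:
        return None  # query refused: too precise, could divulge identities
    return sum(r.has_dog for r in scope)


# Made-up population: 200 residents in one area, 10 per building.
residents = [
    Resident(area="north", building=f"B{i // 10}", has_dog=(i % 4 == 0))
    for i in range(200)
]

area_answer = count_dog_owners(residents, area="north")
building_answer = count_dog_owners(residents, area="north", building="B3")
print(area_answer, building_answer)
```

The area-wide question is answered, while the building-level question is refused because only ten people fall within its scope. A real system would also need to guard against several permitted queries being combined to isolate one person.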
Further, unauthorised reconciling of an action, a profile and an identity will become an offence.
The second level gives access to the history of the profile created by the collector. The conditions for such access are determined by the Data Agency, in such a way as to preserve the secrecy of the identity of the profiles.
For the Museum, its statistics and client relations work is carried out mostly in Levels I and II.
The third level is only accessible upon receipt of a judicial decision giving access to the corresponding cryptographic key. The Ministry of Justice will therefore become Keeper of the Seals and Keys.
For the Museum, the Level III information it generates (such as, for example, data on payment received) is exclusively reserved for the internal ephemeral use approved by the Agency and cannot be processed externally or be involved in any transactions with a third party.
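The three levels described above can be sketched as three views of one record. This is a minimal sketch under loud assumptions: plain redaction stands in for the three-key cryptographic encoding, which the text does not specify, and the names `Level`, `Record`, `make_profile_id`, `view` and the keys are hypothetical. The per-service pseudonym is derived with an HMAC purely to illustrate that a profile is unique to each service.

```python
import hashlib
import hmac
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 1    # Level I: action only
    PROFILE = 2   # Level II: pseudonymous profile and action
    IDENTITY = 3  # Level III: identity, profile and action


@dataclass(frozen=True)
class Record:
    identity: str
    profile: str
    action: str


def make_profile_id(identity: str, service_key: bytes) -> str:
    """Derive a pseudonym that differs from one service to the next
    (assumption: keyed hashing; the text does not say how profiles are made)."""
    digest = hmac.new(service_key, identity.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:8].upper()


def view(record: Record, level: Level, judicial_order: bool = False) -> str:
    """Render the record as seen at the given level; hidden parts become XXX."""
    if level == Level.IDENTITY and not judicial_order:
        raise PermissionError("Level III requires a judicial decision")
    identity = record.identity if level >= Level.IDENTITY else "XXX"
    profile = record.profile if level >= Level.PROFILE else "XXX"
    return f"{identity}-{profile}-{record.action}"


museum_key = b"museum-demo-key"  # hypothetical per-service secret
record = Record(
    identity="Karima Dubois",
    profile=make_profile_id("Karima Dubois", museum_key),
    action="visited the Museum again",
)

print(view(record, Level.PUBLIC))
print(view(record, Level.PROFILE))
print(view(record, Level.IDENTITY, judicial_order=True))
```

In a real deployment each part would be encrypted under its own key, so that holding the Level II key would reveal nothing about the identity.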
Personal data encapsulated in triple level encoding will be combined with two kinds of additional data or metadata:
- The first is free to access and indicates the conditions of use of the personal data and the specific rights and restrictions which accompany it. Respect for these is compulsory.
- The second constitutes a history of the capsule of data from its origins: i.e. the sequence of all the operations of which it has been the object. This kind of associated memory is the basis of the authenticity of the virtual currency Bitcoin for example, in this instance through the history of transactions attached to each account unit. This part is encrypted and under the control of the judicial key.
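The history metadata described in the second item can be pictured as a hash chain, in which each recorded operation is bound to the one before it. This is a simplified sketch, not Bitcoin's actual protocol and not a specification from the text: the class `CapsuleHistory`, the helper `_entry_hash` and the operation fields are hypothetical, and the encryption under the judicial key is left out.

```python
import hashlib
import json


def _entry_hash(previous_hash: str, operation: dict) -> str:
    """Hash an operation together with the hash of the entry before it."""
    payload = json.dumps(operation, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CapsuleHistory:
    """Append-only log of the operations performed on one data capsule."""

    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self):
        self.entries = []  # list of (operation, hash) pairs

    def append(self, operation: dict) -> None:
        previous = self.entries[-1][1] if self.entries else self.GENESIS
        self.entries.append((operation, _entry_hash(previous, operation)))

    def is_intact(self) -> bool:
        """Recompute every hash; any edit to a past entry breaks the chain."""
        previous = self.GENESIS
        for operation, stored in self.entries:
            if _entry_hash(previous, operation) != stored:
                return False
            previous = stored
        return True


history = CapsuleHistory()
history.append({"op": "capture", "by": "Museum"})
history.append({"op": "read", "by": "Museum statistics"})
intact_before = history.is_intact()

# Rewrite the first operation while keeping its old hash: this is detected.
history.entries[0] = ({"op": "capture", "by": "Someone else"}, history.entries[0][1])
intact_after = history.is_intact()
print(intact_before, intact_after)
```

Because each hash depends on everything recorded before it, altering one past operation invalidates the chain from that point on, which is what makes such a history usable as proof of authenticity.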
Encapsulation and metadata thus turn previously inert personal data into smart data.
The capsule can itself be a software agent – a small IT programme – which can behave and react autonomously depending on the specific constraints and conventions.
An example is provided by the software system Ethereum: each piece of data provides its conditions of use in a decentralised way. For example: a piece of personal data encapsulated in a software agent and containing the traffic position of a vehicle is accessible in Level II for the traffic management services which need it. The software agent recognises the authorised origin of the request, authenticates it, validates it, provides access to the data and then records the consultation of the information in the metadata.
The software agent could also be partially programmed by the user behind the data to determine a specific relationship of limited access by approved services (subject to the rights of third parties), similar to the Creative Commons licence.
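The sequence followed by the software agent in the traffic example (recognise the origin, authenticate, validate, grant access, record the consultation) can be sketched as below. All names are hypothetical: `CapsuleAgent`, its fields, the `registry` of credentials and the service names are assumptions made for illustration, and comparing shared secrets stands in for whatever authentication the Agency would actually prescribe.

```python
import hmac
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CapsuleAgent:
    """A capsule that guards its own payload and logs every consultation."""

    payload: str
    approved_services: set                # services approved by the Agency
    user_allowed: Optional[set] = None    # optional narrower list set by the user
    access_log: list = field(default_factory=list)

    def request(self, service: str, credential: str, registry: dict) -> str:
        # 1. Recognise the origin: is this an approved service?
        if service not in self.approved_services:
            raise PermissionError(f"{service} is not an approved service")
        # 2. Authenticate: does the credential match the registered one?
        expected = registry.get(service)
        if expected is None or not hmac.compare_digest(expected, credential):
            raise PermissionError(f"{service} failed authentication")
        # 3. Validate against the user's own restrictions, if any.
        if self.user_allowed is not None and service not in self.user_allowed:
            raise PermissionError(f"{service} is excluded by the user")
        # 4. Record the consultation in the metadata, then grant access.
        self.access_log.append((service, datetime.now(timezone.utc).isoformat()))
        return self.payload


registry = {"traffic-management": "s3cret", "ad-broker": "hunter2"}
capsule = CapsuleAgent(
    payload="vehicle DR589 at junction 12",
    approved_services={"traffic-management"},
)

position = capsule.request("traffic-management", "s3cret", registry)
try:
    capsule.request("ad-broker", "hunter2", registry)
    refused = False
except PermissionError:
    refused = True
print(position, refused, len(capsule.access_log))
```

The optional `user_allowed` set mirrors the idea of user-programmed restrictions: it can only narrow what approved services may see, never widen it.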
The dazzling progress of processors, storage and bandwidth, and the efficiency of algorithms mean that the excess weight created by protecting data and the associated IT operations will be quickly offset.
Finally, machines should themselves prohibit unauthorised use of capsules of personal data such as, for example, duplications or attempts to gain access, similar to photocopiers or printers which prevent copying as soon as they recognise that the image to be reproduced is a bank note.
The Data Agency will supervise and coordinate global management of data. No data is conserved by the Agency.
The code is not open in order to guarantee the uniqueness and security of the version, avoid malice and abuse, and guarantee immediacy of updates and stability. However, all the Agency's procedures, software and methods shall be independently audited and the subsequent report published.
However, some parts of the source code can be examined upon request and thus opened up for both inspection and improvement by the public, similar to free software. All codes, methods and protocols, excluding limited and justified exceptions, are available to the Justice Department.
Recognition of networked joint ownership of personal data and its status as part of the public domain; recognition of individual and collective rights over this resource; the creation of a Data Agency to manage them; implementation in practice of this legal mechanism through triple encoding of personal data combined with processing metadata; legal supervision and contributory supervision by civil society of the methods and procedures: all of the above provide the required civic guarantees while accelerating progress and innovation through controlled mutualisation of data.
These principles and practices for networked data are applicable firstly in France, but the aim is to extend them to the European and then the global level.