Privacy tips and tools

EFF’s Street-Level Surveillance project

1/30/2024

https://sls.eff.org/

EFF’s Street-Level Surveillance project shines a light on the surveillance technologies that law enforcement agencies routinely deploy in our communities. These resources are designed for advocacy organizations, journalists, defense attorneys, policymakers, and members of the public who often are not getting the straight story from police representatives or the vendors marketing this equipment.
Whether it’s phone-based location tracking, ubiquitous video recording, biometric data collection, or police access to people’s smart devices, law enforcement agencies follow closely behind their counterparts in the military and intelligence services in acquiring privacy-invasive technologies and getting access to consumer data. Just as analog surveillance historically has been used as a tool for oppression, we must understand the threat posed by emerging technologies to successfully defend civil liberties and civil rights in the digital age.
The threats to privacy of these surveillance technologies are enormous, as law enforcement agencies at all levels of government use surveillance technologies to compile vast databases filled with our personal information or gain access to devices that can lay bare the intricacies of our daily lives. Use of these surveillance technologies can infringe on our constitutional rights, including to speak and associate freely under the First Amendment or be free from unlawful search and seizure under the Fourth Amendment. Law enforcement also tends to deploy surveillance technologies disproportionately against marginalized communities. These technologies are prone to abuse by rogue officers, and can be subject to error or vulnerability, causing damaging repercussions for those who interact with the criminal justice system.


Check the privacy of your browser

11/7/2023


The EFF has a new tool, Cover Your Tracks, that tests how exposed your browser is to tracking. Use it to see how well you are protected from tracking and fingerprinting. (You want results similar to ours in the illustration above.)


How does tracking technology follow your trail around the web, even if you’ve taken protective measures?
Cover Your Tracks shows you how trackers see your browser. It provides you with an overview of your browser’s most unique and identifying characteristics.
Only anonymous data will be collected through this site.

Want to learn more about tracking?
When you visit a website, your browser makes a "request" for that site. In the background, advertising code and invisible trackers on that site might also cause your browser to make dozens or even hundreds of requests to other hidden third parties. Each request contains several pieces of information about your browser and about you, from your time zone to your browser settings to what versions of software you have installed.
Some of this information is passed along by default simply to help you view the page. For example, HTTP headers are essential to most web functionality, and broadcast your device and browser version. But a lot of the information in your browser’s requests is also extracted by third-party ad networks, which have sneaky tracking mechanisms embedded across the Internet to gather your information.
At first glance, the data points that third-party trackers collect may seem relatively mundane and disparate. But when compiled together, they can reveal a detailed behavioral profile of your online activity, from political affiliation to education level to income bracket. As long as this trove of data about you is linked back to you, your online activity can be logged. Ad networks primarily rely on two methods to maintain this link: cookie tracking, and browser fingerprinting.
What are cookies?
Cookies are small chunks of information that websites store in your browser. Their main use is to remember helpful things like your account login info, or what items were in your online shopping cart; in other words, they save your place. But they can also be misused to link all your visits, searches, and other activities on a site together. This use of cookies is a privacy violation, and browsers generally allow you to block, limit, or delete cookies.
What is a digital fingerprint?
A digital fingerprint is essentially a list of characteristics that are unique to a single user, their browser, and their particular hardware setup. This includes information the browser needs to send to access websites, like the location of the website the user is requesting. But it also includes a host of seemingly insignificant data (like screen resolution and installed fonts) gathered by tracking scripts. Tracking sites can stitch all the small pieces together to form a unique picture, or "fingerprint," of your device.
What is the difference?
Think of the small tracking devices scientists use to follow animal migration patterns, or a GPS transmitter attached to a car. As long as they’re attached to the target animal or vehicle, they are accurate and effective, but they lose all value if they’re knocked off or discarded. This is roughly how cookies behave: they track users up until the point a user deletes them.
Fingerprinting uses more permanent identifiers such as hardware specifications and browser settings. This is equivalent to tracking a bird by its song or feather markings, or a car by its license plate, make, model, and color. In other words, metrics that are harder to change and impossible to delete.
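To make the contrast concrete, here is a minimal Python sketch. The attribute names and values are hypothetical, and real tracking scripts read many more signals; the point is only that a hash over several mundane attributes behaves like an identifier that survives cookie deletion.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into a short, stable identifier."""
    # Sort keys so the same attributes always serialize identically.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical values a tracking script might read from one browser.
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
    "screen": "1920x1080",
    "timezone": "America/Chicago",
    "fonts": ["DejaVu Sans", "Liberation Serif", "Noto Mono"],
}

cookies = {"tracking_id": "abc123"}
before = fingerprint(browser)
cookies.clear()                # the user deletes all cookies
after = fingerprint(browser)   # the fingerprint does not depend on cookies

print(before == after)         # True
```

Changing any single attribute (for example the screen size) produces a different hash, which is why a fingerprint that is unusual in any one respect can stand out.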
Can I do anything about this?
Completely blocking trackers is difficult, even with a fully featured tracker blocker. Even so, we recommend using the tracking protections above. Privacy protection does not have to be perfect to make a big difference!
There are two main dynamics that make trackers hard to entirely avoid online:

Impact on Usability: It’s unfortunate that enhanced privacy often comes at the expense of functionality. For instance, you may want to disable JavaScript to stop tracking scripts from running. But this will likely make it hard to shop, fill out forms, watch videos, or see interactive web elements. Many pages require disabling your ad blocker to see content, or refuse to load anything unless you use the “official” app.
Identifiable Protections: Paradoxically, sometimes your protections themselves can become part of your fingerprint. An add-on intended to protect you can even lead to your full identification. Changing your settings and installing protections can make you easier for trackers to identify. In this case, you become a “mystery user with a very specific combination of privacy protections installed.”

In practice, the most realistic protection currently available is the Tor Browser, which has put a lot of effort into reducing browser fingerprintability. For day-to-day use, the best options are to run tools like Privacy Badger or Disconnect that will block some (but unfortunately not all) of the domains that try to perform fingerprinting, and/or to use a tool like NoScript (for Firefox), which greatly reduces the amount of data available to fingerprinters.
Cover Your Tracks’ primary goal is to help you determine your own balance between privacy and convenience. It gives you a summary of your overall protection and a list of the characteristics that make up your digital fingerprint, so you can see exactly how your browser appears to trackers and how implementing different protection methods changes this visibility. The following suggestions are simple, straightforward protection methods, and are an excellent starting point.
Simple suggestions

Using a Tracker Blocker
Install a tracker blocker and watch your browsing experience get a lot more pleasant.
Most tracker blockers cross-reference massive lists of tracking scripts. They then block any attempts to load an ad or other item that matches.
When you block trackers, you prevent tracking companies from reading your browser fingerprint. However, more advanced tracking techniques may still be able to gather information about you.
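The list-matching idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular blocker is implemented: the blocklist entries are hypothetical, and real lists contain many thousands of rules with richer pattern syntax.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries (assumption for illustration only).
BLOCKED_DOMAINS = {"tracker.example", "ads.example"}

def is_blocked(url: str) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Check the host itself and every parent domain against the list.
    return any(".".join(parts[i:]) in BLOCKED_DOMAINS for i in range(len(parts)))

requests_made = [
    "https://news.example/article.html",
    "https://cdn.tracker.example/pixel.gif",
    "https://ads.example/banner.js",
]

# Only requests that match nothing on the list are allowed to load.
allowed = [u for u in requests_made if not is_blocked(u)]
print(allowed)
```

A tracker hosted on a domain that is not yet on the list passes straight through, which is one reason list-based blocking cannot catch everything.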
Disabling JavaScript
Most trackers run on JavaScript, and they can’t gather much of the information used to determine your browser fingerprint without it. With JavaScript disabled, your browser looks a lot less distinct and is better protected.
But there is a trade-off. Disabling JavaScript breaks a staggering number of websites and limits the functionality of many more.
Changing browser settings from defaults
Tracking is so pervasive that all of the major browsers (Chrome, Firefox, and Safari) come with settings that disable certain types of tracking. Turning them on or off is as simple as going into the settings menu and clicking a button.
Disabling tracking scripts in your browser settings is reliably effective, though not as robust as a designated tracker-blocker.
For more info about what settings and protections your browser offers compared to others, check out this article from Blacklight.
Using a fingerprint-resistant browser
Some newer browsers were built to thwart fingerprinting, such as Tor Browser and Brave. How they do this varies from browser to browser, but they generally work by making your fingerprint less unique and/or less consistent. This means trackers have a harder time following your usage of the web.
Can my attempts to protect myself backfire? How can attempting to make myself more anonymous actually make me more identifiable?
Each browser metric is highly connected to other metrics in complex ways. This is why we don’t recommend trying to change a single element of your fingerprint. Striving to get the most common result for any individual metric may seem like a good idea, but it can actually make your browser more identifiable.
Let’s look at an example of how these metrics are interconnected:
Every browser sends information about itself to servers so that web content loads correctly. This information includes the browser name and version. If you swap out the identifier of the browser you're actually using with one from a more common browser, you may make yourself completely identifiable. How is this possible? If Chrome is a more common browser, how can identifying your browser as Chrome make you more unique?
Because trackers aren’t only looking at what browser version you have. In combination with other metrics, your fake Chrome browser may stand out. If you are actually using, say, Safari, all the other metrics will point to that fact. You will have the only browser out there identifying itself as Chrome but looking like Safari.
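A rough Python sketch of this kind of cross-check follows. It is an illustration under simplifying assumptions: it compares only the claimed browser name with the vendor string that the browser's JavaScript environment reports, whereas a real tracker could compare many more metrics. The function names are hypothetical.

```python
# Simplified expectations: the vendor string each browser family typically reports.
EXPECTED_VENDOR = {
    "Chrome": "Google Inc.",
    "Safari": "Apple Computer, Inc.",
}

def claimed_browser(user_agent: str) -> str:
    # Chrome's user-agent string also contains "Safari", so test for Chrome first.
    if "Chrome/" in user_agent:
        return "Chrome"
    if "Safari/" in user_agent:
        return "Safari"
    return "Other"

def looks_spoofed(user_agent: str, vendor: str) -> bool:
    """Flag a browser whose claimed identity disagrees with its other metrics."""
    expected = EXPECTED_VENDOR.get(claimed_browser(user_agent))
    return expected is not None and expected != vendor

# A Safari user who swapped in a Chrome user-agent string.
fake_chrome_ua = "Mozilla/5.0 (Macintosh) AppleWebKit/537.36 Chrome/120.0 Safari/537.36"
print(looks_spoofed(fake_chrome_ua, "Apple Computer, Inc."))  # True
```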
Incognito mode
Historically, Private Browsing and Incognito Mode had a single purpose: to prevent traces of the sites you visited from being stored on your machine. They were not meant to prevent remote sites or trackers from identifying you and recording your visits on their servers.
If you are using Firefox, using Private Browsing will provide some protections against trackers. Any trackers that are included in the Disconnect tracking protection list will be blocked. This keeps you safe from known trackers. Known fingerprinters and cryptominers which use your browser against you are also blocked. However, this will not prevent a new fingerprinter or tracker from identifying your browser and keeping tabs on it. In order to get this extra level of protection, your browser needs to have a fingerprint which is either:

so common that a tracker can't tell you apart from the crowd (as in Tor Browser), or
randomized so that a tracker can't tell it's you from one moment to the next (as in Brave browser).

Google's Chrome browser does not provide protection against trackers or fingerprinters in Incognito Mode.


DATA BREACHES CAUSED BY THIRD-PARTIES

8/25/2023


This is an interesting one. Here are some quick facts to set the landscape:

In financial services, they reported the second most third-party breaches despite their third-party spending the most time on assessments, over 17,000 hours per year.
In health and pharma, they're less likely to have a third-party breach and most likely to use a combination of tools to assess third parties.
In the public sector, they use a combination of tools to assess third parties and tend to believe the results are valuable.
In retail, they reported the most third-party data breaches despite their third-party spending over 16,578 hours on assessments.
In tech and software, they are the most likely to have multiple third-party data breaches, and over 41 percent still use manual procedures to assess third parties.
In the same study, they determined that third-party breaches remain an expensive problem.

OK, so what do we think? There is no easy way to quantify third-party risk (and that is coming from a professional seasoned in that area).

But here is a website that will at least let you keep tabs on what's happening in this space. https://blackkite.com/data-breaches-caused-by-third-parties/


Octopii: an open-source AI-powered Personal Identifiable Information (PII) scanner that can look for image assets such as Government IDs, passports, photos and signatures in a directory.

11/29/2022


We like this for PII discovery.
https://github.com/redhuntlabs/Octopii
Working
Octopii uses Tesseract's Optical Character Recognition (OCR) and Keras' Convolutional Neural Networks (CNN) models to detect various forms of personal identifiable information that may be leaked on a publicly facing location. This is done in the following steps:

Importing and cleaning image(s)

The image is imported via OpenCV and is cleaned, deskewed and rotated for scanning.

Performing image classification

The image is scanned for features such as an ISO/IEC 7810 card specification, colors, location of text, photos, holograms etc. This is done by passing it to a trained model and comparing it against that model.

Optical Character Recognition (OCR)

As a final verification method, images are scanned for certain strings to verify the accuracy of the model.
The accuracy of the scan can be determined via the confidence scores in the output. If all the mentioned conditions are met, a score of 100.0 is returned.
To train the model, data can also be fed into the model_generator.py script, and the newly improved h5 file can be used.

Usage

Install all dependencies via pip install -r requirements.txt.
Install the Tesseract helper locally via sudo apt install tesseract-ocr -y (for Ubuntu/Debian).
To run Octopii, type python3 octopii.py <location name>, for example python3 octopii.py pii_list/

python3 octopii.py <location to scan> <additional flags>

Octopii currently supports local scanning and scanning S3 directories and open directory listings via their URLs.

Example

owais@artemis ~ $ python3 octopii.py pii_list

Not a valid image format: pii_list/aadhaar/aadhaar-8.gif

[
{
"asset_type": "Bank",
"confidence": 100.0,
"file_name": "passbook",
"extension": "jpeg",
"path": "pii_list/bank/passbook.jpeg"
},
{
"asset_type": "Photo",
"confidence": 99.98,
"file_name": "IMG-20200331-WA0037",
"extension": "jpg",
"path": "pii_list/photos/IMG-20200331-WA0037.jpg"
},
{
"asset_type": "PAN",
"confidence": 100.0,
"file_name": "pan-7",
"extension": "jpg",
"path": "pii_list/pan/pan-7.jpg"
},
{
"asset_type": "Aadhaar",
"confidence": 97.31,
"file_name": "aadhaar-14",
"extension": "jpg",
"path": "pii_list/aadhaar/aadhaar-14.jpg"
}
]
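Results in this shape are easy to post-process. The Python sketch below is an illustration only: it assumes that just the JSON array has been captured (without the "Not a valid image format" line), and the helper name and the 99.0 threshold are hypothetical choices, not part of Octopii.

```python
import json

# Two records copied from the example output above.
scan_output = """
[
  {"asset_type": "PAN", "confidence": 100.0, "file_name": "pan-7",
   "extension": "jpg", "path": "pii_list/pan/pan-7.jpg"},
  {"asset_type": "Aadhaar", "confidence": 97.31, "file_name": "aadhaar-14",
   "extension": "jpg", "path": "pii_list/aadhaar/aadhaar-14.jpg"}
]
"""

def high_confidence_paths(raw_json: str, threshold: float = 99.0) -> list:
    """Return the paths of findings whose confidence meets the threshold."""
    findings = json.loads(raw_json)
    return [f["path"] for f in findings if f["confidence"] >= threshold]

print(high_confidence_paths(scan_output))
```

Lower-confidence findings are not necessarily false positives, so a review workflow might queue them for manual inspection rather than discard them.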


Blacklight

11/22/2022


We love this one.  https://themarkup.org/blacklight

Blacklight: A Real-Time Website Privacy Inspector
By Surya Mattu
Who is peeking over your shoulder while you work, watch videos, learn, explore, and shop on the internet? Enter the address of any website, and Blacklight will scan it and reveal the specific user-tracking technologies on the site, and who’s getting your data. You may be surprised at what you learn.

Read about what they learned running this tool against different websites:  https://themarkup.org/series/blacklight

And here is how they built it:

Blacklight is a real-time website privacy inspector.
The tool emulates how a user might be surveilled while browsing the web. Users type a URL into Blacklight, and it visits the requested website, scans for known types of privacy violations, and returns an instant privacy analysis of the inspected site.
Blacklight works by visiting each website with a headless browser, running custom software built by The Markup. This software monitors which scripts on that website are potentially surveilling the user by performing seven different tests, each investigating a specific, known method of surveillance.
The types of surveillance that Blacklight seeks to identify are:

Third-party cookies
Ad trackers
Key logging
Session recording
Canvas fingerprinting
Facebook tracking
Google Analytics “Remarketing Audiences”

These are defined later in this document, as are their limitations.
Blacklight was built using the NodeJS JavaScript environment and the Puppeteer Node library, which provides high-level control over a Chromium (open-source Chrome) browser. When a user enters a URL into Blacklight, the tool opens a headless web browser with a fresh profile and visits the site’s homepage as well as an additional randomly selected page deeper inside the same website.

While the browser is visiting the website, it runs custom software in the background that monitors scripts and network requests to observe when and how user data is being collected. To monitor scripts, Blacklight modifies various fingerprintable properties of the browser’s Window API. This allows Blacklight to log which script made a particular function call, using the Stacktrace-js package. The network requests are collected using a monitoring tool included in Puppeteer’s API.
Blacklight uses the script data and network requests to run the seven tests described above. Afterward, it closes the browser and generates an instant report for the user.
It records a list of all the URLs that the inspected website requests. In addition, it makes a list of all domains and subdomains that were requested. The tool we provide to the public will not save those lists unless the user chooses to share results with us through an option in the tool.
We define domain names using the Public Suffix + 1 method. We define first-party domain as any domain that matches the website visited, including subdomains. We define third-party as any domain that does not match the website visited. The tool compares the list of third-party domains from the website requests with DuckDuckGo’s Tracker Radar dataset.
This data merge allows Blacklight to add the following information about the third-party domains found on the inspected site:

Name of the domain’s owner.
Categories assigned by DuckDuckGo to each domain that attempt to describe its observed purpose or intent.

This additional information about third parties is provided to users as context for Blacklight’s instant test results. Among other things, this information is used to count the number of advertising-related trackers present on a given website.
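The first-party versus third-party split described above can be sketched in Python. This is a simplified illustration, not Blacklight's code: the three-entry suffix set stands in for the full Public Suffix List, and the function names are hypothetical.

```python
# Tiny stand-in for the Public Suffix List; a real tool would load the full list.
PUBLIC_SUFFIXES = {"com", "org", "co.uk"}

def registrable_domain(host: str) -> str:
    """Return the public suffix plus one label (e.g. 'example.co.uk')."""
    labels = host.lower().split(".")
    # Walk from the longest candidate suffix to the shortest so 'co.uk' wins over 'uk'.
    for i in range(len(labels)):
        if i > 0 and ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return host.lower()

def is_third_party(request_host: str, site_host: str) -> bool:
    """A request is third-party if its registrable domain differs from the site's."""
    return registrable_domain(request_host) != registrable_domain(site_host)

print(is_third_party("sports.example.com", "example.com"))   # False: a subdomain
print(is_third_party("tracker.example.org", "example.com"))  # True
```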
Blacklight runs tests based on the root URL of the page entered by a user into the tool. For example, if a user types in https://example.com/sports, Blacklight starts its inspection at https://example.com and disregards the /sports path. If a user types in https://sports.example.com, Blacklight starts its inspection at https://sports.example.com.
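The root-URL rule can be expressed with Python's standard URL parser. This is a minimal sketch of the behavior described above, with a hypothetical function name; it is not taken from Blacklight's code base.

```python
from urllib.parse import urlsplit

def inspection_start(url: str) -> str:
    """Keep the scheme and host; drop the path, query, and fragment."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"

print(inspection_start("https://example.com/sports"))   # https://example.com
print(inspection_start("https://sports.example.com"))   # https://sports.example.com
```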

Blacklight results for each requested domain are cached for 24 hours, and these cached reports are delivered in response to subsequent user requests for the same website during those 24 hours. This is designed to prevent the tool from being used maliciously to overwhelm a website with thousands of automated visits.
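A time-limited cache of this kind might look like the following Python sketch. The class and parameter names are hypothetical, and the storage is a plain in-memory dictionary; the source does not say how Blacklight actually stores its cached reports.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as described above

class ReportCache:
    def __init__(self, scan, clock=time.time):
        self._scan = scan      # function that performs a fresh scan of a domain
        self._clock = clock    # injectable clock, handy for testing
        self._entries = {}     # domain -> (timestamp, report)

    def get(self, domain):
        now = self._clock()
        cached = self._entries.get(domain)
        if cached is not None and now - cached[0] < CACHE_TTL_SECONDS:
            return cached[1]   # still fresh: no new visit to the website
        report = self._scan(domain)
        self._entries[domain] = (now, report)
        return report
```

However many users request the same domain within the window, the inspected website receives at most one automated visit.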
Blacklight will also tell users whether their results are high, low, or about average compared with what the tool found on the 100,000 most popular websites as ranked by the Tranco List. This is described in more detail below.
The Blacklight code base is open source and available on Github; it can also be downloaded as an NPM module.
There are limitations to our analysis. Blacklight emulates a user visiting a website, but its automated behavior is different from human behavior, and that behavior may trigger different types of surveillance. For instance, an automated request might trigger more fraud detection but fewer ads.
Given the dynamic nature of web-based technology, it is also possible that some of these tests will become out-of-date over time. And new legitimate-use cases for the techniques Blacklight flags could emerge that would not be listed in the tool’s caveats.
For this reason, Blacklight results should not be taken as the final word on potential privacy violations by a given website. Rather, they should be treated as an initial automated inspection that requires further investigation before a definitive claim can be made.

Previous Work
Blacklight is built on the foundation of various privacy census tools built over the past decade.
It runs Javascript instrumentation, which enables it to monitor calls to the browsers’ Javascript API. This is based on OpenWPM, an open-source tool for web privacy measurement built by Steven Englehardt, Gunes Acar, Dillon Reisman, and Arvind Narayanan at Princeton University. It is now maintained by Mozilla.
OpenWPM was used to power Princeton’s Web Transparency and Accountability Project, which monitored websites and services to discover companies’ data collection, data use, and deceptive practices.
Through numerous studies conducted between 2015 and 2019, Princeton researchers uncovered the presence of many privacy-infringing technologies. These included browser fingerprinting and cookie syncing as well as how session replay scripts collect passwords and sensitive user data. One notable example is the exfiltration of prescription data and health-conditions data from walgreens.com.
Four of the seven tests Blacklight runs are based on the techniques described in the Princeton research mentioned above. These tests are canvas fingerprinting, key logging, session recording, and third-party cookies.
OpenWPM incorporates code and techniques from other privacy inspection tools, including FourthParty, Privacy Badger, and FP Detective:

FourthParty was an open-source platform for measuring dynamic web content that was released in August 2011 and maintained until 2014. It has been used in various studies, including one that describes how websites like Home Depot were leaking their customers' usernames to third parties. Blacklight uses FourthParty’s method to monitor what user information is being sent over the network to third parties.
Privacy Badger is a browser add-on made by the Electronic Frontier Foundation and released in May 2014. It prevents advertisers and third-party trackers from following people on the internet.
FP Detective was the first comprehensive study to measure the prevalence of device fingerprinting on the internet. The tool was released in 2013 and was used to conduct large-scale web-privacy studies.

Blacklight’s data analysis was inspired in part by the Website Evidence Collector developed by the European Union’s European Data Protection Supervisor (EDPS). The Website Evidence Collector is a NodeJS package that uses the Puppeteer library to discover how a website collects a user’s personal data. Some of the categories of collected data were chosen by the EDPS.
Other projects that have influenced Blacklight’s development include the Web Privacy Census, conducted at UC Berkeley in 2012, and the Wall Street Journal’s “What They Know” series.

How We Analyzed Each Type of Tracking

Third-Party Cookies
Third-party cookies are small pieces of data that tracking companies store in your web browser when you visit a website. This bit of text (usually a unique number or string of characters) identifies you when you visit other websites that contain tracking code from the same company. Third-party cookies are used by hundreds of companies to build dossiers about users and deliver customized ads based on their behavior.
Popular web browsers Edge, Brave, Firefox, and Safari all block third-party tracking cookies by default, and Chrome has announced that it will phase them out.
What Blacklight Tests
Blacklight monitors network requests for the “Set-Cookie” header and observes all domains that set cookies using the document.cookie javascript property. Blacklight identifies third-party cookies as those whose domains do not match the domain of the website being visited. We look up these third-party domains in DuckDuckGo’s Tracker Radar data to find out who owns them, how prevalent they are, and what kinds of services they provide.
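The header-based half of this test can be sketched in Python as follows. This is an illustration with hypothetical names: it covers only the Set-Cookie header (not cookies set through document.cookie) and uses a simple suffix comparison rather than a full Public Suffix List lookup.

```python
from urllib.parse import urlsplit

def third_party_cookie_hosts(site_host, responses):
    """responses: iterable of (url, headers) pairs observed during a page load."""
    found = set()
    for url, headers in responses:
        host = (urlsplit(url).hostname or "").lower()
        # Header names are case-insensitive, so compare in lowercase.
        sets_cookie = any(name.lower() == "set-cookie" for name in headers)
        # Simplification: treat the site host and its subdomains as first party.
        first_party = host == site_host or host.endswith("." + site_host)
        if sets_cookie and not first_party:
            found.add(host)
    return sorted(found)

observed = [
    ("https://example.com/", {"Set-Cookie": "sid=1"}),
    ("https://cdn.example.com/app.js", {"set-cookie": "x=1"}),
    ("https://tracker.example.net/p.gif", {"Set-Cookie": "uid=42"}),
    ("https://fonts.example.org/f.woff", {"Content-Type": "font/woff"}),
]
print(third_party_cookie_hosts("example.com", observed))
```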

Key Logging
Key logging is when a first or third party monitors the text that you type into a webpage before you hit the submit button. This technique has been used for a variety of purposes, including identifying anonymous web users by matching them to postal addresses and real names.
There are other reasons for key logging, such as providing autocomplete functionality. Blacklight cannot determine the intent behind the inspected website’s use of this technique.
What Blacklight Tests
In order to test whether this is happening on a given website, Blacklight types predetermined text (see Appendix) in all input fields but never clicks on a submit button. It monitors network requests to see if the data that was entered was sent to any servers.
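The final comparison step could be sketched in Python like this. The function name, the test value, and the choice to check only raw and URL-encoded forms are assumptions for illustration; the source does not describe which encodings Blacklight checks.

```python
from urllib.parse import quote_plus

def leaked_inputs(typed_values, captured_requests):
    """Return (value, url) pairs where typed text appears in an outgoing request."""
    leaks = []
    for url, body in captured_requests:
        payload = url + "\n" + (body or "")
        for value in typed_values:
            # Check the raw text and its URL-encoded form.
            if value in payload or quote_plus(value) in payload:
                leaks.append((value, url))
    return leaks

typed = ["blacklight-test@example.com"]   # hypothetical predetermined text
captured = [
    ("https://example.com/style.css", None),
    ("https://collector.example/k?v=blacklight-test%40example.com", None),
]
print(leaked_inputs(typed, captured))
```

Because no submit button is ever clicked, any request that carries the typed text means the page sent it before submission.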

Session Recording
Session recording is technology that allows a third party to monitor and record all of a user’s behavior on a webpage, including mouse movements, clicks, scrolling down the page, and anything you type into a form even if you don’t click submit.
In a 2017 study, researchers at Princeton University found that session recorders were collecting sensitive information such as passwords and credit card numbers. When the researchers contacted the companies in question, most responded quickly and fixed the underlying cause of the data leak. However, the research highlights that these aren’t simply bugs but rather insecure practices that the researchers say should be stopped entirely. Most companies that offer session recording say they use the data to provide their customers—the websites installing the technology—meaningful insights on how to improve a user’s experience on the website. One company, Inspectlet, describes its service as watching “individual visitors use your site as if you’re looking over their shoulders.” (Inspectlet did not respond to an email seeking comment.)
Caption: Screenshot from Inspectlet, a known session recording provider. Credit: Inspectlet

What Blacklight Tests
We define session recording as the loading of a specific type of script by a company that we know to be providing session recording services.
Blacklight monitors the network requests for specific URL substrings that appear only when session recording is taking place, according to a list created by researchers at Princeton University in 2017.

Sometimes key logging is used as part of session recording. In those cases, Blacklight would correctly report the session recorder as both key logging and session recording because we observed both, even though both tests are identifying the same script.
Blacklight accurately detects when a website loads these scripts—but companies typically record only a sample of website visits, so not every user is being recorded on every visit.

Canvas Fingerprinting
Fingerprinting describes a group of techniques that try to identify your browser without setting a cookie. They can identify you even if you block all cookies.
Canvas fingerprinting is a type of fingerprinting that identifies users by drawing shapes and text on a user’s webpage and noting the minor differences in the way they are rendered.

Caption: Four examples of canvas fingerprinting found with Blacklight.

These differences in font rendering, smoothing, anti-aliasing, and other features are used by marketers and others to identify individual devices. All of the major internet browsers, except Chrome, try to counter canvas fingerprinting, either by not fulfilling data requests for scripts known to have engaged in the practice or by trying to standardize users’ fingerprints.
The image below is an example of the type of canvas images used by fingerprinting scripts. These canvases are usually invisible to the user.
What Blacklight Tests
We follow the methodology described in this paper by researchers at Princeton University to identify when the HTML canvas element is used for tracking purposes. The parameters we use to identify canvases that are being drawn for fingerprinting purposes are:

The canvas element's height and width properties must not be set below 16px.
Text must be written to the canvas with at least 10 distinct characters.
The script should not call the save, restore, or addEventListener methods of the rendering context.
The script extracts an image with toDataURL or with a single call to getImageData that specifies an area with a minimum size of 16px × 16px.

We have not seen this in practice, but it is possible that Blacklight could falsely label a legitimate use of the canvas that matches these heuristics. In order to account for this, the tool captures the image drawn by the script and renders this in the tool. Users should be able to determine the use of canvas simply by viewing this image. A typical fingerprinting script is shown above.
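The four criteria listed above can be combined into a single check. The Python sketch below is an illustration: the CanvasActivity record and its field names are hypothetical, standing in for whatever a monitoring tool logs about one script's use of one canvas.

```python
from dataclasses import dataclass, field

@dataclass
class CanvasActivity:
    """What one script was observed doing to one canvas (hypothetical record)."""
    width: int
    height: int
    text_written: str
    methods_called: set = field(default_factory=set)
    extracted_area: tuple = (0, 0)   # (width, height) of pixels read back
    get_image_data_calls: int = 0

def looks_like_fingerprinting(c: CanvasActivity) -> bool:
    big_enough = c.width >= 16 and c.height >= 16
    enough_text = len(set(c.text_written)) >= 10
    no_ui_methods = not (c.methods_called & {"save", "restore", "addEventListener"})
    reads_via_data_url = "toDataURL" in c.methods_called
    reads_via_image_data = (
        c.get_image_data_calls == 1
        and c.extracted_area[0] >= 16 and c.extracted_area[1] >= 16
    )
    return big_enough and enough_text and no_ui_methods and (
        reads_via_data_url or reads_via_image_data
    )

suspect = CanvasActivity(240, 60, "Cwm fjordbank glyphs vext quiz", {"fillText", "toDataURL"})
chart = CanvasActivity(300, 150, "Sales", {"save", "restore", "fillText", "toDataURL"})
print(looks_like_fingerprinting(suspect), looks_like_fingerprinting(chart))  # True False
```

These are heuristics, so a legitimate canvas that happens to meet all four conditions would still be flagged, which is the false-positive case mentioned above.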

Ad Trackers
Ad trackers are technologies that identify and collect information about users. These technologies usually (but not always) appear with some level of consent from the website owners. They are used to collect website user analytics, for ad-targeting, and by data brokers and other information collectors to build user profiles. They usually take the form of Javascript scripts or web beacons.
Web beacons are small 1px by 1px images that are placed on a website for tracking purposes by third parties. Using this technique, a third party can determine behaviors including when a particular user went to a site, the kind of browser, and what IP address it used.
What Blacklight Tests
Blacklight checks all network requests against the EasyPrivacy list, which contains URLs and URL substrings that are known to be used for tracking. Blacklight monitors the network activity for requests being made to these URLs and substrings.
Blacklight only records requests being made to third-party domains. It ignores any URL patterns in the EasyPrivacy list that match a first-party domain. For example, the EFF hosts its own analytics, and that results in requests to “https://anon-stats.eff.org,” their analytics subdomain. If a user types in https://eff.org, Blacklight does not consider calls to https://anon-stats.eff.org to be a third-party request.
We look up these third-party domains in DuckDuckGo’s Tracker Radar data set to find out who owns them, how prevalent they are, and what kinds of services they provide. We only include third-party domains that belong to the “Ad Motivated Tracking” categories defined in the Tracker Radar data set.

Facebook Pixel
The Facebook pixel is a piece of code Facebook created that allows other websites to target their visitors later with ads on Facebook. Common actions that can be tracked by the pixel include viewing a page or specific content, adding payment information, or making a purchase.
What Blacklight Tests
Blacklight looks for network requests from the site going to Facebook and looks in the URL query parameters for data that matches the schema of what is described in the documentation for Facebook’s pixel. We look for three different types of data: “standard events,” “custom events” and “advanced matching.”
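As a rough illustration of this kind of query-parameter inspection, the sketch below sorts a request's parameters into the three categories. The parameter names (`ev` for the event name, `ud[...]` for advanced-matching fields) and the short list of standard event names are assumptions made for this example, not a statement of the schema Blacklight actually checks.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumption: a partial list of standard event names, for illustration only.
STANDARD_EVENTS = {"PageView", "ViewContent", "AddToCart", "AddPaymentInfo", "Purchase"}


def classify_pixel_request(url: str) -> dict:
    """Sort the query parameters of a request to Facebook into three buckets."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    result = {"standard_events": [], "custom_events": [], "advanced_matching": {}}
    if not (host == "facebook.com" or host.endswith(".facebook.com")):
        return result  # not a request to Facebook
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "ev":
            # Assumed: "ev" carries the event name.
            bucket = "standard_events" if value in STANDARD_EVENTS else "custom_events"
            result[bucket].append(value)
        elif key.startswith("ud[") and key.endswith("]"):
            # Assumed: "ud[<field>]" carries advanced-matching data.
            result["advanced_matching"][key[3:-1]] = value
    return result
```

An event name outside the standard set is treated here as a custom event, which is a simplification.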

Google Analytics’ “Remarketing Audiences”
Google Analytics is the most popular website analytics platform in use today. According to whotracks.me, 41.7 percent of web traffic is analyzed by Google Analytics. While most of the functionality of this service is to provide developers and website owners with information on how their audience is engaging with their website, the tool also allows the website to make custom audience lists based on user behavior and then target ads to those visitors across the internet using Google Ads and Display & Video 360. Blacklight examines inspected sites for the presence of the tool, not how it is used.
What Blacklight Tests
Blacklight looks for network requests from the inspected site going to a URL beginning with “stats.g.doubleclick” that also contains the “UA-” Google account identifier prefix. This is described in more detail in Google Analytics developer documentation.
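A check of this shape might look like the sketch below. It assumes that "beginning with" refers to the request's hostname (ignoring the scheme) and that the "UA-" prefix can appear anywhere in the URL; the function name is made up for the example.

```python
from urllib.parse import urlsplit


def uses_remarketing_audiences(request_url: str) -> bool:
    """True if the request targets stats.g.doubleclick and carries a "UA-" identifier."""
    host = urlsplit(request_url).hostname or ""
    return host.startswith("stats.g.doubleclick") and "UA-" in request_url
```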

Survey
To determine the prevalence of tracking technologies on the internet both for context in Blacklight and for accompanying news stories, we ran the 100,000 most popular websites as defined by the Tranco List through Blacklight. The data and analysis code can be found on GitHub. Blacklight successfully captured data for 81,617 of those URLs. The rest either failed to resolve, timed out on multiple attempts, or didn’t load a webpage. The percentages listed below are for the 81,617 successful captures.
Some of the analysis goes beyond what appears on the tool. The key findings from our survey are as follows:

6 percent of websites used canvas fingerprinting.
15 percent of websites loaded scripts from known session recorders.
4 percent of websites logged keystrokes.
13 percent of sites did not load any third-party cookies or tracking network requests.
The median number of third-party cookie loads was three.
The median number of ad trackers loaded was seven.
74 percent of sites loaded Google tracking technology.
33 percent of websites loaded Facebook tracking technology.
50 percent of sites used Google Analytics’ remarketing feature.
30 percent of sites used the Facebook pixel.

We classified as Google tracking technology any network requests being made to any of the following domains:

google-analytics.com
doubleclick.net
googletagmanager.com
googletagservices
googlesyndication.com
googleadservices
2mdn.net

We classified as Facebook tracking technology any network requests being made to any of the following Facebook domains:

facebook.com
facebook.net
atdmt.com
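The two domain lists above could be applied to a request URL along these lines. This is a sketch under assumptions: entries are copied from the lists as written, entries with a dot are matched as domain suffixes, and the two entries listed without a top-level domain are matched against individual hostname labels. The function names are illustrative only.

```python
from typing import Optional
from urllib.parse import urlsplit

GOOGLE_DOMAINS = [
    "google-analytics.com", "doubleclick.net", "googletagmanager.com",
    "googletagservices", "googlesyndication.com", "googleadservices", "2mdn.net",
]
FACEBOOK_DOMAINS = ["facebook.com", "facebook.net", "atdmt.com"]


def matches(host: str, entry: str) -> bool:
    """Suffix match for full domains; label match for entries without a TLD."""
    if "." in entry:
        return host == entry or host.endswith("." + entry)
    # Assumption: an entry listed without a top-level domain matches any
    # hostname that contains it as a whole label.
    return entry in host.split(".")


def tracker_owner(request_url: str) -> Optional[str]:
    """Return "Google", "Facebook", or None for a request URL."""
    host = urlsplit(request_url).hostname or ""
    if any(matches(host, d) for d in GOOGLE_DOMAINS):
        return "Google"
    if any(matches(host, d) for d in FACEBOOK_DOMAINS):
        return "Facebook"
    return None
```

Matching on whole labels rather than raw substrings keeps a hostname such as notfacebook.com from being counted as Facebook.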

Limitations
Blacklight’s analysis is limited by four main factors:

It is a simulation of user behavior, not actual user behavior, and could thus trigger different surveillance responses.
The inspected website could be surveilling user activities for benign purposes.
False positives (possible with canvas fingerprinting): Very occasionally, legitimate uses of the HTML canvas match the heuristics Blacklight uses to identify canvas fingerprinting.
False negatives: The stack tracing technique used by Blacklight’s Javascript instrumentation might incorrectly attribute a call to a window API method we are monitoring to a library included by a script. For example, if a fingerprinting script uses jQuery to do some calls, jQuery might end up on the top of the stack and Blacklight will attribute the call to that instead of the script that's actually responsible. This possibility was brought to our attention by researchers who reviewed our methodology; we have not seen it occur in our tests or our survey of the 100,000 most popular sites.


Regarding false positives, when Blacklight visits a site, that site can see the request is coming from computers hosted by Amazon’s AWS cloud infrastructure. Because botnets are often run on cloud infrastructure, our tool could trigger bot-detection software on the website, including canvas fingerprinting. This could result in false positives for the canvas fingerprinting test where the purpose of the test is not to track users but rather to detect botnets.
To test this, we took a random sample of 1,000 sites from the top websites on the Tranco List that we had already run through Blacklight on AWS. We ran this sample through the Blacklight software locally on our own computer, from a residential IP address in New York City. We concluded that the results of a local Blacklight inspection are very similar to, but not exactly the same as, the results of running it on cloud infrastructure.
Results for Sample: Local Computer and AWS
Measure                                           Local   AWS
Canvas fingerprinting                             8%      10%
Session recording                                 18%     19%
Key logging                                       4%      6%
Median number of third-party cookies              4       5
Median number of third-party trackers recorded    7       8

Not all surveillance activity that is imperceptible to the user is necessarily malicious. For instance, canvas fingerprinting is used for fraud prevention because it can identify a device. And key logging can be used to provide autocomplete functionality.
Blacklight does not attempt to identify the intent of any particular tracking technology it finds.
Nor can Blacklight determine exactly how a website uses the data it collects on a user when loading session recording scripts and monitoring user behavior, such as mouse movements and keystrokes.
Blacklight does not check the terms of use or privacy policies of the websites it visits to see whether they disclose their surveillance activities.

Appendix
Input field values
The table below lists the values we programmed Blacklight to type into input fields on websites. We used the Mozilla autocomplete attribute write-up as our reference. Blacklight also checks for the base64, md5, sha256 and sha512 versions of these values.
Autocomplete Attribute    Blacklight Value
Date                      01/01/2026
Email                     [email protected]
Password                  SUPERS3CR3T_PASSWORD
Search                    TheMarkup
Text                      IdaaaaTarbell
URL                       https://themarkup.org
Organization              The Markup
Organization Title        Non-profit newsroom
Current Password          S3CR3T_CURRENT_PASSWORD
New Password              S3CR3T_NEW_PASSWORD
Username                  idaaaa_tarbell
Family Name               Tarbell
Given Name                Idaaaa
Name                      IdaaaaTarbell
Street Address            PO Box #1103
Address Line 1            PO Box #1103
Postal Code               10159
CC-Name                   IDAAAATARBELL
CC-Given-Name             IDAAAA
CC-Family-Name            TARBELL
CC-Number                 4479846060020724
CC-Exp                    01/2026
CC-Type                   Visa
Transaction Amount        13371337
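The check for encoded versions of the values in the table above could be sketched as follows. The function names and the simple substring search are assumptions made for illustration; the sketch also assumes UTF-8 input, standard base64, and lowercase hexadecimal digests, none of which the methodology specifies.

```python
import base64
import hashlib


def encoded_variants(value: str) -> dict[str, str]:
    """Return the plain value plus its base64, md5, sha256 and sha512 forms."""
    raw = value.encode("utf-8")
    return {
        "plain": value,
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
        "sha512": hashlib.sha512(raw).hexdigest(),
    }


def find_leaks(payload: str, typed_values: dict[str, str]) -> list[tuple[str, str]]:
    """Return (field, encoding) pairs whose encoded value appears in payload."""
    hits = []
    for field, value in typed_values.items():
        for encoding, variant in encoded_variants(value).items():
            if variant in payload:
                hits.append((field, encoding))
    return hits
```

Searching for hashed forms as well as the plain text means a script that hashes a typed value before sending it would still be noticed.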
Acknowledgements
We thank Gunes Acar (KU Leuven), Steven Englehardt (Mozilla), and Arvind Narayanan and Jonathan Mayer (Princeton, CITP) for comments and suggestions on an earlier draft.
