https://sls.eff.org/
EFF’s Street-Level Surveillance project shines a light on the surveillance technologies that law enforcement agencies routinely deploy in our communities. These resources are designed for advocacy organizations, journalists, defense attorneys, policymakers, and members of the public who often are not getting the straight story from police representatives or the vendors marketing this equipment.

Whether it’s phone-based location tracking, ubiquitous video recording, biometric data collection, or police access to people’s smart devices, law enforcement agencies follow closely behind their counterparts in the military and intelligence services in acquiring privacy-invasive technologies and getting access to consumer data. Just as analog surveillance historically has been used as a tool for oppression, we must understand the threat posed by emerging technologies to successfully defend civil liberties and civil rights in the digital age.

The threats to privacy of these surveillance technologies are enormous, as law enforcement agencies at all levels of government use surveillance technologies to compile vast databases filled with our personal information or gain access to devices that can lay bare the intricacies of our daily lives. Use of these surveillance technologies can infringe on our constitutional rights, including to speak and associate freely under the First Amendment or be free from unlawful search and seizure under the Fourth Amendment. Law enforcement also tends to deploy surveillance technologies disproportionately against marginalized communities. These technologies are prone to abuse by rogue officers, and can be subject to error or vulnerability, causing damaging repercussions for those who interact with the criminal justice system.

The EFF also has a tool, Cover Your Tracks, that tests how well your browser is protected from tracking and fingerprinting. (You want results similar to ours in the illustration above.)
TEST YOUR BROWSER

How does tracking technology follow your trail around the web, even if you’ve taken protective measures? Cover Your Tracks shows you how trackers see your browser. It provides you with an overview of your browser’s most unique and identifying characteristics. Only anonymous data will be collected through this site.

Want to learn more about tracking?

When you visit a website, your browser makes a "request" for that site. In the background, advertising code and invisible trackers on that site might also cause your browser to make dozens or even hundreds of requests to other hidden third parties. Each request contains several pieces of information about your browser and about you, from your time zone to your browser settings to what versions of software you have installed.

Some of this information is passed along by default simply to help you view the page. For example, HTTP headers are essential to most web functionality, and broadcast your device and browser version. But a lot of the information in your browser’s requests is also extracted by third-party ad networks, which have sneaky tracking mechanisms embedded across the Internet to gather your information.

At first glance, the data points that third-party trackers collect may seem relatively mundane and disparate. But when compiled together, they can reveal a detailed behavioral profile of your online activity, from political affiliation to education level to income bracket. As long as this trove of data about you is linked back to you, your online activity can be logged. Ad networks primarily rely on two methods to maintain this link: cookie tracking, and browser fingerprinting.

What are cookies?

Cookies are small chunks of information that websites store in your browser. Their main use is to remember helpful things like your account login info, or what items were in your online shopping cart. In other words, they save your place.
But they can also be misused to link all your visits, searches, and other activities on a site together. This use of cookies is a privacy violation, and browsers generally allow you to block, limit, or delete cookies.

What is a digital fingerprint?

A digital fingerprint is essentially a list of characteristics that are unique to a single user, their browser, and their particular hardware setup. This includes information the browser needs to send to access websites, like the location of the website the user is requesting. But it also includes a host of seemingly insignificant data (like screen resolution and installed fonts) gathered by tracking scripts. Tracking sites can stitch all the small pieces together to form a unique picture, or "fingerprint," of your device.

What is the difference?

Think of the small tracking devices scientists use to follow animal migration patterns, or a GPS transmitter attached to a car. As long as they’re attached to the target animal or vehicle, they are accurate and effective, but they lose all value if they’re knocked off or discarded. This is roughly how cookies behave: they track users up until the point a user deletes them.

Fingerprinting uses more permanent identifiers such as hardware specifications and browser settings. This is equivalent to tracking a bird by its song or feather markings, or a car by its license plate, make, model, and color. In other words, these are metrics that are harder to change and impossible to delete.

Can I do anything about this?

Completely blocking trackers is difficult, even with a fully-featured tracker blocker. Even so, we recommend using the tracking protections above. Privacy protection does not have to be perfect to make a big difference! There are two main dynamics that make trackers hard to entirely avoid online:
Cover Your Tracks’ primary goal is to help you determine your own balance between privacy and convenience. It gives you a summary of your overall protection and a list of the characteristics that make up your digital fingerprint, so you can see exactly how your browser appears to trackers, and how implementing different protection methods changes this visibility. The following suggestions are simple, straightforward protection methods, and are an excellent starting point.

Simple suggestions

Using a Tracker Blocker

Install a tracker blocker and watch your browsing experience get a lot more pleasant. Most tracker blockers cross-reference massive lists of tracking scripts. They then block any attempts to load an ad or other item that matches. When you block trackers, you prevent tracking companies from reading your browser fingerprint. However, more advanced tracking techniques may still be able to gather information about you.

Disabling JavaScript

Most trackers run on JavaScript, and they can’t gather much of the information used to determine your browser fingerprint without it. With JavaScript disabled, your browser looks a lot less distinct, and is more protected. But there is a trade-off. Disabling JavaScript breaks a staggering number of websites, and limits the functionality of many more.

Changing browser settings from defaults

Tracking is so pervasive that all of the major browsers (Chrome, Firefox, and Safari) come with settings that disable certain types of tracking. Turning them on or off is as simple as going into the settings menu and clicking a button. Disabling tracking scripts in your browser settings is reliably effective, though not as robust as a dedicated tracker blocker. For more info about what settings and protections your browser offers compared to others, check out this article from Blacklight.

Using a fingerprint-resistant browser

Some newer browsers were built to thwart fingerprinting, such as Tor Browser and Brave.
How they do this varies from browser to browser, but they generally work by making your fingerprint less unique and/or less consistent. This means trackers have a harder time following your usage of the web.

Can my attempts to protect myself backfire?

How can attempting to make myself more anonymous actually make me more identifiable?

Each browser metric is highly connected to other metrics in complex ways. This is why we don’t recommend trying to change a single element of your fingerprint. Striving to get the most common result for any individual metric may seem like a good idea, but it can actually make your browser more identifiable.

Let’s look at an example of how these metrics are interconnected. Every browser sends information about itself to servers so that web content loads correctly. This information includes the browser name and version. If you swap out the identifier of the browser you're actually using with one from a more common browser, you may make yourself completely identifiable.

How is this possible? If Chrome is a more common browser, how can identifying your browser as Chrome make you more unique? Because trackers aren’t only looking at what browser version you have. In combination with other metrics, your fake Chrome browser may stand out. If you are actually using, say, Safari, all the other metrics will point to that fact. You will have the only browser out there identifying itself as Chrome but looking like Safari.

Incognito mode

Historically, Private Browsing and Incognito Mode had a single purpose. These modes were intended to prevent traces of sites you visited from being stored on your machine. They were not meant to prevent remote sites or trackers from identifying you and storing, on their servers, a record of when you visit a site.

If you are using Firefox, using Private Browsing will provide some protections against trackers.
Any trackers that are included in the Disconnect tracking protection list will be blocked. This keeps you safe from known trackers. Known fingerprinters and cryptominers which use your browser against you are also blocked. However, this will not prevent a new fingerprinter or tracker from identifying your browser and keeping tabs on it. In order to get this extra level of protection, your browser needs to have a fingerprint which is either:
This is an interesting one. Here are some quick facts to set the landscape:
But here is a website that will at least let you keep tabs on what's happening in this space. https://blackkite.com/data-breaches-caused-by-third-parties/ We like this for PII discovery.
https://github.com/redhuntlabs/Octopii

Working

Octopii uses Tesseract's Optical Character Recognition (OCR) and Keras' Convolutional Neural Network (CNN) models to detect various forms of personally identifiable information that may be leaked in a publicly facing location. This is done in the following steps:
The accuracy of the scan can be determined via the confidence scores in the output. If all the mentioned conditions are met, a score of 100.0 is returned. To train the model, data can also be fed into the model_generator.py script, and the newly improved h5 file can be used.

Usage
Example

owais@artemis ~ $ python3 octopii.py pii_list
Not a valid image format: pii_list/aadhaar/aadhaar-8.gif
[
    {
        "asset_type": "Bank",
        "confidence": 100.0,
        "file_name": "passbook",
        "extension": "jpeg",
        "path": "pii_list/bank/passbook.jpeg"
    },
    {
        "asset_type": "Photo",
        "confidence": 99.98,
        "file_name": "IMG-20200331-WA0037",
        "extension": "jpg",
        "path": "pii_list/photos/IMG-20200331-WA0037.jpg"
    },
    {
        "asset_type": "PAN",
        "confidence": 100.0,
        "file_name": "pan-7",
        "extension": "jpg",
        "path": "pii_list/pan/pan-7.jpg"
    },
    {
        "asset_type": "Aadhaar",
        "confidence": 97.31,
        "file_name": "aadhaar-14",
        "extension": "jpg",
        "path": "pii_list/aadhaar/aadhaar-14.jpg"
    }
]

We love this one.

https://themarkup.org/blacklight

Blacklight: A Real-Time Website Privacy Inspector
By Surya Mattu

Who is peeking over your shoulder while you work, watch videos, learn, explore, and shop on the internet? Enter the address of any website, and Blacklight will scan it and reveal the specific user-tracking technologies on the site, and who’s getting your data. You may be surprised at what you learn.

Read about what they learned running this tool against different websites at https://themarkup.org/series/blacklight, and read on for how they built it.

Blacklight is a real-time website privacy inspector. The tool emulates how a user might be surveilled while browsing the web. Users type a URL into Blacklight, and it visits the requested website, scans for known types of privacy violations, and returns an instant privacy analysis of the inspected site.

Blacklight works by visiting each website with a headless browser, running custom software built by The Markup. This software monitors which scripts on that website are potentially surveilling the user by performing seven different tests, each investigating a specific, known method of surveillance. The types of surveillance that Blacklight seeks to identify are:
Blacklight was built using the NodeJS JavaScript environment and the Puppeteer Node library, which provides high-level control over a Chromium (open-source Chrome) browser. When a user enters a URL into Blacklight, the tool opens a headless web browser with a fresh profile and visits the site's homepage as well as an additional randomly selected page deeper inside the same website.

While the browser is visiting the website, it runs custom software in the background that monitors scripts and network requests to observe when and how user data is being collected. To monitor scripts, Blacklight modifies various fingerprintable properties of the browser’s Window API. This allows Blacklight to log which script made a particular function call, using the Stacktrace-js package. The network requests are collected using a monitoring tool included in Puppeteer’s API.

Blacklight uses the script data and network requests to run the seven tests described above. Afterward, it closes the browser and generates an instant report for the user. It records a list of all the URLs that the inspected website requests. In addition, it makes a list of all domains and subdomains that were requested. The tool we provide to the public will not save those lists unless the user chooses to share results with us through an option in the tool.

We define domain names using the Public Suffix + 1 method. We define first-party domain as any domain that matches the website visited, including subdomains. We define third-party as any domain that does not match the website visited. The tool compares the list of third-party domains from the website requests with DuckDuckGo’s Tracker Radar dataset.
This data merge allows Blacklight to add the following information about the third-party domains found on the inspected site:
Blacklight runs tests based on the root URL of the page entered by a user into the tool. For example, if a user types in https://example.com/sports, Blacklight starts its inspection at https://example.com and disregards the /sports path. If a user types in https://sports.example.com, Blacklight starts its inspection at https://sports.example.com.

Report Deeply and Fix Things
Because it turns out moving fast and breaking things broke some super important things.

Blacklight results for each requested domain are cached for 24 hours, and these cached reports are delivered in response to subsequent user requests for the same website during those 24 hours. This is designed to prevent the tool from being used maliciously to overwhelm a website with thousands of automated visits.

Blacklight will also tell users whether their results are high, low, or about average compared with what the tool found on the 100,000 most popular websites as ranked by the Tranco List. This is described in more detail below.

The Blacklight code base is open source and available on GitHub; it can also be downloaded as an NPM module.

There are limitations to our analysis. Blacklight emulates a user visiting a website, but its automated behavior is different from human behavior, and that behavior may trigger different types of surveillance. For instance, an automated request might trigger more fraud detection but fewer ads.

Given the dynamic nature of web-based technology, it is also possible that some of these tests will become out-of-date over time. And new legitimate-use cases for the techniques Blacklight flags could emerge that would not be listed in the tool’s caveats.

For this reason, Blacklight results should not be taken as the final word on potential privacy violations by a given website. Rather, they should be treated as an initial automated inspection that requires further investigation before a definitive claim can be made.
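The root-URL rule and the first-party/third-party definitions described above can be sketched in a few lines of Python. This is only an illustration, not Blacklight's own code (which is written for NodeJS). The function names are hypothetical, and the tiny `MULTI_LABEL_SUFFIXES` set is an assumed stand-in for the full Public Suffix List that a real "Public Suffix + 1" implementation would consult.

```python
from urllib.parse import urlsplit

# Assumption: a tiny stand-in for the real Public Suffix List, for illustration only.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def root_url(url: str) -> str:
    """Drop path, query, and fragment; keep the scheme and host (subdomain included)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def registrable_domain(host: str) -> str:
    """Approximate 'public suffix + 1': the suffix plus one more label."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def is_third_party(request_url: str, visited_url: str) -> bool:
    """A request is third-party when its registrable domain differs from the visited site's."""
    request_host = urlsplit(request_url).hostname or ""
    visited_host = urlsplit(visited_url).hostname or ""
    return registrable_domain(request_host) != registrable_domain(visited_host)


if __name__ == "__main__":
    print(root_url("https://example.com/sports"))
    print(is_third_party("https://anon-stats.eff.org/hit", "https://eff.org"))
```

Comparing registrable domains rather than full hostnames is what makes a subdomain such as anon-stats.eff.org count as first-party when the visited site is eff.org.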
Previous Work

Blacklight is built on the foundation of various privacy census tools built over the past decade. It runs JavaScript instrumentation, which enables it to monitor calls to the browser’s JavaScript API. This is based on OpenWPM, an open-source tool for web privacy measurement built by Steven Englehardt, Gunes Acar, Dillon Reisman, and Arvind Narayanan at Princeton University. It is now maintained by Mozilla.

OpenWPM was used to power Princeton’s Web Transparency and Accountability Project, which monitored websites and services to discover companies’ data collection, data use, and deceptive practices. Through numerous studies conducted between 2015 and 2019, Princeton researchers uncovered the presence of many privacy-infringing technologies. These included browser fingerprinting and cookie syncing as well as how session replay scripts collect passwords and sensitive user data. One notable example is the exfiltration of prescription data and health-conditions data from walgreens.com.

Four of the seven tests Blacklight runs are based on the techniques described in the Princeton research mentioned above. These tests are canvas fingerprinting, key logging, session recording, and third-party cookies.

OpenWPM incorporates code and techniques from other privacy inspection tools, including FourthParty, Privacy Badger, and FP Detective:
Other projects that have influenced Blacklight’s development include the Web Privacy Census, conducted at UC Berkeley in 2012, and the Wall Street Journal’s “What They Know” series.

How We Analyzed Each Type of Tracking

Third-Party Cookies

Third-party cookies are small pieces of data that tracking companies store in your web browser when you visit a website. Each one is a bit of text (usually a unique number or string of characters) that identifies you when you visit other websites that contain tracking code from the same company. Third-party cookies are used by hundreds of companies to build dossiers about users and deliver customized ads based on their behavior.

The popular web browsers Edge, Brave, Firefox, and Safari all block third-party tracking cookies by default, and Chrome has announced that it will phase them out.

What Blacklight Tests

Blacklight monitors network requests for the “Set-Cookie” header and observes all domains that set cookies using the document.cookie JavaScript property. Blacklight identifies third-party cookies as those whose domains do not match the domain of the website being visited. We look up these third-party domains in DuckDuckGo’s Tracker Radar data to find out who owns them, how prevalent they are, and what kinds of services they provide.

Key Logging

Key logging is when a first or third party monitors the text that you type into a webpage before you hit the submit button. This technique has been used for a variety of purposes, including identifying anonymous web users by matching them to postal addresses and real names.

There are other reasons for key logging, such as providing autocomplete functionality. Blacklight cannot determine the intent behind the inspected website’s use of this technique.

What Blacklight Tests

In order to test whether this is happening on a given website, Blacklight types predetermined text (see Appendix) in all input fields but never clicks on a submit button.
It monitors network requests to see if the data that was entered was sent to any servers.

Session Recording

Session recording is technology that allows a third party to monitor and record all of a user’s behavior on a webpage, including mouse movements, clicks, scrolling down the page, and anything you type into a form even if you don’t click submit.

In a 2017 study, researchers at Princeton University found that session recorders were collecting sensitive information such as passwords and credit card numbers. When the researchers contacted the companies in question, most responded quickly and fixed the underlying cause of the data leak. However, the research highlights that these aren’t simply bugs but rather insecure practices that the researchers say should be stopped entirely.

Most companies that offer session recording say they use the data to provide their customers (the websites installing the technology) meaningful insights on how to improve a user’s experience on the website. One company, Inspectlet, describes its service as watching “individual visitors use your site as if you’re looking over their shoulders.” (Inspectlet did not respond to an email seeking comment.)

[Figure: Screenshot from Inspectlet, a known session recording provider. Credit: Inspectlet]

What Blacklight Tests

We define session recording as the loading of a specific type of script by a company that we know to be providing session recording services. Blacklight monitors the network requests for specific URL substrings that appear only when session recording is taking place, according to a list created by researchers at Princeton University in 2017.

Sometimes key logging is used as part of session recording.
In those cases, Blacklight would correctly report the session recorder as both key logging and session recording because we observed both, even though both tests are identifying the same script.

Blacklight accurately detects when a website loads these scripts, but companies typically record only a sample of website visits, so not every user is being recorded on every visit.

Canvas Fingerprinting

Fingerprinting describes a group of techniques that try to identify your browser without setting a cookie. They can identify you even if you block all cookies.

Canvas fingerprinting is a type of fingerprinting that identifies users by drawing shapes and text on a user’s webpage and noting the minor differences in the way they are rendered. These differences in font rendering, smoothing, anti-aliasing, and other features are used by marketers and others to identify individual devices. All of the major internet browsers, except Chrome, try to counter canvas fingerprinting, either by not fulfilling data requests for scripts known to have engaged in the practice or by trying to standardize users’ fingerprints.

The image below is an example of the type of canvas images used by fingerprinting scripts. These canvases are usually invisible to the user.

[Figure: Four examples of canvas fingerprinting found with Blacklight.]

What Blacklight Tests

We follow the methodology described in this paper by researchers at Princeton University to identify when the HTML canvas element is used for tracking purposes. The parameters we use to identify canvases that are being drawn for fingerprinting purposes are:
Ad Trackers

Ad trackers are technologies that identify and collect information about users. These technologies usually (but not always) appear with some level of consent from the website owners. They are used to collect website user analytics, for ad-targeting, and by data brokers and other information collectors to build user profiles. They usually take the form of JavaScript scripts or web beacons.

Web beacons are small 1px by 1px images that are placed on a website for tracking purposes by third parties. Using this technique, a third party can determine behaviors including when a particular user went to a site, the kind of browser, and what IP address it used.

What Blacklight Tests

Blacklight checks all network requests against the EasyPrivacy list, which contains URLs and URL substrings that are known to be used for tracking. Blacklight monitors the network activity for requests being made to these URLs and substrings.

Blacklight only records requests being made to third-party domains. It ignores any URL patterns in the EasyPrivacy list that match a first-party domain. For example, the EFF hosts its own analytics, and that results in requests to “https://anon-stats.eff.org,” their analytics subdomain. If a user types in https://eff.org, Blacklight does not consider calls to https://anon-stats.eff.org to be a third-party request.

We look up these third-party domains in DuckDuckGo’s Tracker Radar data set to find out who owns them, how prevalent they are, and what kinds of services they provide. We only include third-party domains that belong to the “Ad Motivated Tracking” categories defined in the Tracker Radar data set.

Facebook Pixel

The Facebook pixel is a piece of code Facebook created that allows other websites to target their visitors later with ads on Facebook. Common actions that can be tracked by pixel include viewing a page or specific content, adding payment information, or making a purchase.
What Blacklight Tests

Blacklight looks for network requests from the site going to Facebook and looks in the URL query parameters for data that matches the schema of what is described in the documentation for Facebook’s pixel. We look for three different types of data: “standard events,” “custom events” and “advanced matching.”

Google Analytics’ “Remarketing Audiences”

Google Analytics is the most popular website analytics platform in use today. According to whotracks.me, 41.7 percent of web traffic is analyzed by Google Analytics. While most of the functionality of this service is to provide developers and website owners with information on how their audience is engaging with their website, the tool also allows the website to make custom audience lists based on user behavior and then target ads to those visitors across the internet using Google Ads and Display & Video 360. Blacklight examines inspected sites for the presence of the tool, not how it is used.

What Blacklight Tests

Blacklight looks for network requests from the inspected site going to a URL beginning with “stats.g.doubleclick” that also contains the “UA-” Google account identifier prefix. This is described in more detail in Google Analytics developer documentation.

Survey

To determine the prevalence of tracking technologies on the internet both for context in Blacklight and for accompanying news stories, we ran the 100,000 most popular websites as defined by the Tranco List through Blacklight. The data and analysis code can be found on GitHub.

Blacklight successfully captured data for 81,617 of those URLs. The rest either failed to resolve, timed out on multiple attempts, or didn’t load a webpage. The percentages listed below are for the 81,617 successful captures. Some of the analysis goes beyond what appears on the tool. The key findings from our survey are as follows:
Limitations

Blacklight’s analysis is limited by four main factors:
Regarding false positives, when Blacklight visits a site, that site can see that the request is coming from computers hosted on Amazon’s AWS cloud infrastructure. Because botnets are often run on cloud infrastructure, our tool could trigger bot-detection software on the website, including canvas fingerprinting. This could result in false positives for the canvas fingerprinting test in cases where the purpose of the fingerprinting is not to track users but rather to detect botnets.

In order to test this, we took a random sample of 1,000 sites from the top websites on the Tranco List that we had already run through Blacklight on AWS. We ran this sample through the Blacklight software locally, on our own computer at a residential IP address in New York City. We concluded that the results of running a Blacklight inspection locally are very similar to, but not exactly the same as, those of running it on cloud infrastructure.

Results for Sample: Local Computer and AWS

Test                                                Local    AWS
Canvas fingerprinting                               8%       10%
Session recording                                   18%      19%
Key logging                                         4%       6%
Median number of third-party cookies                4        5
Median number of third-party trackers recorded      7        8

Not all surveillance activity that is imperceptible to the user is necessarily malicious. For instance, canvas fingerprinting is used for fraud prevention because it can identify a device. And key logging can be used to provide autocomplete functionality. Blacklight does not attempt to identify the intent of any particular tracking technology it finds.

Nor can Blacklight determine exactly how a website uses the data it collects on a user when loading session recording scripts and monitoring user behavior, such as mouse movements and keystrokes.

Blacklight does not check the terms of use or privacy policies of the websites it visits to see whether they disclose their surveillance activities.

Appendix

Input field values

The table below lists the values we programmed Blacklight to type into input fields on websites. We used the Mozilla autocomplete attribute write-up as our reference.
Blacklight also checks for the base64, md5, sha256 and sha512 versions of these values.

Autocomplete Attribute    Blacklight Value
Date                      01/01/2026
Email                     [email protected]
Password                  SUPERS3CR3T_PASSWORD
Search                    TheMarkup
Text                      IdaaaaTarbell
URL                       https://themarkup.org
Organization              The Markup
Organization Title        Non-profit newsroom
Current Password          S3CR3T_CURRENT_PASSWORD
New Password              S3CR3T_NEW_PASSWORD
Username                  idaaaa_tarbell
Family Name               Tarbell
Given Name                Idaaaa
Name                      IdaaaaTarbell
Street Address            PO Box #1103
Address Line 1            PO Box #1103
Postal Code               10159
CC-Name                   IDAAAATARBELL
CC-Given-Name             IDAAAA
CC-Family-Name            TARBELL
CC-Number                 4479846060020724
CC-Exp                    01/2026
CC-Type                   Visa
Transaction Amount        13371337

Acknowledgements

We thank Gunes Acar (KU Leuven), Steven Englehardt (Mozilla), and Arvind Narayanan and Jonathan Mayer (Princeton, CITP) for comments and suggestions on an earlier draft.
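The key-logging check described in the appendix (looking for typed values in plain, base64, md5, sha256, and sha512 form) can be sketched as follows. This is a rough Python illustration under stated assumptions, not Blacklight's actual implementation: the function names `encoded_variants` and `find_leaks` are hypothetical, and a simple substring match against a request body is assumed to be sufficient.

```python
import base64
import hashlib


def encoded_variants(value: str) -> dict[str, str]:
    """Return the plain value plus its base64, md5, sha256, and sha512 forms."""
    raw = value.encode("utf-8")
    return {
        "plain": value,
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
        "sha512": hashlib.sha512(raw).hexdigest(),
    }


def find_leaks(typed_values: dict[str, str], request_body: str) -> list[tuple[str, str]]:
    """List (field, encoding) pairs whose encoded value appears in the request body."""
    leaks = []
    for field, value in typed_values.items():
        for encoding, variant in encoded_variants(value).items():
            if variant in request_body:
                leaks.append((field, encoding))
    return leaks


if __name__ == "__main__":
    # Two values taken from the appendix table above.
    typed = {"Password": "SUPERS3CR3T_PASSWORD", "Search": "TheMarkup"}
    # Hypothetical outgoing request that carries a hashed copy of the search text.
    body = "q=" + hashlib.sha256(b"TheMarkup").hexdigest()
    print(find_leaks(typed, body))
```

Checking hashed and encoded forms matters because a script can transmit what was typed without sending the literal text, and a plain-text search alone would miss that.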