Master plan 2018

Problem: “People do not understand the security implications from error messages.”

Research aim: “Standardize error messages to ensure better understanding of security implications.”

  • Easier situation for security tool developers: they don't have to formulate error strings, documentation or explanations – they just take/link the standard.
  • App developers (=end-users) interpret the error in alignment with the opinions of the security community (rigorously tested!).
  • App developers (=end-users) will have a trusted resource with error descriptions (as documentation is currently too short, they google and read Stack Overflow, which may be a problem).
  • App developers (=end-users) have a common experience across different products and OSes (thus transitions are easier).
  • Standardization was successful elsewhere, think of POSIX errno.
    • Side note: Errno was part of a bigger API standardization.
  • Be rigorous in testing how people interpret the messages. Consider both knowledgeable developers and security-noobs.
  1. Motivation: Common developers (not security-trained) do not always get the severity of errors (let's take X.509 certificate validation as an example). From BUT KRY and MU PV079 experiments we get (for example) the following:
    • Certificate/server name mismatch is considered less of a problem than validity expiration.
    • Wildcards in names are not noticed.
    • Constraints violation errors are not understood.
    • <do>Get exact cases and numbers from the data (necessary to establish base for improvement).</do>
  2. PA193 experiment
    • Map the understanding of problems. (Do security students know the implications/consequences of various X.509 errors?)
    • Teach students that explaining security risks concisely is difficult (by letting them do it).
    • Gather initial formulations/explanations of problems, problem severity ranking.
  3. DevConf 2018
    • Design initial set of error messages and descriptions based on PA193 experiment results, do initial testing of our proposed solution.
    • Give people OpenSSL and certificates to validate (so that they see the errors in the “natural environment”)
    • Participants in two groups: Control (standard OpenSSL errors, documentation & Internet to search) and New (OpenSSL wrapper with our errors linking to our descriptions; a rough wrapper sketch follows after this list)
    • Let people rank the severity and describe the problem consequences
  4. Developer discussion
    • Discuss with security tools developers the severity of particular certificate validation problems
    • Establish “correct solutions” for the experiment cases
  5. Mechanical turk
    • Just an idea: Assess comprehension of error descriptions/consequences by the general public?
  6. Confirmatory study
    • Just an idea: Re-test the refined system (possibly Red Hat interns in autumn?)
  7. MUNI + VUT + CMU homework assignment?
    • possibly during autumn semester

Multiple experiments contribute to the goal.
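
To make the planned “New” condition (plan item 3 / DevConf 2018) more concrete, here is a rough sketch of what the OpenSSL wrapper could look like, assuming Python, the standard openssl verify command line, and purely illustrative standardized texts and URLs (producing the real wording is exactly what the experiments are for):

```python
# Rough sketch only. Assumptions: Python 3, an `openssl` binary on PATH, and
# placeholder standardized texts/URLs (x509errors.org is just one of the
# candidate domains mentioned later in this plan).
import re
import subprocess
import sys

# Hypothetical standardized messages keyed by OpenSSL verification error codes
# (codes as used by recent OpenSSL versions; the wording is illustrative).
MESSAGES = {
    10: ("The certificate has expired.",
         "https://x509errors.org/expired"),
    18: ("The certificate is self-signed, so its origin cannot be verified.",
         "https://x509errors.org/self-signed"),
    62: ("The certificate was issued for a different server name.",
         "https://x509errors.org/hostname-mismatch"),
}

def verify(cert_path: str, ca_path: str) -> None:
    """Run `openssl verify` and translate its error codes to the new messages."""
    proc = subprocess.run(
        ["openssl", "verify", "-CAfile", ca_path, cert_path],
        capture_output=True, text=True,
    )
    # OpenSSL prints lines such as "error 10 at 0 depth lookup: certificate has
    # expired"; the exact layout differs between versions, hence the loose regex.
    codes = re.findall(r"error (\d+) at", proc.stdout + proc.stderr)
    if not codes:
        print(f"{cert_path}: OK")
        return
    for code in map(int, codes):
        text, url = MESSAGES.get(code, ("Unrecognized validation error.",
                                        "https://x509errors.org/"))
        print(f"{cert_path}: {text} Details: {url}")

if __name__ == "__main__":
    verify(sys.argv[1], sys.argv[2])
```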

Experiment on master cryptography course students, spring/autumn 2017

  • Write a program to validate the certificates of 100 domains (a minimal sketch follows below).
  • Assign trust ratings to these domains according to the problems they exhibit
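
A minimal sketch of such a checker, assuming Python and its standard ssl module (the original assignment targeted the OpenSSL programmable interface, and the domains below are only placeholders):

```python
# Minimal sketch (assumption: Python 3.7+ with the standard ssl module; the
# real homework used the OpenSSL API, and the domain list is a placeholder).
import socket
import ssl

DOMAINS = ["expired.badssl.com", "wrong.host.badssl.com", "example.com"]

def check_domain(domain: str) -> str:
    """Try a TLS handshake and return 'OK' or the certificate validation error."""
    context = ssl.create_default_context()  # system CA store, hostname check enabled
    try:
        with socket.create_connection((domain, 443), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=domain):
                return "OK"
    except ssl.SSLCertVerificationError as err:
        return f"certificate error {err.verify_code}: {err.verify_message}"
    except (ssl.SSLError, OSError) as err:
        return f"connection/TLS failure: {err}"

for domain in DOMAINS:
    print(f"{domain}: {check_domain(domain)}")
```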

Experiment on master secure programming course students, November 2017

<do>How long is the homework open? How many points to award?</do>

Seminar part: Understanding problem severity

  • Intro into usable security, why error messages and documentation matter
  • Task: rank/evaluate the severity of given X.509 certificate problems
    • Use existing messages: Descriptions from OpenSSL documentation (or GnuTLS)
    • Offer the option “I don't understand the problem”
    • Do not allow using the Internet
    • Extended scale from previous experiment
    • Aim: Establish the level of intuitive understanding of current error messages
    • Also collects the necessary demographic data (previous experience!)
  • Task 2: Google/discuss what may be causing the problem and try to reevaluate.
    • What might be the causes? How often does it happen? What may be the consequences?
    • Which resources did they use? (web browser history!)
    • Did they discuss this with friends?
    • Did they talk to the teacher?

Homework (part 1): Error message and explanation writing

  • Assign 2 problems (error states) to each student and ask them to write their own documentation conveying the security implications (an illustrative entry is sketched after this list)
  • For each error ask for:
    1. Name (1-3 words) for the error code
    2. Type: error or warning
    3. Short description for CLI (1 sentence)
    4. Explanation/description (one or more paragraphs, possibly pictures)
      • Who is the target audience? Can we assume any knowledge?
    5. Possible causes of the problem (?)
    6. Possible consequences of trusting the certificate anyway (?)
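
For illustration, one submitted entry could be structured as follows (a sketch only; the field names and all wording are placeholders, not vetted formulations):

```python
# Illustrative structure for a single student submission; all texts are placeholders.
ENTRY = {
    "name": "CERTIFICATE_EXPIRED",                    # 1. name (1-3 words)
    "type": "error",                                  # 2. error or warning
    "cli_message": "The server certificate expired on the stated date.",  # 3. one sentence for CLI
    "explanation": (                                  # 4. longer explanation (paragraphs)
        "Certificates are valid only for a limited period. Once expired, there is "
        "no guarantee the key is still controlled and maintained by its owner."
    ),
    "possible_causes": [                              # 5. possible causes of the problem
        "The administrator forgot to renew the certificate.",
        "The client's system clock is set incorrectly.",
    ],
    "consequences_of_trusting": [                     # 6. consequences of trusting it anyway
        "Compromised or leaked keys may no longer be covered by revocation checks.",
    ],
}
```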

Homework (part 2): Voting on best formulations

  • Anonymize student submissions, publish internally.
  • Let them vote on best formulations (and describe why they are good).
  • Give bonus points to those with best formulations (but voting is mandatory for standard credit).

Technicalities

  • 1 week; Thu 23:59 soft deadline; each started day after the deadline costs −1.5 p
  • 5 points + 1 point bonus
  • Task should be at the end of seminar slides

Experiment on developers at a conference, January 2018

  • Task 1: Try to validate these X certificates using OpenSSL.
    • <do>Why do we have this part? Maybe dump.</do>
    • I'm not sure there is a point in having them (not) find out how verification is done in the OpenSSL CLI – we have done this previously and I'm not sure there can be any further interesting observations. If dropped, they do not have to google/read man pages; e.g., we give them the highlighted part of the manual that is relevant to what they need to do (and they run the commands nearly right away instead of spending 15 valuable minutes here…).
  • Task 2: Assign trust assessment.
    • Split from Task 1 to be able to measure the time spent googling what the problem is
  • Task 3: Present “solution” (what was the “problem” with each certificate).
    • Ask if they found that out (= self-reported success measure)
    • If they have not, allow time and possibility of trust reassignment
      • Thus part 3 is independent from success in part 1+2
  • Participants in 2 groups:
    • Control: real OpenSSL + Internet
    • New: OpenSSL wrapper with our error messages + page with descriptions (+ Internet)
  • Certificates: 5 error cases + 1 OK + 1 OK with “problem” that is not a problem (so that we do not imply that any outcome different from “OK” should get a trust score of 2+, see part 2)
  • What to measure?
    • Trust scores, participant background info
    • Resources used (browsing history, man reading)
    • Task time
  • Participants of the DevConf 2017 experiment?
    • Give out RSA paper pre-print (if possible)
    • Ask them to call friends
    • Possibly allow them to do part 2
  • Collect contacts for security professionals (for discussion/questionnaire later on?)
    • Motivate by briefing of the experiment goal and structure
    • Do this part personally (while others are administering standard tasks)
  • People motivation and booth
    • Get some cool stuff from Red Hat again
    • Maybe have a poster explaining usable security towards developers? (Based on talk slides?)
    • Invite people to the talk, mention there will be a video
    • Usable security jokes hanging on the spot
    • Don't forget English FI MU banner
  • Talk with security tools developers to determine “opinion of the security community”
    • Basically have them also rank the certificate problems
    • Add qualitative interview – what misinterpretation do they see as worst?
  • Who to contact?
    • Ask the Red Hat security group (around Nikos), including Kai Engert (NSS)
    • Email attendees of the OpenSSL Lunch & Learn
    • Email selected participants from the DevConf 2017 experiment
    • Email prospective participants of the DevConf 2018 experiment
    • Email Rich Salz from OpenSSL (possibly he can ask interested colleagues?)
  • MTurk workers are not developers, but “general public” (!)
  • Only a rough idea now: Test understandability of error descriptions?
    • Different target population than developers, still might give interesting insights – Do we really know developers have any special knowledge?
  • Amazon Mechanical Turk enables giving tasks to the “general population” (mostly US)
    • Not precisely proportional to US population, but not very misleading (confirmatory research exists)
    • Routinely used for social sciences studies, sometimes also for usable security studies (towards end-users)
    • Workers are paid, from $0.01 to about $1.00 per task
    • Possibility to limit people taking the task (education, income, demographics, …)
  • <do>Check service availability, qualifications offered.</do>
  • <do>Check task structure (join and try :-})</do>
  • Another rough idea: Test created system on Red Hat interns?
    • These are the at-risk developers (not particularly security-educated but contributing to real-world software)
  • Where to submit?
    • SOUPS would be great, but submission deadline is mid March 😕.
    • IEEE S&P (early November), USENIX Security (mid February) may be too ambitious (and with inconvenient submission dates).
    • USEC – submissions mid December and Core C 😕.
    • EuroUSEC – again too small without Core rank.
    • STAST (Socio-Technical Aspects in Security and Trust) is only a workshop without Core rank.
    • RSA seems reasonable again (we'll see what happens this year).
    • EuroS&P is in spring, submission deadline in mid-August (might be a nice choice!).
    • Consider submitting to a non-academic conference without proceedings? (e.g. BlackHat)
  • Motivation
    • BUT KRY / MU PV079 experiments (security risk misinterpretation)
    • Problem: Not much written in official documentation (~1 sentence) and no further trusted resource or standard exists
  • Proposition
    • Goal: Standardize error messages to ensure better understanding of security implications.
    • Advantages: See top of the page
  • Procedure
    1. Confirm the existence of a problem and determine its size (BUT KRY / MU PV079 / MU PA193)
      • Byproduct: OpenSSL API evaluation
    2. Propose a solution, do initial testing (MU PA193, DevConf 2018)
    3. Discussion with developers (GnuTLS / OpenSSL / NSS devs?)
    4. Refining and confirmatory testing (?? – MTurk? Red Hat devs?)
  • Wanted results (mostly DevConf 2018)
    • Old errors + the Internet vs. new errors + single description page
      • significant difference in security risk interpretation (towards the opinion of the security community)
      • less time needed to understand the situation
      • less pages/resources used to understand the situation
    • Resources
      • Which of our descriptions needed additional Google search? What was missing?
      • Which resources on the Internet did the participants use to find out if this is a problem?
    • Stats for OpenSSL interface
      • How many people were able to do the particular check?
      • Check if this is relevant (depends on the chosen cases)
      • Programmable interface (PV079, KRY) vs. command line interface (DevConf)
        • Only if DevConf uses the same/similar cases to KRY/PV079
    • Knowledgeable users vs. noobs
      • Do results for these differ with standard errors + Internet?
      • Wanted: Differences between these are smaller when using our errors (i.e. our system is usable for both knowledgeable users and noobs)
    • Subjective attitudes
      • Developers have “positive emotions/opinions” about the new system
      • What do they think of the old/new situation?
  • Miscellaneous
    • Let's Encrypt certificate should be less trusted than standard CA? (source)
  • Priming: Think about its effects and decide whether to do or not do it (especially for DevConf). Mention in paper.
  • Ethics:
    • PA193: Probably no personal information collected. Inform about research usage, no informed consent necessary.
    • DevConf: Informed consent necessary (screens, possibly audio, keystrokes?). Follow agreed project structure.
    • MTurk: Probably no consent necessary (no personal information collected).
  • Use the newest Fedora with the newest OpenSSL (maybe even current dev master?)
  • Register a domain for error descriptions (x509errors.org? certificateerrors.com? certerrors.eu?)

Research tagline: “Make error messages human-readable again.”

  • Less typical things may be the ones users need to read/google about (most people know what “expired” means).
  • Cases: Assume correct format of the certificate (validation errors only)
  • Cases: Can be similar to KRY/PV079 for better comparison
  • Cases: TODO
  • Trust scale:
    • 1. If this certificate was presented by my bank's website, I would be happy with the security level and log in not worried at all.
    • 2.
    • 3. I quite trust the page. I would happily use my library account to log in. However, if this was the website of my bank, I would not be very enthusiastic about its security.
    • 4.
    • 5. This looks suspicious. I will read the page, but I will not fill in any information.
    • 6.
    • 7. The webpage is outright untrustworthy. I do not consider it safe to browse and do not trust any information on it.
  • The POSIX standardization was API standardization; the errno standardization was an integral part of it. As we are talking about completely different APIs, standardization of error codes may be a harder task than it seems, if it is possible at all. I'd expect more background/investigation on whether that's possible at all if restricted only to error code standardization. Have you considered standardization of the validation API?
    • That is a very good point I hadn't completely realized till now. For now, I don't consider any API standardization. Different libraries have different conventions, and consistency within the library itself is (in my view) more important than having a part of the API consistent with other libraries. Furthermore, error codes/messages can be easily extended to other programming languages, but an API standardization cannot.
      • If I were a reviewer, I'd need more than that to be convinced 🙂 In fact, the argument that API standardization may not be possible due to API differences seems to be in contrast with what you mentioned before, that error codes may also not be applicable to every library. You don't need to answer that here; it is just food for thought for your research.
  • What worries me when reading it is that the planned tasks are for OpenSSL only, while you aim at standardization for a larger set of libraries; do you assume that any findings from the OpenSSL investigations are transferable to other APIs/libraries?
    • I'm basing them on OpenSSL because it's much better to use a real-world example/application than to construct an artificial one. And OpenSSL is (by far?) the most used.
      • Note that my question was not why OpenSSL, but how you plan to apply the results you get from the OpenSSL study to other libraries. As an engineer, I'd expect that a study on a particular component applies to that specific one and cannot be generalized, unless there is some good abstraction based on their similarities (or multiple studies confirming the results). That's again food for thought.