
Pandora SRE interview

phone

you know we are trying to get a system admin.
yes. a sre or something.
self-hosted, supermicro, debian
10k hosts
ansible

- explain what happens when you power up a server.
POST --> bios/uefi --> bootloader (e.g. grub) --> kernel + initramfs --> init (e.g. systemd) --> services

- dba says database server is slow.
what do you mean by 'slow'?
it could be that only the client side sees it as slow.
it could be that the server itself is slow, or the network in between,
such as a sudden spike of network requests.
check whether the system as a whole is slow (cpu, memory, disk, network).
check whether the db itself is slow, e.g. the mysql slow query log.
check which client is causing the slowness and what query it is executing.
maybe it is an expensive query. ask the developer to optimize the query.

since it is a db, also check whether disk io is slow.

- what's the difference between a disk slow and a disk hardware failure?
check the raid controller interface for reported errors.
check the dmesg log for i/o errors.

- what does the system load mean?
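the length of the task queue: the average number of tasks that are running or waiting to run
(on linux also tasks in uninterruptible sleep), averaged over 1, 5 and 15 minutes.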
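
follow-up sketch (mine, not from the interview): on linux the three load averages can be read from /proc/loadavg, and a load figure only means something relative to the number of cpus. `load_per_cpu` is a hypothetical helper name made up for this note.

```shell
#!/bin/sh
# load_per_cpu LOAD NCPU: print LOAD divided by NCPU with two decimals.
# (hypothetical helper name, used only for this sketch)
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# /proc/loadavg starts with the 1, 5 and 15 minute load averages.
if [ -r /proc/loadavg ]; then
    read -r one five fifteen rest < /proc/loadavg
    ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "load1=$one load5=$five load15=$fifteen cpus=$ncpu"
    echo "1-minute load per cpu: $(load_per_cpu "$one" "$ncpu")"
fi
```

a per-cpu value that stays well above 1 suggests tasks are queuing, but that is a rule of thumb, not a hard threshold.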

- what's the wait (wa) value in top?
the share of cpu time spent idle while processes are blocked on io, typically disk.

- how do you tell what it is blocked on?
maybe check disk io by iostat? then network by iftop or nload?
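
another angle, as a rough sketch: processes stuck on disk io usually sit in state D (uninterruptible sleep) in ps output, so listing them shows who is waiting. `blocked_tasks` is a hypothetical helper name; the ps format keywords (state, pid, comm) assume procps on linux.

```shell
#!/bin/sh
# blocked_tasks: read "STATE PID COMMAND" lines on stdin and keep the ones in
# state D (uninterruptible sleep, usually waiting on disk I/O).
blocked_tasks() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# On a live host, feed it from ps. The header line starts with "S", so it is
# filtered out along with every task that is not in state D.
if command -v ps >/dev/null 2>&1; then
    ps -eo state,pid,comm 2>/dev/null | blocked_tasks
fi
```

an empty result just means nothing was in state D at that instant, so it is worth sampling a few times.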

- all system memory used up. is that a bad thing?
no. linux uses otherwise idle memory for page cache and buffers, and can reclaim it when needed.

- when will it be bad?
when a lot of swap is in use and pages keep moving in and out.
swap is virtual memory backed by disk, and disk is far slower than ram.
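
a minimal sketch of how to check this from /proc/meminfo (helper names `mem_field` and `swap_used_kb` are made up for this note). the amount of swap in use is only a hint; ongoing swap-in/swap-out, e.g. the si/so columns of `vmstat 1`, is the stronger signal.

```shell
#!/bin/sh
# mem_field NAME [FILE]: print the value (in kB) of one /proc/meminfo field.
mem_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# swap_used_kb [FILE]: SwapTotal minus SwapFree, in kB.
swap_used_kb() {
    echo $(( $(mem_field SwapTotal "$1") - $(mem_field SwapFree "$1") ))
}

if [ -r /proc/meminfo ]; then
    echo "MemAvailable: $(mem_field MemAvailable) kB"
    echo "swap in use:  $(swap_used_kb) kB"
fi
```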

- /var/db/log/xxx.log filled the disk. i deleted it, but df still shows the space as not freed.
because a process still has the file open. restart the program so that it closes the file.

- what is another way to release the space w/o restarting the process?
`lsof | grep deleted` to get the pid and fd number.
`: > /proc/<pid>/fd/<fd_number>` to truncate the file in place.
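
to convince myself this works, a small self-contained sketch where the shell itself plays the process that holds the deleted log open. `truncate_open_fd` and `open_fd_size` are hypothetical helper names; it assumes linux with /proc mounted.

```shell
#!/bin/sh
# truncate_open_fd PID FD: truncate the file behind an open descriptor to zero
# bytes via /proc, which frees the disk blocks without restarting the process.
truncate_open_fd() {
    : > "/proc/$1/fd/$2"
}

# open_fd_size PID FD: size in bytes of the file behind the descriptor.
open_fd_size() {
    wc -c < "/proc/$1/fd/$2"
}

log=$(mktemp)
exec 3>>"$log"                  # this shell stands in for the logging process
echo "pretend this is a huge log" >&3
rm -f "$log"                    # unlinked, but fd 3 still keeps the data alive
echo "before: $(open_fd_size $$ 3) bytes"
truncate_open_fd $$ 3
echo "after:  $(open_fd_size $$ 3) bytes"
exec 3>&-                       # close the descriptor
```

this is cleanest when the writer opened the file in append mode. otherwise its next write lands at the old offset and leaves a sparse file.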

- you are given a task to build infrastructure. boot up 100 hosts.
we want all users to be able to log in. how do you do that?

after a host is provisioned, run a script that calls a configuration management system such as puppet or ansible.
it will install ssh keys for the users specified.

- what is another way? how about a centralized authentication system?
AD.
- Have you configured LDAP?
not much.

- how do you check if a server is configured to use LDAP?
use my own AD to test?
check /home/user directory?
- which file do you check to see if ldap is configured on linux?
/etc/nsswitch.conf
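
what i should have said, as a sketch: look at which sources the passwd/group lines in nsswitch.conf list, and whether they include ldap or sss (sssd). `nss_sources` and `uses_directory` are hypothetical helper names. `getent passwd <user>` is a quick live check, since it resolves through the same nss configuration.

```shell
#!/bin/sh
# nss_sources DB [FILE]: print the lookup sources listed for one database
# (passwd, group, shadow, ...) in nsswitch.conf.
nss_sources() {
    awk -v db="$1:" '$1 == db { $1 = ""; sub(/^ +/, ""); print }' \
        "${2:-/etc/nsswitch.conf}"
}

# uses_directory DB [FILE]: succeed if the database consults ldap or sss.
uses_directory() {
    nss_sources "$1" "$2" | grep -Eqw 'ldap|sss'
}

if [ -r /etc/nsswitch.conf ]; then
    echo "passwd sources: $(nss_sources passwd)"
    if uses_directory passwd; then
        echo "passwd lookups consult ldap/sssd"
    fi
fi
```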

- what's your programming lang?
python, bash

- a bash question:
i have a bash program which creates a lot of /tmp files while running.
how do i make my program clean up its /tmp files if anyone kills it?

catch the signal (e.g. TERM, INT) and clean up in the handler. SIGKILL cannot be caught.
trap
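
a minimal sketch of the trap answer, assuming the program keeps all its scratch files in one private directory (the `myprog` prefix and file names are made up):

```shell
#!/bin/sh
# Keep every scratch file in one private directory so cleanup is a single rm.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/myprog.XXXXXX") || exit 1

cleanup() {
    rm -rf "$workdir"
}

# EXIT covers normal termination; the signal traps turn a kill into a normal
# exit so the EXIT trap still runs. SIGKILL (kill -9) cannot be trapped.
trap cleanup EXIT
trap 'exit 129' HUP
trap 'exit 130' INT
trap 'exit 143' TERM

# ... the real work would go here ...
echo "scratch data" > "$workdir/part.1"
```

routing the signals through `exit` means the cleanup code lives in one place and runs once per exit path.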

- what would you like to focus on if you join the sys admin team?
depends. in my last job i mainly did provisioning and configuration management.
my current job is more of an embedded devops role, where i focus more on monitoring.
so it depends on the team and company needs.

= introduction to techops and sysadmin team.
- sys admin
- site ops
- system plan (jira. etc.)

sa team has 15 people.
varied skills, but each is a generalist who can handle things like file system work.

10k bare metals.
building an internal cloud based on docker, using HashiCorp Nomad.
and a ci/cd pipeline to help that cloud.

configuration: ansible, backed by a git repo.
pull based?
no, push-based ansible.

= where are your DCs located?
3 dc in bay area
1 in chicago
2 in virginia

= what's your oncall schedule like?
24/7 pager
on call primary 1 week every 2 months.
secondary 1 week.

= is there some team in india or china to help?
no. we cover all 24 hours ourselves.
we have done a lot to put out fires, so there are not many midnight pages,
unless some developer fills up the disk with debugging logs.
overall oncall is not that taxing.

==========================================
What to expect for your interview: We typically do 3-4 whiteboard interviews
and your discussions could include whiteboard sessions around any and all of the following:
Python development topics, configuration management, log processing, infrastructure, general problem solving,
be prepared to walk us through your most recent project,
talking about code design/architecture decisions, functionality and design.
The non-technical portion of the interview will be with a Manager and a
non-engineer. People often ask about a dress code;
Business Casual is fine for your interview.


--------------------------------------
onsite

1.

dave talks a lot.
daniel some questions.

daniel lives at ???
gf lives at sf?
colma station is for ghosts? shopping center.

2.

jeff from msu. 1995.
brett question about disk full.

jeff: if a file is deleted but a process still has it open, how do you recover it?
lsof to find the pid and fd. once you have the fd, you can copy the file out of /proc/<pid>/fd/<fd> or do whatever with it.
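
a small sketch of the recovery, with the shell itself standing in for the process that holds the deleted file open. `recover_open_fd` and `deleted_fds` are hypothetical helper names; it assumes linux with /proc mounted.

```shell
#!/bin/sh
# recover_open_fd PID FD DEST: copy the contents of a file that some process
# still holds open (even if it was deleted) to DEST.
recover_open_fd() {
    cp "/proc/$1/fd/$2" "$3"
}

# deleted_fds PID: list descriptors of PID whose target has been unlinked.
deleted_fds() {
    ls -l "/proc/$1/fd" 2>/dev/null | grep '(deleted)'
}

victim=$(mktemp)
echo "important contents" > "$victim"
exec 5<"$victim"                 # hold the file open on fd 5
rm -f "$victim"                  # "accidental" delete
deleted_fds $$                   # fd 5 shows up with a "(deleted)" suffix
recover_open_fd $$ 5 "$victim.recovered"
cat "$victim.recovered"
exec 5<&-                        # close the descriptor
rm -f "$victim.recovered"
```

the copy has to happen while the process is still running. once the last descriptor is closed the data is gone.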

jeff is currently working on the ELK stack. commercial version license bought from elastic.co.
latest stack. asked what issues i met during installation: version mismatch.

3.

elk currently has no dedicated buffer. it relies on the logstash queue and syslog buffers.

4.

greg likes bike touring.
once rode from seattle to dc! 8-10 weeks.

dave does gardening and beer brewing.
   
chris originally from atlanta. been here 10 years?

5.

Wrap up

6.

Walk out
