Richard's Diary

Tuesday, December 23, 2008

Networking Basics
To provision internet access, a client needs:
  1. an IP address so data knows where to go
    ifconfig shows the IP address and subnet mask
  2. a subnet mask to know which hosts are on its local network (and will hear its broadcasts on the wire) and therefore which aren't, so that traffic for those can be sent to the gateway
    Prior to CIDR, addressing was classful: the mask was implied by the address class (A = /8, B = /16, C = /24)
  3. a gateway (which could also be the router and firewall, all combined in one box)
    netstat -rn shows the gateway
  4. a name server so the client knows where to send name lookup (DNS) requests
    more /etc/resolv.conf shows the name server
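To make items 1 to 3 concrete, here's a minimal Python sketch of the decision a client makes with its IP address and subnet mask: deliver directly if the destination is on the local network, otherwise hand the packet to the gateway. The addresses are made-up examples, and classful_prefix() is a hypothetical helper showing the pre-CIDR default masks.

```python
import ipaddress

def next_hop(client_ip: str, netmask: str, gateway: str, destination: str) -> str:
    """Where a client sends a packet: straight to the destination, or to the gateway."""
    # The IP address plus the mask define the local network, e.g. 192.168.1.0/24.
    local_net = ipaddress.IPv4Network(f"{client_ip}/{netmask}", strict=False)
    if ipaddress.IPv4Address(destination) in local_net:
        return destination  # on the wire: deliver directly
    return gateway          # off the local network: let the gateway route it

def classful_prefix(ip: str) -> int:
    """Pre-CIDR default prefix length implied by the first octet (class A, B or C)."""
    first_octet = int(ip.split(".")[0])
    if first_octet < 128:
        return 8   # class A
    if first_octet < 192:
        return 16  # class B
    if first_octet < 224:
        return 24  # class C
    raise ValueError("class D/E addresses have no default host mask")

print(next_hop("192.168.1.10", "255.255.255.0", "192.168.1.1", "192.168.1.20"))  # 192.168.1.20
print(next_hop("192.168.1.10", "255.255.255.0", "192.168.1.1", "8.8.8.8"))       # 192.168.1.1
print(classful_prefix("10.1.2.3"))                                               # 8
```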

Network Hardware

NAT (proxy, ip/port forwarding) setup
Domain History as Revealed by Webmaster Tools
Timeline
  1. Right before Thanksgiving, I renewed the domain for 10 years.
  2. Over Thanksgiving, pages started showing up in the index.
  3. In early December, I noticed that pages showed up in the index about an hour after posting (not necessarily a sign of short latency; it could be a timing coincidence). The cached copy takes a little longer to show up (it seems to be about 8 hours).
  4. In mid-December, I saw the first external links, from two pages with PR 3. One had a nofollow and the other didn't. It will be interesting to track the PR of the linked-to page.
    -UPDATE: one of the external links was removed

  1. The report shows that most pages are US_ASCII even though they are UTF-8 in the HTML source
  2. Unable to remove navbar-iframe
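One possible explanation for the US_ASCII report (a guess; I don't know how Webmaster Tools detects encodings): ASCII is a subset of UTF-8, so a page whose characters are all ASCII produces exactly the same bytes in either encoding, and nothing in the bytes distinguishes them. A quick Python check:

```python
def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """True if encoding the text as US-ASCII and as UTF-8 gives identical bytes."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        return False  # at least one non-ASCII character, so the encodings differ

print(same_bytes_in_ascii_and_utf8("<p>Hello, world</p>"))  # True
print(same_bytes_in_ascii_and_utf8("<p>café</p>"))           # False
```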

Browser dev tools
  1. In ~/.mozilla/firefox/oa3go87g.default/chrome/userContent.css, add the following to highlight nofollow links:
    a[rel~="nofollow"] {border: thin dashed firebrick !important; background-color: rgb(255, 200, 200) !important;}
  2. Install the PageRank extension, a developer toolbar and Live HTTP Headers.
  3. To see noindex, a SearchMonkey script needs to be installed.
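The CSS rule above only highlights nofollow links visually. As a rough sketch, the same checks can be scripted with Python's standard html.parser: collect links whose rel attribute contains the nofollow token (the same whitespace-separated token match as [rel~="nofollow"], lowercased here) and flag a robots meta tag containing noindex. RobotsAudit is a hypothetical name, and it only looks at the HTML, so it won't see an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser

class RobotsAudit(HTMLParser):
    """Collect nofollow links and a page-level robots noindex from HTML."""

    def __init__(self):
        super().__init__()
        self.nofollow_links = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # valueless attributes map to None
        if tag == "a":
            # Whitespace-separated token match, like the CSS selector [rel~="nofollow"].
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "nofollow" in rel_tokens:
                self.nofollow_links.append(attributes.get("href"))
        elif tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            # e.g. <meta name="robots" content="noindex, follow">
            directives = [d.strip() for d in (attributes.get("content") or "").lower().split(",")]
            if "noindex" in directives:
                self.noindex = True

page = ('<meta name="robots" content="noindex, follow">'
        '<a href="/a" rel="external nofollow">a</a>'
        '<a href="/b" rel="nofollowme">b</a>'
        '<a href="/c">c</a>')
audit = RobotsAudit()
audit.feed(page)
print(audit.nofollow_links, audit.noindex)  # ['/a'] True
```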

Saturday, December 20, 2008


In Sander's excellent book, he recommends drawing up a list of interface (public/private), service/protocol and direction (inbound = INPUT chain, outbound = OUTPUT chain). Then it becomes a simple matter of translating each entry into rules.

For a server, you'll usually have to configure at least two rules per service, one for each direction. Recommended services to start with are [command/protocol] ping/ICMP, nslookup/UDP, wget/HTTP and SSH. Then add rules to mitigate DoS attacks, and add logging.
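Sander's table-to-rules idea can be sketched in Python: each row (interface, protocol, port, direction) expands into the two rules it needs, one per chain, written in the iptables-save format that iptables-restore reads. The interface name eth0, the rows themselves and the default-DROP policies are my assumptions for illustration; DoS protection and logging rules are left out.

```python
# Sander-style table: (interface, protocol, port, direction).
# "in" = the server accepts connections; "out" = the server starts them.
# eth0 and these rows are illustrative assumptions.
RULE_TABLE = [
    ("eth0", "tcp", 22, "in"),   # SSH into the server
    ("eth0", "tcp", 80, "out"),  # wget/HTTP from the server
    ("eth0", "udp", 53, "out"),  # nslookup/DNS from the server
]

def rules_for(iface, proto, port, direction):
    """Translate one table row into its two rules, one per direction."""
    if direction == "in":
        return [
            f"-A INPUT -i {iface} -p {proto} --dport {port} -m state --state NEW,ESTABLISHED -j ACCEPT",
            f"-A OUTPUT -o {iface} -p {proto} --sport {port} -m state --state ESTABLISHED -j ACCEPT",
        ]
    return [
        f"-A OUTPUT -o {iface} -p {proto} --dport {port} -m state --state NEW,ESTABLISHED -j ACCEPT",
        f"-A INPUT -i {iface} -p {proto} --sport {port} -m state --state ESTABLISHED -j ACCEPT",
    ]

def ping_rules(iface):
    """ICMP has no ports: allow our echo requests out and the replies back in."""
    return [
        f"-A OUTPUT -o {iface} -p icmp --icmp-type echo-request -j ACCEPT",
        f"-A INPUT -i {iface} -p icmp --icmp-type echo-reply -j ACCEPT",
    ]

def build_ruleset(table, iface="eth0"):
    """Return an iptables-save style *filter table with default-DROP policies."""
    lines = ["*filter", ":INPUT DROP [0:0]", ":FORWARD DROP [0:0]", ":OUTPUT DROP [0:0]",
             "-A INPUT -i lo -j ACCEPT", "-A OUTPUT -o lo -j ACCEPT"]  # keep loopback working
    for row in table:
        lines.extend(rules_for(*row))
    lines.extend(ping_rules(iface))
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"

print(build_ruleset(RULE_TABLE), end="")
```

Saving the printed output to a file and loading it with iptables-restore would apply it; I'd review it by hand first, since a mistake combined with DROP policies can lock out an SSH session.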

Next steps:
  1. Make sure iptables is running
    1. Leverage /sbin/SuSEfirewall2 to have it run at boot on a brand new machine
    2. "chkconfig iptables on" starts iptables at boot, but how long is the system exposed before iptables is on?
  2. "iptables -L --verbose" shows the current configuration, and "iptables-save > tempfile" dumps it in configuration file format (confirm the format against "service iptables save")
  3. Define rules in that file
  4. Load the rules
    1. manually at the command line, or
    2. with a script, or
    3. with iptables-restore < /etc/sysconfig/iptables, which eliminates the need to flush first with iptables -F (iptables-restore replaces the existing rules unless given --noflush)
  5. Then permanently save it with "service iptables save"
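Before loading a hand-edited rules file with iptables-restore, a quick sanity check helps. This hypothetical helper reads iptables-save format, where each table starts with *name, chains and their default policies are declared as :CHAIN POLICY [packets:bytes], rules are -A lines, and each table ends with COMMIT. It reports the policies and rule count per table and raises an error on a missing COMMIT; it's a rough sketch, not a full parser.

```python
def summarize_iptables_save(text: str) -> dict:
    """Summarize iptables-save text as {table: {"policies": {chain: policy}, "rules": n}}."""
    tables, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line.startswith("*"):
            if current is not None:
                raise ValueError(f"table {current} is missing COMMIT")
            current = line[1:]
            tables[current] = {"policies": {}, "rules": 0}
        elif line == "COMMIT":
            if current is None:
                raise ValueError("COMMIT outside of a table")
            current = None
        elif current is None:
            raise ValueError(f"line outside of a table: {line}")
        elif line.startswith(":"):
            chain, policy = line[1:].split()[:2]  # ":INPUT DROP [0:0]"
            tables[current]["policies"][chain] = policy
        elif line.startswith("-A "):
            tables[current]["rules"] += 1
    if current is not None:
        raise ValueError(f"table {current} is missing COMMIT")
    return tables

sample = """*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
COMMIT
"""
print(summarize_iptables_save(sample))
# {'filter': {'policies': {'INPUT': 'DROP', 'FORWARD': 'DROP', 'OUTPUT': 'ACCEPT'}, 'rules': 2}}
```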

Random notes:
  1. service -s lists the current status
  2. chkconfig -l lists the configuration (i.e., whether something is started at boot or not)
  3. iptables follows the usual convention: one dash introduces a short option (-L, -A, -p) and two dashes a long option (--list, --verbose). Options that belong to a match or target module, such as --dport after -p tcp or --state after -m state, are long options, which is why the nesting looks like one dash followed by two.
  4. lsmod shows loaded kernel modules
  5. TCP handshake: each side picks a random initial sequence number (ISN), and bytes are numbered sequentially from there. The ACK number tells the other side which sequence number it expects next. This is symmetric between initiator and receiver because after the initial handshake, it's a two-way "conversation".
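A small Python sketch of the sequence/acknowledgment arithmetic in note 5. The SYN and the SYN-ACK each consume one sequence number, so each side acknowledges the other's ISN + 1; after that, the ACK number is the received sequence number plus the number of payload bytes. Sequence numbers are 32 bits and wrap around. The function names are made up for illustration.

```python
import random

MOD = 2**32  # TCP sequence numbers are 32 bits and wrap around

def handshake(client_isn=None, server_isn=None):
    """Return (flags, seq, ack) for the three handshake segments."""
    c = random.randrange(MOD) if client_isn is None else client_isn  # client's random ISN
    s = random.randrange(MOD) if server_isn is None else server_isn  # server's random ISN
    return [
        ("SYN", c, None),
        ("SYN-ACK", s, (c + 1) % MOD),  # a SYN consumes one sequence number
        ("ACK", (c + 1) % MOD, (s + 1) % MOD),
    ]

def next_ack(seq, payload_len):
    """ACK number after receiving payload_len bytes that start at seq: the next byte expected."""
    return (seq + payload_len) % MOD

print(handshake(100, 300))  # [('SYN', 100, None), ('SYN-ACK', 300, 101), ('ACK', 101, 301)]
print(next_ack(101, 50))    # 151
```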

Thursday, December 18, 2008

Runlevel Service Configuration
I spent the past few days playing with sudo and runlevels. Some gotchas:
  1. In openSUSE, disabling the default "ALL" sudo rule triggers a bug in YaST where it says the root password is invalid when it is actually correct.
  2. Don't delete default users and groups unless you're willing to learn what each one is for.
    1. For instance, haldaemon, messagebus and avahi run processes needed by the system. Yet all of these have either a ! in the shadow file, which means the account's password is locked, or a *, which means no password will match, so nobody can log in as them. So while these accounts seemed unused, they are in use.
    2. If one simply configures the firewall properly, then nothing unwanted can come in and nothing unwanted can go out, so there's no need to delete these accounts.
  3. In openSUSE and Ubuntu (and most distros, I suspect), you need to use the distro's preinstalled init scripts.
    1. I tried creating my own from PostgreSQL's contrib/start-scripts, but it lacked the header block that chkconfig reads, so it didn't work even though I set up the symlinks correctly.
    2. Regardless of why that happened, packages get installed and upgraded all the time, and since (supposedly) a package install wipes out existing symlinks, it's best to just use the distro's template init script and run "chkconfig --level 35 postgresql on".
    3. If you download an RPM from the distro repository, the package will have customized the template init script for you.
  4. Create guests that use NAT networking (as opposed to bridged networking) to take advantage of the host's internet connectivity.
  1. Began my shell scripting training: sed, and the different ways to run scripts.
  2. Use init 3 or init 5 to switch between those runlevels (3 is multi-user text mode, 5 adds the graphical login).
  3. Learned to list installed packages, install a package and show package details.
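My guess is that the missing header was the LSB "INIT INFO" comment block, which openSUSE's insserv (used by chkconfig there, as far as I know) reads to work out start order and default runlevels; I haven't confirmed that was the cause. This hypothetical Python helper checks whether a script has that block and pulls out its fields:

```python
import re

# The LSB block sits between these two marker lines in an init script.
LSB_BLOCK = re.compile(r"^### BEGIN INIT INFO[ \t]*$(.*?)^### END INIT INFO[ \t]*$",
                       re.MULTILINE | re.DOTALL)
FIELD = re.compile(r"#\s*([A-Za-z-]+):\s*(.*)$")  # e.g. "# Default-Start: 3 5"

def lsb_header(script_text):
    """Return the LSB header fields as a dict, or None if the script has no header."""
    block = LSB_BLOCK.search(script_text)
    if block is None:
        return None
    fields = {}
    for line in block.group(1).splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields

example = """#!/bin/sh
### BEGIN INIT INFO
# Provides:          postgresql
# Required-Start:    $network $remote_fs
# Default-Start:     3 5
# Default-Stop:      0 1 2 6
### END INIT INFO
echo starting
"""
print(lsb_header(example)["Default-Start"])       # 3 5
print(lsb_header("#!/bin/sh\necho no header\n"))  # None
```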

Distro notes
  1. CentOS seems to use really old packages.
  2. Didn't like Fedora's GNOME, especially the software updating part. Couldn't tell whether it was Fedora or GNOME that produced the annoying messages saying this was waiting for that. But Fedora is the community base for Red Hat Enterprise Linux, and there's tons of documentation for that.
  3. Don't like KDE widgets
  4. Ubuntu takes some getting used to if you don't give root a password (by default root has none and you use sudo instead)

Sunday, December 14, 2008

High Availability Followup for the Future
To enable high availability (warm standby), just keep sucking up WAL files from the master:
  1. Create the archive directory: mkdir /var/log/postgresqlwal/
  2. Make postgres its owner: chown postgres /var/log/postgresqlwal
    #You can set archive_timeout (in seconds) to force the server to switch to a new WAL segment file at least that often
  3. Begin recovery on the standby server from the local WAL archive, using a recovery.conf that specifies a restore_command that WAITS.
    "The magic that makes the two loosely coupled servers work together is simply a restore_command used on the standby that WAITS for the next WAL file to become available from the primary. The restore_command is specified in the recovery.conf file on the standby server. Normal recovery processing would request a file from the WAL archive, reporting failure if the file was unavailable. For standby processing it is normal for the next file to be unavailable, so we must be patient and wait for it to appear. A waiting restore_command can be written as a custom script that loops after polling for the existence of the next WAL file. There must also be some way to trigger failover, which should interrupt the restore_command , break the loop and return a file-not-found error to the standby server. This ends recovery and the standby will then come up as a normal server."
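The waiting restore_command described in the quote could look roughly like this Python sketch. PostgreSQL calls it with %f (the file name it wants) and %p (where to copy it) and treats a non-zero exit as "file not found". It polls the archive for WAL segments, but returns failure immediately for other names such as 00000001.history, which recovery legitimately asks for even when they don't exist. The trigger file path, poll interval and script location are hypothetical.

```python
import os
import shutil
import sys
import time

ARCHIVE_DIR = "/var/log/postgresqlwal"  # the archive directory used in this post
TRIGGER_FILE = "/tmp/pgsql.trigger"      # hypothetical: create this file to trigger failover
POLL_SECONDS = 5                         # hypothetical polling interval

def is_wal_segment(name: str) -> bool:
    """WAL segment names are 24 uppercase hex digits, e.g. 000000010000000000000001."""
    return len(name) == 24 and all(c in "0123456789ABCDEF" for c in name)

def restore(wal_name, dest_path, archive_dir=ARCHIVE_DIR, trigger_file=TRIGGER_FILE,
            poll_seconds=POLL_SECONDS, max_polls=None):
    """Copy wal_name from the archive to dest_path, waiting for it if necessary.

    Returns 0 on success and 1 for "not found" (which ends recovery).
    max_polls exists only for testing; None means wait forever.
    """
    src = os.path.join(archive_dir, wal_name)
    polls = 0
    while True:
        if os.path.exists(src):
            shutil.copyfile(src, dest_path)
            return 0
        if not is_wal_segment(wal_name):
            return 1  # .history and similar files: report missing right away
        if os.path.exists(trigger_file):
            return 1  # failover requested: break the loop and end recovery
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return 1
        time.sleep(poll_seconds)

if __name__ == "__main__" and len(sys.argv) == 3:
    # Called by PostgreSQL as: wait_restore.py %f %p
    sys.exit(restore(sys.argv[1], sys.argv[2]))
```

It would be wired up in recovery.conf with something like restore_command = 'python3 /usr/local/bin/wait_restore.py %f %p' (path hypothetical), and failover triggered by creating /tmp/pgsql.trigger. A real script should also make sure it never copies a partially archived segment; the contrib program pg_standby exists for exactly this job and is probably the better choice.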
-Log shipping would require dropping the cluster data directory with each import if you want to read from the warm standby, since a standby in recovery can't be queried (master-slave replication allows the slaves to be read)
-Log shipping does not help replicate across major versions (pg_dump is great for moving data across versions if you can get around the file size limit)
-A cold standby is just a backup waiting for the latest WAL files (archived + unarchived) so it can be restored. (Can you play back WAL files that haven't been archived? Yes: segments not found in the archive are sought in pg_xlog/. And WAL files archived after the STOP point of your backup are exactly what recovery replays.)
-You could have a restored backup that is used only for reads, e.g. for data warehousing
-"Hot standby" as a term does not make sense with multi-master, because the other masters are the hot "standbys"
-Asynchronous multi-master sounds like a disaster for keeping the different master databases in sync, because writes made on different masters are not applied everywhere in the same order

Data partitioning
Haven't found a need to import the PostgreSQL logs into a database, so I don't need the CSV log format yet.
Recovery worked! file:///usr/share/doc/packages/postgresql/html/continuous-archiving.html#BACKUP-PITR-RECOVERY
I followed the instructions, noting: "Normally, recovery will proceed through all available WAL segments, thereby restoring the database to the current point in time (or as close as we can get given the available WAL segments). So a normal recovery will end with a "file not found" message, the exact text of the error message depending upon your choice of restore_command . You may also see an error message at the start of recovery for a file named something like 00000001.history"

2008-12-14 17:03:18 PST LOG: database system was interrupted; last known up at 2008-12-13 21:54:27 PST
2008-12-14 17:03:18 PST LOG: starting archive recovery
2008-12-14 17:03:18 PST LOG: restore_command = 'cp /var/log/postgresqlwal/%f %p'
2008-12-14 17:03:18 PST LOG: log_restartpoints = true
cp: cannot stat `/var/log/postgresqlwal/00000001.history': No such file or directory
2008-12-14 17:03:18 PST LOG: restored log file "000000010000000000000000.008A9E00.backup" from archive
2008-12-14 17:03:18 PST LOG: restored log file "000000010000000000000000" from archive
2008-12-14 17:03:18 PST LOG: automatic recovery in progress
2008-12-14 17:03:19 PST LOG: redo starts at 0/8A9E48
cp: cannot stat `/var/log/postgresqlwal/000000010000000000000001': No such file or directory
2008-12-14 17:03:19 PST LOG: record with zero length at 0/10000B0
2008-12-14 17:03:19 PST LOG: redo done at 0/1000068
2008-12-14 17:03:19 PST LOG: last completed transaction was at log time 2008-12-13 21:49:49.597163-08
cp: cannot stat `/var/log/postgresqlwal/000000010000000000000001': No such file or directory
cp: cannot stat `/var/log/postgresqlwal/00000002.history': No such file or directory
2008-12-14 17:03:19 PST LOG: selected new timeline ID: 2
cp: cannot stat `/var/log/postgresqlwal/00000001.history': No such file or directory
2008-12-14 17:03:19 PST LOG: archive recovery complete
2008-12-14 17:03:19 PST LOG: checkpoint starting: shutdown immediate
2008-12-14 17:03:19 PST LOG: checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.001 s, total=0.006 s
2008-12-14 17:03:19 PST LOG: autovacuum launcher started
2008-12-14 17:03:19 PST LOG: database system is ready to accept connections
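The file names and locations in that log fit together. Assuming the default 16MB segments, a WAL file name is three 8-digit hex fields (timeline, log ID, segment number), and a location like 0/1000068 (log ID / byte offset) lands in segment offset // 16MB. This Python sketch, with made-up helper names, reproduces the names above: redo started at 0/8A9E48, inside segment ...0000, and ended at 0/1000068, inside ...0001, which is why recovery next asked for 000000010000000000000001. The new timeline ID 2 changes the first field, so files written after recovery can't collide with the old timeline's files.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default segment size (16MB); an assumption for other builds

def parse_wal_name(name: str):
    """Split a WAL file name (or its .backup variant) into (timeline, log_id, segment)."""
    base = name[:24]
    return int(base[0:8], 16), int(base[8:16], 16), int(base[16:24], 16)

def wal_name_for_location(location: str, timeline: int = 1, seg_size: int = WAL_SEG_SIZE) -> str:
    """WAL file holding a location written as 'logid/offset' in hex, e.g. '0/8A9E48'."""
    log_hex, offset_hex = location.split("/")
    log_id, offset = int(log_hex, 16), int(offset_hex, 16)
    return f"{timeline:08X}{log_id:08X}{offset // seg_size:08X}"

print(wal_name_for_location("0/8A9E48"))               # 000000010000000000000000
print(wal_name_for_location("0/1000068"))              # 000000010000000000000001
print(wal_name_for_location("0/1000068", timeline=2))  # 000000020000000000000001
print(parse_wal_name("000000010000000000000000.008A9E00.backup"))  # (1, 0, 0)
```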

I checked that the phrase I added during the backup showed up after recovery, and that I was able to add a new phrase, which means normal database operations resumed perfectly after the restore.
Restoring a PostgreSQL Backup
If you look at pg_xlog you'll see that the first WAL file was recycled (since it had been archived) to save disk space. That is, the 16MB file no longer exists under its old name. Alongside the WAL files there is now a backup history file named "WALfile.startlocation.backup", just a few hundred bytes, which records where the backup started and stopped in the WAL. The archive_status subdirectory shows a matching zero-byte filename.done marker, meaning the file was archived. The recycled segment's space is reused: PostgreSQL renames old segments to become future ones instead of deleting them, and checkpoint_segments (which I had lowered) governs how many such preallocated WAL files it keeps handy in anticipation of more WAL traffic.

As soon as the backup was done (pg_stop_backup), the server switched to a new WAL segment, closing out the current one so that it gets archived and the backup plus its archived WAL files are as fresh as possible.

A sample recovery file is in /usr/share/postgresql/recovery.conf.sample. I used:
restore_command = 'cp /var/log/postgresqlwal/%f "%p"'
log_restartpoints = true
WAL segments that cannot be found in the archive will be sought in pg_xlog/; this allows use of recent un-archived segments. However, segments that are available from the archive will be used in preference to files in pg_xlog/.
The server will go into recovery mode and then commence normal database operations.

I thought that would mean the existing WAL files would be written to and corrupted, which is why you'd want to back them up BEFORE beginning recovery. Not true; I just read about timelines: "Whenever an archive recovery is completed, a new timeline is created to identify the series of WAL records generated after that recovery. The timeline ID number is part of WAL segment file names, and so a new timeline does not overwrite the WAL data generated by previous timelines."

Saturday, December 13, 2008

PostgreSQL Configuration and Backup
After reading chapter 18 and most of the manual, I enabled the following configuration in postgresql.conf:
  1. checkpoint_segments = 1
  2. archive_mode = on
  3. archive_command = 'cp -i %p /var/log/postgresqlwal/%f </dev/null'
  4. log_min_duration_statement = 250ms
  5. log_checkpoints = on
  6. log_connections = on
  7. log_disconnections = on
  8. log_duration = off
  9. log_hostname = off
  10. log_lock_waits = on
  11. log_autovacuum_min_duration = 250ms
Then I followed file:///usr/share/doc/packages/postgresql/html/continuous-archiving.html#BACKUP-BASE-BACKUP and made a successful backup and saw the corresponding WAL get written to /var/log/postgresqlwal/ along with the backup history file.
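The cp -i ... </dev/null archive_command is meant to never overwrite an already archived file. Here is the same idea as a small Python script, sketched under my own assumptions: it refuses to overwrite a different file and returns non-zero so PostgreSQL keeps the segment and retries later, treats an identical existing copy as success (e.g. a retry after a crash), and copies to a temporary name first and then renames, so anything reading the archive never sees a half-written segment. The script name and path are hypothetical.

```python
import filecmp
import os
import shutil
import sys

ARCHIVE_DIR = "/var/log/postgresqlwal"

def archive_wal(src_path: str, wal_name: str, archive_dir: str = ARCHIVE_DIR) -> int:
    """Archive one completed WAL segment; return 0 on success, non-zero on failure."""
    dest = os.path.join(archive_dir, wal_name)
    if os.path.exists(dest):
        # Already archived with identical content (e.g. a retry after a crash): success.
        if filecmp.cmp(src_path, dest, shallow=False):
            return 0
        return 1  # different content under the same name: refuse to overwrite
    tmp = dest + ".tmp"
    shutil.copyfile(src_path, tmp)
    os.rename(tmp, dest)  # atomic within one filesystem: no half-written segment is visible
    return 0

if __name__ == "__main__" and len(sys.argv) == 3:
    # Called by PostgreSQL as: archive_wal.py %p %f
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```

It would be hooked up as archive_command = 'python3 /usr/local/bin/archive_wal.py %p %f' (hypothetical path). A production version should probably also fsync the copy before renaming it.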

The backup command, run from the data directory, was
tar -czvf dec1308postgresbackup.tar.gz --exclude=pg_xlog *
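The same base backup can be scripted with Python's tarfile module. This sketch assumes pg_start_backup('label') has already been run and pg_stop_backup() will be run afterwards, as the manual's base backup steps require. It archives everything in the data directory except the top-level pg_xlog, since the WAL comes from the archive instead. The function name and paths are hypothetical.

```python
import os
import tarfile

def _skip_pg_xlog(info: tarfile.TarInfo):
    # Returning None from a tarfile filter drops the entry (and, for a directory, its contents).
    return None if info.name.split("/")[0] == "pg_xlog" else info

def base_backup(data_dir: str, out_path: str) -> None:
    """Write a gzipped tar of data_dir, excluding pg_xlog/.

    Assumes pg_start_backup() was called before and pg_stop_backup() is called after.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for entry in sorted(os.listdir(data_dir)):
            tar.add(os.path.join(data_dir, entry), arcname=entry, filter=_skip_pg_xlog)
```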

Tomorrow I will verify the integrity of the backup to see if the WAL recorded a phrase I added after starting the backup.

After doing the backup, I now understand this: "The archive command is only invoked on completed WAL segments. Hence, if your server generates only little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage."

I suppose the last, partially filled WAL file could be copied manually, but if that were reliable there wouldn't be a warning that there could be data loss. I.e., Chapter 25 does not say that warm standbys will never lose data.

Friday, December 12, 2008

More PostgreSQL Notes
Database organization hierarchy:
  1. server
  2. cluster (made up of physical tablespaces)
  3. database
  4. schema
  5. object (table, function, index, views, rules, triggers)
To drop a database and create a fresh one (connect to a different database, since you can't drop the one you're connected to):
  1. psql -c "drop database pickyricky_production" -U pickyricky -W -d pickyricky_test
  2. psql -c "CREATE DATABASE pickyricky_production" -U pickyricky -W -d pickyricky_test

To create and restore a backup with pg_dump (not as favorable in production as warm standbys):
#file:///usr/share/doc/packages/postgresql/html/app-psql.html (-1 runs the whole restore in a single transaction)
  1. pg_dump -o -U pickyricky pickyricky_development > testdumpfile
  2. psql -1 -U pickyricky pickyricky_production < testdumpfile
  3. vacuumdb -z -U pickyricky -W -d pickyricky_production
  4. Speed hints #file:///usr/share/doc/packages/postgresql/html/populate.html
    1. Disable archive_mode while restoring: archiving the WAL generated by a bulk load is slow, certain commands (e.g. COPY into a table created in the same transaction) can skip WAL entirely when archiving is off, and it can be faster to take a fresh base backup after the load.
    2. Set appropriate (i.e., larger than normal) values for maintenance_work_mem and checkpoint_segments


When PostgreSQL writes to the WAL and the database files, keep in mind the following layers; postgresql.conf has configuration options for most of them:
  1. synchronization between the WAL and the database files (a change must be on disk in the WAL before the data pages it describes are written)
  2. delay (caching) of writes to the hard disk
    1. OS cache (PostgreSQL pushes writes through it with fsync; see the fsync and wal_sync_method settings)
    2. hard drive controller cache (get a controller with a backup battery)
    3. hard drive cache (toggle it with hdparm -W)
  3. enabling log archiving
  4. optimizing log archiving
    1. checkpoints
    2. recycling
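The write-ahead rule in item 1 can be illustrated with a toy Python key-value store (nothing like PostgreSQL's real on-disk format): each change is appended to a log and fsync'ed, pushing it through the OS cache, before the data file is touched, so a crash between the two writes loses nothing, and replaying the log on startup recovers it. ToyWAL and its JSON files are made up for illustration. Note that os.fsync can only get the data as far as the drive, which is why the controller and drive caches above still matter.

```python
import json
import os

class ToyWAL:
    """Toy write-ahead log: each change is durable in the log before the data file changes."""

    def __init__(self, log_path, data_path):
        self.log_path, self.data_path = log_path, data_path

    def _load(self):
        if not os.path.exists(self.data_path):
            return {}
        with open(self.data_path) as f:
            return json.load(f)

    def _save(self, data):
        with open(self.data_path, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())

    def put(self, key, value, crash_before_data_write=False):
        # 1. Append the change to the log and force it to disk.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        if crash_before_data_write:
            return  # simulate a crash after the log write but before the data write
        # 2. Only now update the data file.
        data = self._load()
        data[key] = value
        self._save(data)

    def recover(self):
        """Replay every logged change; re-applying a key/value set is harmless."""
        data = self._load()
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    record = json.loads(line)
                    data[record["key"]] = record["value"]
        self._save(data)
        return data
```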

Monday, December 8, 2008

Negatives clarification in AdWords
Negative keyword should not be confused with negative match. The former says don't include expansions that contain the negative token. (This explains why there needs to be at least one positive word in the seed keywords.) The latter returns phrases that other advertisers/searchers designated as irrelevant to the seed term.
  1. Volume count of negative match phrases depends on the seed_keyword. Confirmed that different seeds yield different counts for the same negative word.
  2. The "additional negatives to consider" counts are the same as the broad match positive word counts, and are therefore useless.
  3. The total count for negative match is also 200 words.
  4. per
    there is no negative keywordtype in the API
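A toy Python model of the negative keyword behavior described above (my own sketch, not anything from the AdWords API): negatives only filter the expansions of positive seed words, dropping any expansion that contains a negative token, so with no positive seed there is nothing to filter. The expansion lists are made-up inputs.

```python
def apply_negative_keywords(expansions, negatives):
    """Drop every expansion that contains one of the negative tokens as a whole word."""
    negative_tokens = {n.lower() for n in negatives}
    return [phrase for phrase in expansions
            if not negative_tokens & set(phrase.lower().split())]

print(apply_negative_keywords(["computer desk", "computer games", "used computer"], ["games"]))
# ['computer desk', 'used computer']
print(apply_negative_keywords([], ["games"]))  # [] (no positive seed, so nothing to expand)
```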

Sunday, December 7, 2008

First Use of Production AdWords API
I wanted to compare the API to the web GUI, so I made a call to both and compared the results. The only differences seemed to be the range of the competition scale and that the GUI allowed single negative keyword calls.
  2. Use synonyms and include adult content.
  3. computer yielded results; I downloaded the CSVs and stored them in computer-broad-synonyms-adult-keywords.csv and computer-broad-adult-keywords.csv
  4. [computer] yielded results; I downloaded the CSVs and stored them in computer-exact-synonyms-adult-keywords.csv and computer-exact-adult-keywords.csv
  5. When I made the API call a.obtain("computer","true","Broad"),
  6. I got back the broad results; the SOAP request and response were saved in the workspace, and the call consumed 20 API units.
  7. The synonyms came back in a section called additionalToConsider and the expansions in the morespecific section; both were contained in a single XML response file.
  8. The total of morespecific and additionaltoconsider words returned was 200 from both the GUI and the API (to confirm).
You can't trust the language designation in the XML: 40 of the 200 words were non-English according to the language element, but every one of them was English based on CSV inspection (to confirm). Maybe it's more accurate for non-computer words?

This raises the question of why they come back at all, since I specify a language and country.
-grep -io '<language>' SOAP_6750_getKeywordVariations_response.xml | wc -l
-grep -io '<language>en' SOAP_6750_getKeywordVariations_response.xml | wc -l
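One catch with counting via grep: grep -c counts matching lines, not matches, and a SOAP response is often one long line, so grep -o ... | wc -l is the safer count. Parsing the XML avoids the problem altogether. This Python sketch counts the values of every language element regardless of namespace; I'm assuming the elements are simply named language, as the grep implies, and the sample document is made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_languages(xml_text: str) -> Counter:
    """Count the text of every <language> element, ignoring XML namespaces and case."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        # A namespaced tag looks like "{urn:example}language"; keep only the local name.
        if elem.tag.rsplit("}", 1)[-1].lower() == "language":
            counts[(elem.text or "").strip().lower()] += 1
    return counts

sample = """<response xmlns="urn:example">
  <variation><text>computer desk</text><language>en</language></variation>
  <variation><text>laptop</text><language></language></variation>
  <variation><text>ordinateur</text><language>fr</language></variation>
</response>"""
print(count_languages(sample))
```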

Advertisercompetition comes back on a scale of 1 to 5 (5 being the highest); this maps to the web GUI's scale of 0 to 1 (1 being the highest).

I confirmed in the GUI that turning synonyms on or off gives you a different set of words (neither a superset nor a subset). In fact, turning synonyms off returned fewer than 200 words in total (to confirm).

  1. Need to confirm that exact match works in the API.

Thursday, December 4, 2008

Virtualization
Dave walked me through VMware's offerings:
  1. Enterprise editions, which are not free
  2. Free editions like Server, which can create a 'guest' that is 'hosted' on a 'physical' machine
  3. Desktop editions like Workstation (not free) and the free Player, which can, respectively, connect to a remote guest to power it on/off and just play a 'disk' (i.e., a guest)
When I installed 32-bit Fedora 10 on a:
  1. Linux host, the installation script used gcc 4.3 but the kernel was compiled with gcc 4.3.1. It had errors and was slow.
  2. Windows host, there were no errors but it was also slow. Maybe it's the 256MB of RAM provisioned for each VM.
When I installed 64-bit Ubuntu 8.10 Desktop on Windows, it was also slow, but this time I waited and discovered the following:
  1. loading the ISO is just always slow
  2. loading the VM disk for the first time is always slow
  3. afterwards, playing the VM is about as fast as a physical machine
  4. you need at least a 3GB if not 4GB virtual disk, and for some reason Server configured Ubuntu with 512MB of RAM
It would be nice to have ISOs on USB keys so that a VM could boot from its virtual CD drive with one of those ISOs as the image file. Then one could copy the VM disk ('appliance') onto other USB keys and essentially carry around different operating systems, playing them on any local machine that has a player installed.

You would have a third USB key or USB hard drive to store your files (all USB storage should be FAT32-formatted so it can be read and written by both Windows and Linux; note that FAT32 limits a single file to 4GB, so a large virtual disk needs to be split into smaller files).
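A quick arithmetic check on the FAT32 plan: FAT32 can't store a single file of 4GiB or more (the limit is 2^32 - 1 bytes), so a fully allocated 4GB virtual disk kept as one file won't fit, while splitting it into 2GB pieces (the piece size is my assumption, based on VMware's split-disk option) needs two files. In Python:

```python
FAT32_MAX_FILE_BYTES = 2**32 - 1  # largest single file FAT32 can hold
PIECE_BYTES = 2 * 1024**3         # assumed 2GB pieces for a split virtual disk

def fits_on_fat32(size_bytes: int) -> bool:
    return size_bytes <= FAT32_MAX_FILE_BYTES

def pieces_needed(size_bytes: int, piece_bytes: int = PIECE_BYTES) -> int:
    return -(-size_bytes // piece_bytes)  # ceiling division without floats

disk = 4 * 1024**3  # a fully allocated 4GB virtual disk
print(fits_on_fat32(disk), pieces_needed(disk))  # False 2
```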

© 2010 Picky Ricky, Inc.