Thursday, April 17, 2008

Bridging the Gap Part II

Yesterday I wrote about getting Windows to send its logs to Linux. Today, I am going to do something with them. If you remember, my original intent was to track when my computer was turned on and off in order to generate a time sheet.

Before we can dive into the coding, we need to play with Windows some more. By default, Windows doesn't like to log a lot of stuff....probably so that when it breaks you have to call MS and pay their techs to tell you why. So first we need to enable logging of all logon/logoff events. This is actually quite simple and is done through the group policy management in Windows XP or Server. This is found in:

Start -> Administrative Tools -> Local Security Policy -> Local Policies -> Audit Policy

Change whatever you want in here. The big ones I changed were 'account logon events' and 'logon events'. I changed these to success and failure so later on I can write some code to email me whenever someone tries to log in but fails.

That part was simple... now on to the code. I chose to use PHP, simply because I like how it interacts with sendmail and I didn't feel like messing with Perl. This script needs to do a few things. First it needs to open the log file, figure out when the computer was turned on/off, calculate how many hours that was, then generate a report and email it to me. It sounds MUCH more complex than it really is. Let's begin.

Task #1: Reading in the data.

This should be a trivial task for anyone handy with programming, but here's the block anyway:

//Read in the log file
$filename = "/var/log/windows.log";
$fh = fopen($filename,'r');
$data = fread($fh, filesize($filename));
fclose($fh);
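
If the log file is missing or unreadable, the block above will throw warnings and leave $data empty. Here is a minimal sketch of a more defensive read, assuming a helper name (readLog) that is mine and not part of the original script:

```php
<?php
// Hypothetical wrapper: return the log contents, or '' if the file
// cannot be read. The name readLog is made up for this sketch.
function readLog(string $filename): string {
    if (!is_readable($filename)) {
        return '';
    }
    $data = file_get_contents($filename);
    return $data === false ? '' : $data;
}

$data = readLog('/var/log/windows.log');
if ($data === '') {
    echo "No log data to process\n";
}
```

file_get_contents does the open, read and close in one call, so the result is the same string the fopen/fread version produces.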



Now that our entire log file is in the string $data, we can start to process it. I should mention one caveat here. I have syslog-ng set up to roll the log files after 2 weeks. This is important as it limits the size of the file I'm reading. Without this, and after several months of operation, this script would start to bog down as it would be processing huge files, mostly full of useless info. So please....roll your log files.

When it comes to determining when I logged in and then back out, I could do some fancy regex to search for lines matching key words like 'login' or 'logout' or do other complex conditional matching.....but I'm lazy. After all this whole project is an exercise in laziness. So all I did was say that the time of the first recorded log event of a day is the time the computer turned on, and the last event is the time the computer turned off. This is a MUCH easier task.

At this point I do need to deviate. Yes we have data and yes it's ready to process, but there is something else that we need. Remember, this program is designed to make a time sheet. I happen to get paid every two weeks, so I need to show only two weeks' worth of info. This task is surprisingly difficult.

I approached this problem from an application point of view. I know this script will only run every other Monday, so I have control over how the script chooses dates without the script choosing dates....I simply have it give me the previous two weeks' worth of times. I did make a second version that is smart enough to figure out where the current day falls relative to the work week, but that is just some tricky date math...I'll leave it up to the reader to figure it out.
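
For anyone who wants a head start on that date math, here is one possible sketch. It is not the second version mentioned above. It assumes the pay period is the 14 days ending on the most recent Friday before the run date, it works in UTC via gmdate so the output does not depend on the server's time zone, and both function names are made up for the example:

```php
<?php
// Hypothetical helper: how many days back from $now is the most recent
// Friday strictly before it (a run on a Friday looks a full week back).
function daysSinceLastFriday(int $now): int {
    $weekday = (int) gmdate('w', $now);   // 0 = Sunday ... 6 = Saturday
    $back = ($weekday - 5 + 7) % 7;
    return $back === 0 ? 7 : $back;
}

// Hypothetical helper: the 14 "M d" labels for the period ending on
// that Friday, oldest first.
function payPeriodLabels(int $now): array {
    $end = $now - daysSinceLastFriday($now) * 86400;
    $labels = [];
    for ($i = 13; $i >= 0; $i--) {
        $labels[] = gmdate('M d', $end - $i * 86400);
    }
    return $labels;
}

// Monday, April 21, 2008 at noon UTC
$labels = payPeriodLabels(gmmktime(12, 0, 0, 4, 21, 2008));
echo $labels[0], ' through ', $labels[13], "\n"; // Apr 05 through Apr 18
```

Run on a Monday this gives the same Saturday-to-Friday window as counting back 16 days, but it also works if the script runs on any other day of the week. It does not know which Fridays actually end a pay period, so that still has to come from when you schedule it.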

So let's begin this by breaking the logs up by day. Fortunately the logs are time stamped by the month and day, so it's easy to pull all the records for one particular day with a simple regex. I'm going to post some code here then discuss it:

//initialize data arrays
$hours = array();
$dates = array();
$days = array();
$in = array();
$out = array();


//Today's date as a unix time stamp
$day = date("U");


//Walk from 16 days ago up to 3 days ago (14 days in all)
for($i=16;$i>2;$i--){
    $ts = $day-($i*86400);
    $pre = date("M d",$ts);   //month and day, as in the log file
    $dates[] = $pre;
    $days[] = date("D",$ts);

    //Grab every log line for this day
    preg_match_all("/$pre.*/", $data, $results);

    //First and last event of the day
    $first = $results[0][0];
    $last = count($results[0])-1;
    $last = $results[0][$last];

    //Pull the hours and minutes out of each line
    preg_match("/(\d\d):(\d\d):\d\d/", $first, $ontime);
    preg_match("/(\d\d):(\d\d):\d\d/", $last, $offtime);

    //Convert to decimal hours, rounded to the half hour
    $ondectime = $ontime[1]+timeRound($ontime[2],'in');
    $offdectime = $offtime[1]+timeRound($offtime[2],'out');

    $in[] = $ontime[1].':'.$ontime[2];
    $out[] = $offtime[1].':'.$offtime[2];

    $hours[] = $offdectime-$ondectime;
}


This is actually the bulk of the code...minus the subroutines. The first thing it does is figure out today's date and store it in $day. This is in the form of a unix time stamp, which is the number of seconds since the unix epoch (midnight UTC, January 1, 1970). This is important, as if we add or subtract 86400 (# seconds in a day), we can march up and down the calendar with no regard for month boundaries. That's actually exactly what we do.
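
Here is a small sketch of that time stamp arithmetic crossing a month boundary. The helper name daysBefore is made up for the example, and it uses gmdate (UTC) so the result is the same on any server:

```php
<?php
// Hypothetical helper: "M d" label for the day $n days before $ts (UTC).
function daysBefore(int $ts, int $n): string {
    return gmdate('M d', $ts - $n * 86400);
}

$may2 = gmmktime(12, 0, 0, 5, 2, 2008);   // May 2, 2008 at noon UTC
echo daysBefore($may2, 3), "\n";          // Apr 29
```

One thing to watch with local time (date instead of gmdate): a day that contains a daylight saving change is 23 or 25 hours long, so a script that runs very close to midnight could land on the wrong day. Running it in the middle of the day sidesteps that.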

The for loop begins with the first day of the pay period, which happens to be 16 days prior. I just played around with this index until it worked, so if you don't want your week to run from Saturday to Friday, just experiment until it works. The first thing the loop does is compute the time stamp for the 'current' day, then turn that into a month, day format identical to the time stamp in the log file. See where this is going? Well before we get there, we store that value in an array and then store which day of the week it is in another array. This is for later when we build the report.

The magic here occurs with the preg_match_all function. We are using the regex "/$pre.*/" to grab all lines from the log file that begin with the time stamp we are after. From this result, we save the first line, then count the number of total lines and use that to extract the final line.

$first = $results[0][0];
$last = count($results[0])-1;
$last = $results[0][$last];


The next step is to use regex again to search those two lines for their time stamp. The regex I use strips the hours and minutes all in one expression.
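
To see the two regex steps work on something concrete, here is a sketch run against a few made-up log lines (the host name winbox and the messages are invented, only the time stamp layout matters). It differs from the loop above in two ways that are my own additions: the pattern is anchored to the start of a line, and a day with no log lines returns null instead of failing. The function name firstAndLast is hypothetical:

```php
<?php
// Made-up sample lines in the "Mon DD HH:MM:SS host message" shape.
$data = "Apr 14 08:12:03 winbox Security: logon\n"
      . "Apr 14 12:30:45 winbox Service Control Manager: started\n"
      . "Apr 14 17:41:10 winbox Security: logoff\n"
      . "Apr 15 08:05:00 winbox Security: logon\n";

// Hypothetical helper: first and last HH:MM seen for a given "M d"
// prefix, or null when the day has no log lines at all.
function firstAndLast(string $data, string $pre): ?array {
    $pattern = '/^' . preg_quote($pre, '/') . ' .*$/m';
    if (!preg_match_all($pattern, $data, $results)) {
        return null;
    }
    $lines = $results[0];
    preg_match('/(\d\d):(\d\d):\d\d/', $lines[0], $on);
    preg_match('/(\d\d):(\d\d):\d\d/', $lines[count($lines) - 1], $off);
    return [$on[1] . ':' . $on[2], $off[1] . ':' . $off[2]];
}

print_r(firstAndLast($data, 'Apr 14'));   // 08:12 and 17:41
var_dump(firstAndLast($data, 'Apr 16'));  // NULL, nothing logged that day
```

The null case matters for weekends or days off, where the machine never came on and there is no first or last line to pull a time from.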

Now that we know what time I turned the machine on, and what time I turned it off, all we have to do is figure out the time difference and make our report. EASY! Well kind of.

We are going to have to do some rounding here and convert the minutes to decimal hours. This could be as simple as round($min/60), but I need to be able to round to the nearest 1/2 hour and I wanted control over how the rounding worked. For instance if I'm in the office 10 min past the half, I charge for the whole half hour...I needed a rounding function to do that...so I wrote one.

function timeRound($min,$type){

    if($type == 'in'){

        if($min < 15){
            $rnd = 0;}
        if($min >= 15){
            if($min < 45){
                $rnd = 0.5;}
            if($min >= 45){
                $rnd = 1;}
        }
    }else{

        //assumed lower threshold of 10, matching the
        //"10 min past" rule described above
        if($min < 10){
            $rnd = 0;}
        if($min >= 10){
            if($min < 40){
                $rnd = 0.5;}
            if($min >= 40){
                $rnd = 1;}
        }

    } //end main if

    return $rnd;

}



This is simple rounding logic that lets me change how my clock in and out times are rounded based on if I'm clocking in or clocking out. You could easily skip this step or simplify it a great deal, but I didn't. Live with it.

If you notice, the function returns the rounded minutes in decimal hours, so all I have to do is add that to the hours and subtract my decimal clock in time from my decimal clock out time and I get my total hours worked!! We also reconstruct the in and out times into a colon delimited string for the report.
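
A worked example of that arithmetic may help. This sketch uses a hypothetical variant of the rounding helper (roundToHalf, not the function from the script) that takes its thresholds as arguments. The clock in thresholds of 15 and 45 come from the function above, while the clock out thresholds of 10 and 40 are my reading of the "10 min past" rule, so treat them as an assumption:

```php
<?php
// Hypothetical variant of the rounding helper: the thresholds are passed
// in rather than hard-coded, so each direction is a single call.
function roundToHalf(int $min, int $halfAt, int $fullAt): float {
    if ($min >= $fullAt) {
        return 1.0;
    }
    return $min >= $halfAt ? 0.5 : 0.0;
}

// Made-up day: in at 08:20, out at 17:40.
$in  = 8  + roundToHalf(20, 15, 45);   // 8.5
$out = 17 + roundToHalf(40, 10, 40);   // 18.0
echo $out - $in, "\n";                 // 9.5
```

Because the minutes can round all the way up to 1, a clock out at 17:40 becomes 18.0 without any special handling for the hour rolling over.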

That is one iteration of the loop. This repeats, each time advancing one day until it runs out of days. Simple, eh?

Now we have 5 arrays full of data ready to be formatted and emailed. I decided I wanted two blocks (one per week) where each block consisted of a header row showing the date, then the day of the week, followed by my in time, then out time, then total hours. So in short I'm making 2 blocks each consisting of 5 rows. It just so happens that there are 5 of these arrays. 5 arrays, 5 rows...coincidence? I think not.

So now it's a simple matter of writing out arrays in order....10 times. Did someone say subroutine? (I did).

I approached this in two subroutines....one that makes the whole block for a week, and one to print a line of text. We'll look at generating the block first:

function printBlock ($low,$high){

    global $dates, $days, $in, $out, $hours;

    $summary = '<table align="center" border="0" width="600">'."\n";
    $summary .= printLine($days,$low,$high,'Day')."\n";
    $summary .= printLine($dates,$low,$high,'Date')."\n";
    $summary .= printLine($in,$low,$high,'In')."\n";
    $summary .= printLine($out,$low,$high,'Out')."\n";
    $summary .= printLine($hours,$low,$high,'Hours')."\n";
    $summary .= '</table>'."\n";

    return $summary;
}


Is there a more elegant way to do this....probably. Do I care? No. This function just takes in the index range it will be printing and gets the data by calling our data arrays as globals. I can't see reusing this sub in a different program, so globals are okay. If you are quick, you'll notice that I'm formatting in HTML. I hate HTML, but it's the only way I could get Outlook to display my reports with the numbers aligned. Every text editor known to man knows how to handle the '\t' character, but not Outlook. So I begin the block with my table declaration. Then it's a simple matter of printing out the rows, line by line, followed by the table ending tag.

Notice that I tack on a "\n" to each line. This is to make sure a single line does not get too long. Mail lines are limited to 998 characters, and a line that exceeds the limit can get chopped, leaving a "!" behind. Unfortunately you only see this in Outlook when you get missing data and random !'s. It's also good practice for debugging HTML source in a browser. So just do it.
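
If you want to check a message for the long-line problem before sending it, a quick sketch like this will do. The function name longestLine is made up for the example:

```php
<?php
// Hypothetical check: length of the longest line in a message body,
// splitting on either \n or \r\n.
function longestLine(string $message): int {
    return max(array_map('strlen', preg_split('/\r?\n/', $message)));
}

$message = "<tr>\n<td>Day:</td>\n</tr>";
echo longestLine($message), "\n";   // 13
```

Anything comfortably under 998 is safe, and the table rows built here are nowhere near that once each cell sits on its own line.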

The second routine I mentioned actually generates the individual lines. From the syntax above, we can see it takes 4 args. It takes the array it's printing, the beginning index, the ending index and a label. Looking at the subroutine, we see it's just a for loop (surprise, surprise).


function printLine($array,$low,$high, $label){
    $line = "<tr>\n<td>$label:</td>\n";
    for($i=$low;$i<$high;$i++){
        $line .= "<td><div align=\"center\">$array[$i]</div></td>\n";
    }
    $line .= '</tr>';
    return $line;
}


All we do is add the <tr> tags, print the label, then encase each element of the array (between $low and $high) in <td> tags and some <div> tags to center it. Nice thing about this approach is that you can format to your heart's content and only have to alter one set of tags...yay reuse!
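
For comparison, here is a sketch of the same row builder written with array_slice and implode instead of a for loop. The name renderRow is hypothetical, it centers with an align attribute on the cell rather than a nested div, it escapes the values with htmlspecialchars (which the original does not do), and the arrow function needs PHP 7.4 or newer:

```php
<?php
// Hypothetical variant of the row builder: one <tr> holding a label cell
// followed by one centered cell per value in [$low, $high).
function renderRow(array $values, int $low, int $high, string $label): string {
    $cells = array_map(
        fn($v) => '<td align="center">' . htmlspecialchars((string) $v) . '</td>',
        array_slice($values, $low, $high - $low)
    );
    return "<tr>\n<td>" . htmlspecialchars($label) . ":</td>\n"
         . implode("\n", $cells) . "\n</tr>";
}

echo renderRow(['Sat', 'Sun', 'Mon'], 1, 3, 'Day'), "\n";
```

Like the loop version, $high is exclusive, so renderRow($days, 0, 7, 'Day') covers the first week and renderRow($days, 7, 14, 'Day') the second.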

Now we have all the pieces. We can print our two blocks of text (to a string) and then email that sucker away. The remaining code looks like this:

$subject = "Time Sheet Report for $days[0], $dates[0] through $days[13], $dates[13]";

$message = printBlock(0,7);
$message .= "\n<br>";
$message .= printBlock(7,14);



$to = "your_face@.net";
$from = "your_mom@.net";
$headers = 'MIME-Version: 1.0' . "\r\n";
$headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
$headers .= "From: $from";

mail($to,$subject,$message,$headers);


There are a few things to notice here. First off, we separate the two calls to printBlock with a "br". This is to space them out in the email message. Secondly, we can't just use "From:" in our headers....we have to declare that we are using HTML and not text. Make sure you add the "\r\n" to the ends of the header lines. Mail headers are separated by a carriage return plus line feed, and remember we are sending this to a Windows machine, which is just a glorified typewriter. Without both characters it doesn't know how to act.
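
One way to avoid forgetting a "\r\n" is to keep the header lines in an array and join them. This is a sketch only, the function name buildHeaders is made up, and the example.com address is a placeholder:

```php
<?php
// Hypothetical helper: build the extra headers for an HTML message,
// joining the lines with CRLF.
function buildHeaders(string $from): string {
    return implode("\r\n", [
        'MIME-Version: 1.0',
        'Content-type: text/html; charset=iso-8859-1',
        "From: $from",
    ]);
}

$headers = buildHeaders('timesheet@example.com');
// mail($to, $subject, $message, $headers);
```

The joined string is identical to what the concatenation above produces, with no trailing "\r\n" after the last header.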

That's all there is to it. It took me longer to write this blog than to actually write the code. One caveat I'm still working on is the fact that Windows does not log hibernation events. So for this to work 100% you will need to log off or shut down when you go home.

Wednesday, April 16, 2008

Bridging the Gap... Extending Windows with Linux

The other day I was scrambling to fill out my time sheet, thinking back to what hours I had really worked, when I realized there had to be an easier way. I end up scripting every other repetitive task, so why not this too?

That got me thinking....When I am working, I am on my computer...so I should be able to determine when I came into work and when I left work based on when the computer was turned on and when it was shut down. Easy! Well, no. It would be if my desktop was a Linux machine that logged every event known to man, but alas, it is a Windows machine that doesn't log anything by default and stores its logs in a format inaccessible by conventional scripts. Even if it was...could you imagine trying to write that batch file? Ugh. Perl would be an option, but I digress.

So in my hunt, I stumbled on an elegant solution that solves this issue and opens up a door to a ton of flexibility. There are a handful of simple programs that install as a Windows service and repeat the Windows event log to...wait for it.... a syslog server. That's right, imagine the possibilities.

This whole process will involve 3 main steps. First we need to get Windows talking to the syslog server, then get the syslog server to process the logs, followed by parsing the logs and building my time sheet.

Step 1. Making Windows talk to Unix.
This was actually one of the easiest steps. The program I'm using is simply called Eventlog to Syslog and is available from this link:

https://engineering.purdue.edu/ECN/Resources/Documents/UNIX/evtsys/

Just download the binaries and extract to your %SystemRoot%\System32 directory as indicated on the site. Once this is done, just open a command line, CD to the %SystemRoot%\System32 directory and install the service:

evtsys -i -h syslogserver -f local7
net start evtsys

Of course, substitute the hostname or IP of your syslog server where it says syslogserver. I used the -f local7 flag to tell the service to send the messages with the local7 facility. You could use whatever you wanted, but this worked best for me.

Step 2. Setting up the syslog server


Now that we have the Windows machine sending logs over to the syslog server, we have to get the syslog server to do something with them. If you already have a working syslog server...good for you...your job is done. If not, then keep reading.

I am using an Ubuntu 7.10 system for this, which comes with sysklogd by default. Instead of fighting with it, I opted to install syslog-ng instead, which is much more powerful and flexible. Being a Debian-based system, this is accomplished with the following command:

sudo apt-get install syslog-ng

One note: I tend to do most of my work like this in a root shell, so from here on out I'm skipping the sudo's.

Once installed, configuration seems hard, but really isn't. Syslog-ng makes use of a configuration file. Within this file, there are three main blocks: source, filter and destination. The source block defines how information is coming into the syslog daemon. The filter block tells syslog-ng what data to work with and the destination block tells syslog-ng where to put the data. Combine all three, and you pull in, process and write out data to a log file. Simple huh?

So after installing syslog-ng just open /etc/syslog-ng/syslog-ng.conf in your favorite text editor. If you're a real man, you use vi..... I use vim. For now, you can leave all the options alone... if you want to mess with them, go read the man page first.

The first thing we need to do is tell syslog-ng to listen over the network for our incoming logs. This is done by adding the following line to your "sources" list:

source remote { udp(); };

You can name the source whatever you want, but since it's pulling in remote logs, I figure 'remote' is a fitting name. The udp(); option tells the daemon to listen on the default UDP syslog port (514).

Next we need to define a filter. It may not seem necessary, but it is. Don't skip this. Filters let us, well, filter the incoming stream. It would be useful if we had 4 or 5 machines reporting in, as we could send logs from different IPs to different files, or we could pull out events from specific processes or times of day or whatever. For this example, we are just looking at the log facility, which if you remember, we set to 'local7' in the last step. So our filter line looks like:

filter win { facility(local7); };

Don't really think I need to explain that one further.

The final block we need to define is the destination block. This says which log file to write the events to. It actually doesn't have to be a file. It could be a pipe or a console, but we are going to keep it simple and just log to a file. So this is what I have for my destination:

destination windows { file("/var/log/windows.log"); };

complicated.... I know.

So now we have the three parts defined...it's time to add the fourth. I know I said 3, but I lied. The fourth block just combines the parts we just defined into a single block defining the flow. So syslog-ng knows to pull in events over udp, grab those with local7 facility and write to the windows.log file. This is performed with a 'log' block like so:

log {
source(remote);
filter(win);
destination(windows);
};

Simple..... Now with a quick restart of syslog-ng (/etc/init.d/syslog-ng restart), it should be up and running. To test it, just run:

tail -f /var/log/windows.log

then open up your Windows services and start and stop a few trivial ones. You should see the events display on your Linux console.
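
If nothing shows up, it helps to test the syslog-ng side without Windows in the picture. Here is a sketch that sends one hand-built syslog packet over UDP. A syslog message starts with a priority number in angle brackets, computed as facility * 8 + severity, and local7 is facility 23. The function names are made up, and the host name you pass in is whatever your syslog server is called:

```php
<?php
// Priority value for a syslog packet: facility * 8 + severity.
function syslogPriority(int $facility, int $severity): int {
    return $facility * 8 + $severity;
}

// Hypothetical test sender: one local7.info message to UDP port 514.
// Returns false if the socket could not be opened or written.
function sendTestMessage(string $host, string $text): bool {
    $sock = @fsockopen("udp://$host", 514, $errno, $errstr, 2);
    if ($sock === false) {
        return false;
    }
    $packet = '<' . syslogPriority(23, 6) . '>' . $text;   // local7 = 23, info = 6
    $ok = fwrite($sock, $packet) !== false;
    fclose($sock);
    return $ok;
}

echo syslogPriority(23, 6), "\n";   // 190
// sendTestMessage('syslogserver', 'test message from php');
```

Since the packet carries the local7 facility, it should pass the 'win' filter and land in /var/log/windows.log. UDP gives no delivery confirmation, so a true return only means the packet was handed off, and the tail is still the real test.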

That's all there is to it. Now you can play with your Windows logs like you do your Linux logs.

Tomorrow I'll continue on this theme with how I used PHP to generate my time sheets.