Wednesday, October 16, 2013

Parsing a big log file

1. Task

Task: I have a big text file (approximately 2 GiB) that I want to parse. I want to find every occurrence of the word "exception" and print that line together with the 9 lines that follow it. These 10 lines may also contain a string like:
 "stuff at begin of line At procedure '%PROCEDURE_NAME%' many many symbols in end of line", where the words "At procedure" always appear literally and %PROCEDURE_NAME% is text I also want to extract (without the quotes, if possible :) ).

So, let's try to solve this task :)
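The procedure-name part of the task can be tried on its own first. This is a minimal sketch, not the approach the script below uses: it swaps in POSIX awk's match() (which sets RSTART and RLENGTH) and anchors on the literal text "At procedure '", so an apostrophe earlier in the line is not mistaken for the opening quote. The two input lines and the name MY_PROC are made-up test data.

```shell
#!/bin/sh
# Sketch: extract the procedure name with POSIX awk match().
# The input lines and MY_PROC are hypothetical test data.
prog=$(mktemp)
cat > "$prog" <<'EOF'
{
  # Leftmost match of: At procedure, a quote, any non-quote chars, a quote.
  if (match($0, /At procedure '[^']*'/)) {
    # 14 = length of "At procedure '"; 15 = that prefix plus the closing quote.
    print substr($0, RSTART + 14, RLENGTH - 15)
  }
}
EOF

# The second line has an apostrophe ("it's") before "At procedure".
result=$(printf '%s\n' \
  "stuff at begin of line At procedure 'MY_PROC' many many symbols in end of line" \
  "it's stuff At procedure 'MY_PROC' more symbols" |
  awk -f "$prog")
rm -f "$prog"

printf '%s\n' "$result"   # MY_PROC, once per input line
```

If a line has no such match, match() returns 0 and nothing is printed for it.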

Of course, I could write a program in C++, but I don't like throwaway programs and would rather write a script. For parsing text files (especially big ones) I prefer AWK. However, the last time I used AWK was about a year ago :)

So, Google helped me and I wrote something like this:

2. Solution

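# parse_exceptions.awk
# Prints every line containing "exception" plus the 9 lines after it, and
# extracts the procedure name from "At procedure '...'" lines in that window.
# systime() and strftime() are GNU awk (gawk) extensions.

BEGIN {
  needDump  = 0;   # 1 while we are inside a 10-line window
  startLine = 0;   # NR of the most recent "exception" line

  startDts = systime();
  print "WORK STARTED AT: " strftime("%Y-%m-%d %H:%M:%S", startDts);
}

# A match opens the window (or restarts it if we are already inside one).
/exception/ {
  needDump  = 1;
  startLine = NR;
}

needDump == 1 {
  if ((NR - startLine) < 10) {
    print NR " : " $0;
  } else {
    # 10 lines printed: emit a separator and close the window.
    print "\n\n";
    needDump = 0;
  }
}

# Print the text between the quotes that follow "At procedure".
(needDump == 1) && /At procedure/ {
  # Start searching at "At procedure" so earlier quotes on the line are skipped.
  a = substr($0, index($0, "At procedure"));

  idx1 = index(a, "'");
  b = substr(a, idx1 + 1);

  idx2 = index(b, "'");
  b = substr(b, 1, idx2 - 1);

  print "\tPROBLEM PROCEDURE\t---" b "---";
}

END {
  now = systime();
  print "WORK FINISHED AT: " strftime("%Y-%m-%d %H:%M:%S", now);

  spentSeconds = now - startDts;
  print "Spent " spentSeconds " seconds";
}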
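One behaviour of these rules is easy to miss: because the "exception" rule resets startLine, a second match inside the window extends it instead of starting a separate block. The sketch below checks that on synthetic input. It assumes a POSIX awk and copies only the two dump rules (written with the equivalent /exception/ pattern); the BEGIN/END timing code is left out because systime() and strftime() are gawk extensions.

```shell
#!/bin/sh
# Sketch: a second "exception" inside the 10-line window restarts the window.
# Only the dump rules are copied; the 20-line input is synthetic.
prog=$(mktemp)
cat > "$prog" <<'EOF'
/exception/ { needDump = 1; startLine = NR }
needDump == 1 {
  if ((NR - startLine) < 10) {
    print NR " : " $0
  } else {
    print "\n\n"
    needDump = 0
  }
}
EOF

# Build "line 1".."line 20"; lines 1 and 4 also contain "exception".
input=$(awk 'BEGIN {
  for (i = 1; i <= 20; i++) {
    s = "line " i
    if (i == 1 || i == 4) s = s " exception"
    print s
  }
}')

result=$(printf '%s\n' "$input" | awk -f "$prog")
rm -f "$prog"

printf '%s\n' "$result"
# Dumped lines run from 1 to 13 (4 + 9), not from 1 to 10.
count=$(printf '%s\n' "$result" | grep -c ' : ')
last=$(printf '%s\n' "$result" | grep ' : ' | tail -n 1)
echo "dumped $count lines, last: $last"
```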

3. Description

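This script does everything described in the "Task" section and also prints when the work started and finished and how many seconds it took. All output goes to standard output (the console), so redirect it to save it to a file. Because systime() and strftime() are GNU awk extensions, the script needs gawk as written.

To run it: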

awk -f parse_exceptions.awk syslog-2013-10-06.log > exceptions.log

where:
  • parse_exceptions.awk  - file with our script;
  • syslog-2013-10-06.log  - input log file;
  • exceptions.log - output file with messages.
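To try the command above without a multi-gigabyte log, a small smoke test can be run in a POSIX shell. This is a sketch under a few assumptions: it writes a trimmed copy of the script without the timing BEGIN/END blocks (so a non-GNU awk can run it), and syslog-test.log and MY_PROC are made-up names for synthetic test data.

```shell
#!/bin/sh
# Smoke test for: awk -f parse_exceptions.awk <input log> > exceptions.log
# Trimmed script (no timing) and synthetic input; MY_PROC is hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > parse_exceptions.awk <<'EOF'
/exception/ { needDump = 1; startLine = NR }
needDump == 1 {
  if ((NR - startLine) < 10) {
    print NR " : " $0
  } else {
    print "\n\n"
    needDump = 0
  }
}
(needDump == 1) && /At procedure/ {
  a = substr($0, index($0, "At procedure"))
  b = substr(a, index(a, "'") + 1)
  print "\tPROBLEM PROCEDURE\t---" substr(b, 1, index(b, "'") - 1) "---"
}
EOF

# Line 2 has the exception, line 3 names the procedure, 15 lines in total.
{
  echo "noise"
  echo "ERROR: exception in worker"
  echo "stuff At procedure 'MY_PROC' many many symbols"
  i=4
  while [ "$i" -le 15 ]; do echo "noise $i"; i=$((i + 1)); done
} > syslog-test.log

awk -f parse_exceptions.awk syslog-test.log > exceptions.log
cat exceptions.log

# Keep the results for inspection, then clean up.
dumped=$(grep -c ' : ' exceptions.log)   # lines 2..11, so 10
proc=$(grep 'PROBLEM PROCEDURE' exceptions.log | tr -d '\t')
cd / && rm -rf "$dir"
```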

Processing a 1.4 GiB file took 39 seconds (Core i7 @ 2.2 GHz, 8 GiB RAM), which is about 37 MiB/s. I think that is a very good result for this task :)

If you like this script and want to try AWK, you can download GNU AWK for Windows.

Hope it was useful.
