log files question

They have: 71 posts

Joined: Mar 2002

Hello.........

I'm learning to analyze log files (using Sawmill).......

There are quite a few entries that come up in the page hits analysis like this:

/mypage1.htm? (default page)
/mypage6.htm? (default page)

I also show hits on
/mypage1.htm
/mypage6.htm

without the ? and (default page) comment. Also, on my site I use various ads and have tracking URLs like this:
/mypage15.htm?source=google/

But I have not appended tracking parameters to pages such as mypage1.htm or mypage6.htm. Does anyone know where these entries with the ? appended come from? I don't know if they represent legitimate hits or if they should just be discarded.

Thanks!

mjames's picture

They have: 2,064 posts

Joined: Dec 1999

They are legitimate hits ... it doesn't matter to the browser if you visit google.com or google.com? -- although you can use that to track incoming hits such as google.com?twf. I wouldn't worry about it, Joe.
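If you want the `?` variants folded into the same page when tallying, one option is to normalize each URL before counting. Here is a minimal Python sketch, assuming the requested paths have already been pulled out of the log. The function name `normalize_path` and the sample list are made up for illustration, and this is not how Sawmill itself handles it:

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize_path(url: str) -> str:
    """Drop an empty query string so '/page.htm?' counts as '/page.htm'."""
    parts = urlsplit(url)
    if parts.query:
        # Keep real tracking parameters such as ?source=google/ intact.
        return f"{parts.path}?{parts.query}"
    return parts.path


# Hypothetical sample of requested URLs taken from a log.
requests = [
    "/mypage1.htm?",
    "/mypage1.htm",
    "/mypage6.htm?",
    "/mypage15.htm?source=google/",
]

counts = Counter(normalize_path(u) for u in requests)
for path, n in counts.items():
    print(path, n)
```

With this, `/mypage1.htm?` and `/mypage1.htm` land in the same bucket, while URLs that carry an actual tracking parameter stay separate.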

They have: 9 posts

Joined: Jan 2003

I am using Webalizer 2.1 and ModLogAn 0.5.7 to view stats. Both are very good programs. How do I get them to find out which countries my users are coming from? Both tell me "unresolved". My site is on Unix.

Krish Lalu
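As far as I know, analyzers of this kind guess the country from the top-level domain of the visitor's hostname, so a log that contains only bare IP addresses (no reverse DNS lookup done by the web server or by the analyzer) shows up as "unresolved". A rough Python sketch of that idea follows. The function name `country_code` and the sample hosts are hypothetical, and this is an illustration, not the actual code of either program:

```python
import ipaddress


def country_code(host: str) -> str:
    """Guess a country from a hostname's top-level domain.

    Returns 'unresolved' for bare IP addresses and 'unknown' for
    hostnames whose TLD is not a two-letter code (assumed convention).
    """
    try:
        ipaddress.ip_address(host)
        return "unresolved"  # no hostname available, so no TLD to inspect
    except ValueError:
        pass  # not an IP address, so treat it as a hostname
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    if len(tld) == 2 and tld.isalpha():
        return tld
    return "unknown"


# Hypothetical host fields from a log.
for host in ["192.0.2.10", "dialup7.example.de", "proxy.example.com"]:
    print(host, "->", country_code(host))
```

So the first thing to check is whether the host field in your log holds names or just IP addresses.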

They have: 9 posts

Joined: Jan 2003

Total Hits 159516
????Total Files 133190
????Total Pages 5191
????Total Visits 1279
Total KBytes 1853520
????Total Unique Sites 2579
????Total Unique URLs 5090
Total Unique Referrers 164
????Total Unique User Agents 526

Can someone help me by explaining what the terms with question marks in front of them mean?

What's the difference between a file, a page and a hit? Between unique sites and unique URLs? Between referrers and user agents?

Please help explain.

Krish Lalu

They have: 71 posts

Joined: Mar 2002

Hello Krish.............

That information should be with your log file documentation. It is, I think, fairly uniform in interpretation, but there could be definition variances between programs. Here's what my documentation says:

Hits. Hits are accepted log entries. So if there are 5000 entries in your log file, and there are no log filters, and all the entries are valid (i.e. none of them have corrupt dates), then Sawmill will report 5000 hits for the file. If there are log filters that reject certain log entries, then those will not appear as hits. Log entries that are accepted, either using "accept as hits" or using "accept as page view" will count toward the hits totals. Because there are no default filters that reject, you will generally have nearly as many reported hits as you have log entries. You can view and edit the log filters by Opening your configuration from the Administrative Menu, clicking Configuration Options, and then clicking the Log Filters tab. See also Using Log Filters.

Page views. Page views correspond to hits on pages. For instance, if you're analyzing a web log, and a hit on index.html is followed by 100 hits on 100 images, style sheets, and JavaScript files that appear in that page, then it will count as a single page view--the secondary files do not add to the total. This is implemented in the log filters--page views are defined as log entries that are accepted by a log filter "as page views." Log entries that are accepted by the filters, but are accepted "as hits" rather than "as page views" do not contribute to the page views total. Therefore, you have complete control over which files are "real" page views and which are not--if Sawmill's default filters do not capture your preferred definition of page views, you can edit them until they do. By default, page views are all hits that are not GIFs, JPEGs, PNGs, CSSs, or JSs. See Hits, above, for more information on log filters.

Visitors. Visitors correspond roughly to the total number of people who visited the site. If a single person visits the site and looks at 100 pages, that will count as 100 page views, but only one visitor. By default, Sawmill defines visitors to be "unique hosts"--a hit is assumed to come from a different visitor if it comes from a different hostname. This can be inaccurate due to the effects of web caches and proxies. Some servers can track visitors using cookies, and if your web logs contain this information, Sawmill can use it instead of hostnames--just change the log filter that copies the hostname field to the visitor id field, so it copies the cookie field instead.

Bandwidth. Bandwidth is the total number of bytes transferred. It is available only in log formats that track bytes transferred. Bandwidth is tracked for every log entry that is accepted, whether it is accepted "as a hit" or "as a page view".

Sessions. Several of Sawmill's views deal with "session" information, including the "sessions overview" and the "paths through the site" view. Sessions are similar to visitors, except that they can "time out." When a visitor visits the site, and then leaves, and comes back later, it will count as two sessions, even though it's only one visitor. To reduce the effect of caches that look like very long sessions, Sawmill also discards sessions longer than a specified time. The timeout interval is also customizable.
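To make those definitions concrete, here is a small Python sketch that counts hits, page views, visitors, bandwidth and sessions from a list of already-parsed log entries. Everything in it is an assumption for illustration: the tuple layout, the `summarize` name, the extension list and the 30-minute timeout are not taken from Sawmill, and the step that discards overly long sessions is left out.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath

# Extensions treated as secondary files (assumed, mirroring the default list above).
SECONDARY = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js"}
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed value, normally configurable


def is_page_view(path: str) -> bool:
    """A hit is a page view unless it is an image, style sheet or script."""
    bare = path.split("?", 1)[0]
    return PurePosixPath(bare).suffix.lower() not in SECONDARY


def summarize(entries):
    """entries: (host, timestamp, path, bytes_sent) tuples, sorted by time."""
    hits = page_views = bandwidth = sessions = 0
    last_seen = {}  # host -> time of that host's previous hit
    for host, ts, path, size in entries:
        hits += 1            # every accepted entry is a hit
        bandwidth += size    # bytes count for hits and page views alike
        if is_page_view(path):
            page_views += 1
        prev = last_seen.get(host)
        if prev is None or ts - prev > SESSION_TIMEOUT:
            sessions += 1    # first hit, or a return after the timeout
        last_seen[host] = ts
    return {
        "hits": hits,
        "page_views": page_views,
        "visitors": len(last_seen),  # unique hosts
        "bandwidth": bandwidth,
        "sessions": sessions,
    }


# Hypothetical entries: one host loads a page with two secondary files,
# then comes back two hours later.
t0 = datetime(2003, 1, 1, 12, 0, 0)
sample = [
    ("a.example", t0, "/index.html", 1000),
    ("a.example", t0 + timedelta(seconds=1), "/img/a.gif", 200),
    ("a.example", t0 + timedelta(seconds=2), "/style.css", 50),
    ("a.example", t0 + timedelta(hours=2), "/index.html", 1000),
]
print(summarize(sample))
```

For the sample entries this gives 4 hits, 2 page views, 1 visitor and 2 sessions, which is the difference between the numbers in a nutshell.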

Joe Hussar

They have: 9 posts

Joined: Jan 2003

Joe, thanks for the info. I did some further research and still have a few questions. I have 45 thumbnails on my home page. When the home page loads, does each of those thumbnails count as a hit? And which number would an advertiser be more interested in, page views or hits?

The Webmistress's picture

She has: 5,586 posts

Joined: Feb 2001

Yes, each thumbnail would count as a hit. Advertisers would be more interested in page views as this is the number of times the page is loaded as a whole.

Julia - if life was meant to be easy Michelangelo would have painted the floor....
