/xxx.html vs /xxx/index.html

message from Tripecac on 21 Jul 2004
You know this topic: boring, frustrating, inescapable.

Let's say you (okay, I) have a static website. So no php, no mysql, no
nuttin' 'cept html. Anyway, let's get even more specific (for the sake of a
concrete example) and say it's a music website. You're building it from
scratch.

And let's say you want a page that lists all the albums for a band, and then a
separate page for each album.

Now, consider that albums page. Where's it gonna go? I can think of two main
choices: you can throw it in your root directory like this:

/albums.html

...or you can create an "albums" directory and have the albums page be the
default page for that directory:

/albums/index.html

What are the pros and cons of these approaches? Well, that depends on the
goals of your site, right?

A big goal is to avoid having to move pages (and thus urls) around later. So
if you pick /albums.html now, you need to be pretty confident that you're not
gonna want to move to /albums/index.html. We need to try not to break any
links!

Also, we should try to make it easy to add navigation to each page. Remember,
this is a static site, so much of our "what nav goes where" thinking is gonna
be influenced by the directory structure.

If the albums page (or the "album index") is gonna have the same navigation as
the individual albums' pages, then it seeeeeeeems to make sense to put the
albums page in the same directory, so: /albums/index.html. That way, all .html
files in /albums/ can have exactly the same html for their nav bars and
headers, since their relative paths are the same (e.g., ../images/banner.gif will
work for all of them).
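The "same directory, same relative links" point can be checked mechanically. This is a minimal sketch using Python's standard `posixpath.relpath`; the `asset_link` helper name and the site-relative path convention are illustrative assumptions, with `images/banner.gif` taken from the example above.

```python
import posixpath

def asset_link(page_path: str, asset_path: str) -> str:
    """Return the relative URL a page needs to reach a shared asset.

    Both arguments are site-relative paths without a leading slash,
    e.g. "albums/index.html" and "images/banner.gif".
    """
    page_dir = posixpath.dirname(page_path) or "."  # root-level pages live in "."
    return posixpath.relpath(asset_path, start=page_dir)

# Every page directly inside /albums/ gets the same banner link,
# so one nav/header snippet can be pasted into all of them.
for page in ["albums/index.html", "albums/album1.html", "albums/album2.html"]:
    print(page, "->", asset_link(page, "images/banner.gif"))
```

All three pages come out with `../images/banner.gif`, which is exactly why putting the album index at `/albums/index.html` lets it share nav HTML with the album pages.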

Also, with mouse gestures becoming more common, it's good to have an
index.html (or some other default page) in every directory; this ensures that
if someone "hacks" your url (e.g., moves up one path level), they'll find
something there, and not a "you don't have permission to access this directory"
error or something else.
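If you adopt the "default page in every directory" rule, a small script can audit a local copy of the site for directories that would fall through to a listing or a permission error. This is a rough sketch: it assumes the default page is named `index.html` (a server may also accept names like `index.php`, so adjust `default_page`), and both helper names are hypothetical.

```python
import os
import posixpath
from typing import Iterable, List, Set

def site_files(site_root: str) -> List[str]:
    """Collect every file under site_root as a site-relative, slash-separated path."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        rel_dir = os.path.relpath(dirpath, site_root)
        for name in filenames:
            rel = name if rel_dir == "." else os.path.join(rel_dir, name)
            found.append(rel.replace(os.sep, "/"))
    return found

def dirs_without_default(files: Iterable[str], default_page: str = "index.html") -> List[str]:
    """Return directories (as URL paths) that contain no default page.

    `files` are site-relative, slash-separated paths. Directories holding
    no files at all, not even in subfolders, are never seen, so they are
    not reported.
    """
    all_dirs: Set[str] = {""}  # "" stands for the site root
    has_default: Set[str] = set()
    for f in files:
        d = posixpath.dirname(f)
        parts = d.split("/") if d else []
        for i in range(len(parts) + 1):  # register the directory and all its ancestors
            all_dirs.add("/".join(parts[:i]))
        if posixpath.basename(f) == default_page:
            has_default.add(d)
    return sorted("/" + d + "/" if d else "/" for d in all_dirs - has_default)
```

Usage would be something like `dirs_without_default(site_files("path/to/site"))`, run before each upload.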

Okay, now how about downsides of using /albums/index.html?

Well, for one thing, you'll end up with a bunch of files named "index.html".
This sometimes makes it harder to know which file you are editing (if all you
see in your editor's title bar is "index.html").

Also, there's an increased chance of accidentally uploading an index.html
to the wrong path, thus overwriting some other index.html, which'll screw up
your website until you notice your mistake.

Also, it's hard to know when *not* to use the /xxx/index.html solution. See,
if you use it in one instance (for your albums page), what's to stop you from
using it for *all* pages? Why not have every single page in your website be an
index.html in its own directory?

See, either you do /xxx/index.html for *all* pages, or you'll have to apply
judgement about when to use it. And that's the hard part. Judgement. You
know, making decisions. Which is what this post is about. Good grammar, huh?

So think about this for a sec... It's a bit of a "paradox" or a "brain
teaser" or whatever... Here goes...

Remember those individual album pages I mentioned? Well, one of the main
reasons for using /albums/index.html was to get the albums page at the same
directory level as the individual album pages, so that they could all share the
same navigation elements, right? Okay, so that *seems* like a good idea...
Until you have to apply the same thinking to the individual album pages...

You need to decide whether to give each album page its own directory. If you
decide you like the /xxx/index.html approach as a rule of thumb, then you will
feel tempted/pressured to put the individual album pages in their own
directories:

/albums/album1/index.html
/albums/album2/index.html
/albums/album3/index.html

etc.

If you do this, you might make it easy to keep all the files for each album
organized, *but* you lose the advantage of having the individual albums at the
same directory depth as the albums page. You can no longer use the same HTML
for navigation on all album-related pages, since the relative urls are
different for individual albums (e.g., ../../images/banner.gif) and the main
albums page (e.g., ../images/banner.gif).
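To see which pages could share one block of nav HTML, you can group them by how many "../" steps they need to reach the site root. A small illustrative sketch (the `group_by_depth` name is hypothetical):

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List

def group_by_depth(pages: Iterable[str]) -> Dict[int, List[str]]:
    """Group site-relative page paths by directory depth.

    Pages in the same group need the same number of "../" steps to reach
    the site root, so their relative nav links can be identical.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for page in pages:
        d = posixpath.dirname(page)
        depth = d.count("/") + 1 if d else 0  # "albums" -> 1, "albums/album1" -> 2
        groups[depth].append(page)
    return dict(groups)

pages = ["albums/index.html", "albums/album1/index.html", "albums/album2/index.html"]
for depth, members in sorted(group_by_depth(pages).items()):
    print(depth, "../" * depth, members)
```

With per-album directories the album index lands alone at depth 1 while the albums sit at depth 2, which is the split described above.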

So... you are back where you started, which is having to decide WHEN to do
/xxx.html and when to do /xxx/index.html.

Any thoughts on how to make this sort of decision? Any rules-of-thumb? Any
articles? Books? Movies? (heh heh) Anything?
 
Joe Makowiec replied to Tripecac on 21 Jul 2004
<snip /> interesting discussion for the sake of space

The W3C, at their validator http://validator.w3.org/ , pops in little
tips from time to time. One of them that caught my attention was "Cool
URIs don't change", and it went on a bit. Using their customized
Google site search engine with that as the search string results in:

http://tinyurl.com/3pdad

which translates to:

http://www.google.com/custom?hl=en&ie=UTF-8&cof=AWFID:0b9847e42caf283e%3BL:http://www.w3.org/Icons/w3c_home%3BLH:48%3BLW:72%3BBGC:white%3BT:black%3BLC:%23000099%3BVLC:%23660066%3BALC:%23ff3300%3BAH:left%3B&domains=www.w3.org&sitesearch=www.w3.org&q=cool+uris+don%27t+change&spell=1

They have several articles on the topic. (Apparently you aren't the
only one who has thought about this!) Anyway, when you come right down to it,
the magic words you're looking for are 'content negotiation', where the
web server can be convinced to throw, say, album1.html or album1.php or
album1.monkeybutt when the site viewer requests
www.mysite.invalid/album1. So if you've done it right, your links will
never change, even if you move from a static site to a php-driven
one. A bit on content negotiation straight from Apache:

http://httpd.apache.org/docs/content-negotiation.html

A fast google for [content negotiation] will result in more pages if
you're interested in pursuing it.
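A much-simplified model of the idea, in Python: a request for an extension-less path is matched to whichever variant file exists. In Apache this comes from content negotiation (for example `Options +MultiViews`), which picks among variants using the browser's Accept headers; the fixed `PREFERRED_EXTENSIONS` order below is an assumption standing in for that logic, not how Apache actually ranks files.

```python
from typing import Iterable, Optional

# Assumption: a fixed preference order. Real Apache MultiViews ranks
# variants by the request's Accept* headers instead.
PREFERRED_EXTENSIONS = (".html", ".php")

def resolve_extensionless(url_path: str, files: Iterable[str],
                          preferred: Iterable[str] = PREFERRED_EXTENSIONS) -> Optional[str]:
    """Map a request like "/album1" to an existing file such as "album1.html"."""
    base = url_path.lstrip("/")
    available = set(files)
    if base in available:  # an exact match always wins
        return base
    for ext in preferred:
        candidate = base + ext
        if candidate in available:
            return candidate
    return None  # nothing matches: the server would answer 404
```

The point for this thread: the link `/album1` stays valid whether the file behind it is `album1.html` today or `album1.php` later.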

Does anybody know if IIS does content negotiation? I'm only familiar
with Apache doing it.
 
Tripecac replied to Joe Makowiec on 21 Jul 2004
I love the idea of omitting the "index.html" or "index.php" in the url.
http://www.mysite.com/mytopic/ would be a very flexible way to "nail down" urls
for years to come.

However, there's a big downside: offline viewing.

I build my sites on my PC and then upload to my server. There's no staging
server other than my PC, which currently isn't running any server software.

When we preview our pages offline, we can't use urls like ../mytopic/ because
our PCs don't know which default file to use (nothing tells XP to use
"index.html"). If we go to that URL, we'll end up browsing the mytopic
directory, which isn't good.

Also, what if we want to put our website on a CD? Packaging all those mp3s,
lyrics, and notes on a CD within an easy-to-use interface (the web site) could
make a nice gift, and is of course a natural (and useful) way to back up the
site.

If we use the /xxx/yyy/ url approach, we rule out offline (serverless)
previewing and easy porting to CD, because links to directories won't work.

If we *must* use the /xxx/yyy/ approach, it seems like the only solution is to
install a web server on our dev box. I have Apache installed on mine, but I
don't run it all the time 'cause I'm worried that it might compromise security
on my PC. Also, I use the PC for more than just web development, and am not
sure I want a web server bogging it down all the time. If I am playing a game
or recording mp3s, do I really want Apache running in the background?

As for putting a web server on a CD, I know it's possible, but it seems like a
lot of work, and you'd have to install a server for each possible OS (XP, 98,
Linux, Mac, etc.) just to make the site work on a CD.

Are there any other ways around it? How about some javascript that
automatically adds "index.html" to all relative links which lack an explicit
filename? That javascript could be invoked in each page's body tag *if* there
is no web server running. I'm not sure how to detect whether a web server is
running via javascript. Also, what if someone has disabled javascript and then
tries to run the site from a CD?
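One way to pin down the link-fixing rule described above, sketched in Python to match the other examples here. The hypothetical `add_default_page` function appends `index.html` to relative links that end in a directory and leaves file links, external URLs, and same-page anchors alone. The same logic could be ported to JavaScript; for detection, a browser reports `location.protocol` as `"file:"` when a page is opened straight from disk or CD. Applying the rule to a copy of the site while preparing the CD, instead of in the browser, would also sidestep the disabled-JavaScript problem.

```python
from urllib.parse import urlsplit, urlunsplit

def add_default_page(href: str, default_page: str = "index.html") -> str:
    """Append default_page to relative links that point at a directory.

    External URLs (with a scheme or host) and links that already name a
    file are returned unchanged; query strings and #fragments are kept.
    Note: root-relative links like "/mytopic/" are rewritten too, but under
    file:// they still point at the filesystem root, not the site root.
    """
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return href  # external link: a real server will pick the default page
    path = parts.path
    if path in (".", ".."):
        path += "/"  # "." and ".." are directories even without a slash
    if not path.endswith("/"):
        return href  # already names a file, or is a bare "#anchor"/"?query"
    return urlunsplit(parts._replace(path=path + default_page))
```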
 
darrel replied to Tripecac on 21 Jul 2004
In an ideal world, you'd set up your folders in the way that makes the
most sense for the person maintaining the site, and then you'd set up the
URLs in the way that makes the most sense for the person visiting the
site. This is typically done through URL rewriting, using mod_rewrite rules in
Apache or ISAPI filters in IIS.
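To make the "folders for the maintainer, URLs for the visitor" split concrete, here is a tiny Python model of rewrite rules: each public URL pattern maps to an internal file. In Apache the real thing would be `RewriteRule` directives in mod_rewrite; the patterns and the `content/` folder below are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules: public URL pattern -> internal file template.
REWRITE_RULES: List[Tuple[str, str]] = [
    (r"^/albums/?$", "content/albums/list.html"),
    (r"^/albums/([a-z0-9-]+)/?$", r"content/albums/\1.html"),
]

def rewrite(url_path: str, rules: List[Tuple[str, str]] = REWRITE_RULES) -> Optional[str]:
    """Return the internal file for a public URL, using the first matching rule."""
    for pattern, target in rules:
        m = re.match(pattern, url_path)
        if m:
            return m.expand(target)  # fills in \1 etc. from the match
    return None
```

Visitors keep seeing `/albums/album1/` even if the files are later reshuffled on disk; only the rule table changes.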

Otherwise, I wouldn't waste too much sleep on the issue. ;o)

-Darrel
 
darrel replied to darrel on 21 Jul 2004
er...LOSE too much sleep. You can't waste sleep. ;o)
 
DaveBlues replied to darrel on 21 Jul 2004
Tripecac

I think the most important factor for your consideration on this is
"scalability".

In 2002 I took over a website for a large NYC real estate firm.
It was a frames-based site, and it had about 25 folders and one root index.html
file.
In each folder were the ubiquitous index.html, nav.html, and body.html.

As I started to re-design, revise and convert from static to dynamic, I
dropped the frames and folders structure.
I put the 25 or so former "index.html" pages into the root and named them for
what they were (say album1.html, album2.html, etc.). It was nice, clean and
easy.

Overall we had maybe 100 html pages.

Then, the following month, management gave us two more projects to add to
our site:
1. show all of Manhattan's neighborhoods including each of their related
schools and public transportation,
2. create an individual page translated into each of the diverse languages
spoken by our more than 200 sales agents.

300 pages later, our root folder was starting to bloat, and we were trying to
remember if "uwstr1.html" linked via the map on "uws.html" and/or "uestr1.html".

We now have over 1,700 html & php pages and 17,000 total files. Our website
weighs in at about 10 GB to date, with photos, floorplans, and photo galleries
uploading daily at a rate of about 15 MB.

I've spent the past three weeks just trying to DECIDE how to reorganize the
site to make working on it less of a game of "Where's Waldo?" for my assistant and me.

I wish I had never deleted those individual folders! And the task of
reorganizing is nothing compared to the huge strings of browser "redirect"
code I will have to write into my Apache configuration file so bookmarked pages
will still work.
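If the reorganization does happen, the redirect lines can be generated from a simple old-to-new table instead of typed by hand. This sketch emits Apache mod_alias `Redirect permanent` directives (HTTP 301); only the old file names come from the post above, while the new paths and the `http://www.example.com` base are made up for illustration.

```python
from typing import Dict, List

def redirect_lines(moves: Dict[str, str], base_url: str) -> List[str]:
    """Build Apache mod_alias "Redirect permanent" lines for moved pages.

    `moves` maps old URL-paths to new URL-paths; `base_url` is the site's
    scheme and host, prepended so each target is an absolute URL.
    """
    base = base_url.rstrip("/")
    return [f"Redirect permanent {old} {base}{new}" for old, new in sorted(moves.items())]

# Hypothetical reorganization: flat files moved into per-neighborhood folders.
moves = {
    "/uws.html": "/neighborhoods/uws/",
    "/uwstr1.html": "/neighborhoods/uws/transit.html",
}
print("\n".join(redirect_lines(moves, "http://www.example.com")))
```

Keeping the table in one file also documents every move, which helps when the next reorganization comes around.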

Don't think that just because you have a few dirty socks in your hamper, you can keep
it in the living room.
Things start to pile up pretty fast, and it is nasty having to dive into a
dirty pile to find the red socks all the way on the bottom.

I haven't fully conveyed the horror I'm facing because of moving
out of folders, but I hope this helps.

To see my horror in action, go to www.manhattanapts.com, click on
"Neighborhoods", and look at the page locations in the explorer bar while
clicking links!

Good Luck,
Dave

P.S. I'm just starting three music websites myself, and I find it easier to
keep the m3u text file in the same folder as the mp3, along with the html page for
that song and even an images subfolder for graphics specific to that song.
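A sketch of the per-song folder idea: because the playlist sits next to the mp3, it can list the file by bare name. Most players resolve relative entries in an M3U file against the playlist's own location, so the folder can be moved or burned to CD without editing the playlist; for a playlist served over the web, players often need full http:// URLs instead. The `m3u_for_folder` helper is hypothetical.

```python
from typing import Iterable

def m3u_for_folder(mp3_names: Iterable[str]) -> str:
    """Return a basic M3U playlist: one mp3 filename per line.

    Bare filenames are relative to wherever the .m3u file itself lives,
    so the playlist stays valid as long as it travels with its mp3s.
    """
    return "".join(name + "\n" for name in sorted(mp3_names))
```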
 
darrel replied to DaveBlues on 21 Jul 2004
If you are using PHP and a CMS (I assume), why do you have so many pages?

For instance, the 'agents secrets' pages...they all could just be one page,
but pass a different variable to the page.

I.e.:
secrets.php?page=overview
secrets.php?page=renting
secrets.php?page=buying
secrets.php?page=selling

One page/template, but pass different content depending on the variable.

Same goes for most of the other pages.
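Here is the one-template idea sketched in Python rather than PHP, to keep one language across these examples. The `SECRETS` table and the HTML are hypothetical stand-ins for whatever the CMS stores. The whitelist lookup is the important design choice: the raw `page` value is never used as a file path.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical content store; in a PHP site this might be files or database rows.
SECRETS = {
    "overview": "Agents' secrets: an overview",
    "renting": "Tips for renting",
    "buying": "Tips for buying",
    "selling": "Tips for selling",
}

def render_secrets(url: str) -> str:
    """Render one shared template, filling in content chosen by ?page=."""
    params = parse_qs(urlsplit(url).query)
    page = params.get("page", ["overview"])[0]  # default when ?page= is missing
    body = SECRETS.get(page)  # whitelist lookup: unknown values are rejected
    if body is None:
        return "<h1>Not found</h1>"
    return f"<html><body><h1>{body}</h1></body></html>"
```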

-Darrel
 
Murray *TMM* replied to darrel on 21 Jul 2004
For want of a nail, a kingdom was lost.

Or maybe it's David and Goliath.

Or - say, is it lunch time yet? 8)
 
Joe Makowiec replied to darrel on 21 Jul 2004
Tell 'em that *nobody* on dialup is ever gonna see those pages -
http://www.manhattanapts.com/tips-buying.php runs ~290K, or a 70-second
download at 56K.

http://www.websiteoptimization.com/services/analyze/wso.php?url=http://www.manhattanapts.com/tips-buying.php
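The 70-second figure implies an effective throughput well below the nominal 56 kbps: at the full rated speed, 290K would take about 42 seconds, and 70 seconds corresponds to roughly 34 kbps. Dial-up modems in practice usually delivered noticeably less than their rated speed, so the 34 kbps value below is an assumption chosen to match the estimate, and "K" is read as 1024 bytes.

```python
def download_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Transfer time ignoring latency: size in bits divided by throughput."""
    return size_bytes * 8 / bits_per_second

page = 290 * 1024  # ~290K page
print(round(download_seconds(page, 56_000), 1))  # nominal 56 kbps
print(round(download_seconds(page, 34_000), 1))  # assumed real-world dial-up rate
```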
 

Archived message: /xxx.html vs /xxx/index.html (Macromedia Dreamweaver)