job.answiz.com
  • 4
Votes
name

What's a correct format of a geographical address/location which is a good fit for any address on the Earth? At the moment I have:

  • country
  • city
  • street
  • number
  • text data (for simplicity)
  • zip
  • lat/lng

But I believe I can improve it: there might be a state/region of a country or something like area. Or no area/region/state, say, in Singapore or Hong Kong.

There might be no street, but road or boulevard or something else. A number of a building might be compound. There might be a floor. A room number. Etc....

As already stated, the most universal (but impractical to validate and perhaps least useful) is a single big unicode field.

You could separate country from the rest of the address and store it as the ISO country code. It would normalize the country and offer some utility in validating the remainder of the address.

You could also separate postal code aka zip code from the rest of the address. This would also have some utility in validating the remainder of the address, and could be helpful (though imprecise) in geolocation. For example: in Canada you can uniquely identify any address specifying only postal code and street number (aka house number); this may not be true in all countries.

Dedicating fields to states/provinces or cities starts to get more problematic because of the variations in the way each country formulates an address. I've set up address tables having such fields because the initial audience is focused on North America, knowing that an international audiences would pose a problem fitting in. In most cases, they can be "shoe-horned" in, but it is an awkward and potentially failure-prone compromise - definitely not universal.

  • 1
Reply Report

The universal way to store a geographical address/location in a database is this one:

[Address] nvarchar(max) not null

This requires the least amount of programming code (and so cuts maintenance costs) and is fully compatible with any address. It has, however, three big issues:

  • The lack of data validation means that the field can be used for the purposes other than storing the address. One of the purposes is a DOS attack intended to fill the space of your database by entering 2 GB of data in the address field.

  • The data stored this way makes it impossible to process it for business intelligence and data mining purposes. For instance, how many users are from India? There is no easy way to tell, since those addresses won't be normalized.

  • The users may mistakenly enter an incomplete or plainly wrong address.

In order to mitigate the first issue, limit the field to what you think to be a reasonable limit. Personally, I would start with 1000 characters, and then reduce it based on the length of the addresses entered by the first users once you get a data set large enough.

In order to mitigate the other two issues, you can use a third-party API which parses addresses and presents you with the data containing the country, city, postal code, etc. If possible, the API should be able to display the address on a map back to the user to reduce the risk for the user of entering an incomplete or wrong address: most users know where they live, and seeing a different position on a map would immediately give them a clue that they should check their input.

Note that whatever API you use, it won't be perfect. It will find most addresses, but not all of them. This means that if the API tells that the address doesn't exist, but the user insists that it does, you should a prioritrust the user, even if he might be wrong.

This also means that you still should store the original user's input, side-by-side with the result of the API. This means that the schema becomes:

[RawAddress] nvarchar(max) not null
[ParsedAddress] xml null
  • 0
Reply Report

Google has developed a library that helps validate postal addresses for every country in the world, which you can use to design a schema to store this data.

Look for the most common required fields across addresses from your targeted customer base to get started, and as you identify further countries with different requirements you can continue to adjust your schema.

  • 3
Reply Report