Admit it, you've done it many times before: you create a database schema, arbitrarily guessing at upper limits to the lengths of various columns, only to later have to perform annoying schema upgrades as you change those columns to fit real-world data.
If you're using PostgreSQL, what you're doing is pointless. There is quite literally no performance benefit to be had, and possibly a performance penalty as the database needs to check the length constraint. This fact can even be found right here, in the official documentation.
If you're using MySQL with InnoDB, it's practically the same situation. The data is laid out on disk in exactly the same way for both TEXT and VARCHAR fields, as explained here. I couldn't find any resources about MyISAM other than that TEXT is stored externally, but I just fired up a test table and did some rudimentary benchmarking and the numbers were well within the margin of error.
If you're using SQLite, everything's a TEXT whether you want it to be or not ( with the notable exception of INTEGER PRIMARY KEY) so it doesn't matter what you try to specify, it will be a TEXT.
I'm as guilty as anyone else on this--I use varchar all the time! But come on, let's stop imposing these arbitrary character limits on our columns when the only reason we're doing it is for historical reasons. Is anyone with me?
All Content


By Eric Holscher at 12:25 p.m. on Nov. 11, 2008
No
By Alex at 12:35 p.m. on Nov. 11, 2008
On MySQL at least, I'm 90% sure you can't index on a text field.
By Eric Florenzano at 12:41 p.m. on Nov. 11, 2008
You are not correct
http://hackmysql.com/case2
By Marcos at 1:07 p.m. on Nov. 11, 2008
Being a django person I must say I disagree with you. I use the django admin app a lot. Why would you want, for example, a text field setup by default with a wysiwyg editor just to enter/edit a first name? This kind of examples makes the admin interface a bit clumsy. Don't you agree?
Rgds,
Marcos
By Eric Florenzano at 1:12 p.m. on Nov. 11, 2008
I see what you're saying, but we're really talking about two things. I'm talking about the backend storage definition, and the admin interface is really more display logic. We should choose our database storage based on what fits the problem best, and then make sure that Django's admin interface will display it in a reasonable way.
By Marcos at 1:28 p.m. on Nov. 11, 2008
I agree.
By Dave K at 1:48 p.m. on Nov. 11, 2008
If you're using MS SQL maybe you should just put a bullet in the server and start over.
By huxley at 7:11 p.m. on Nov. 13, 2008
Sorry, Dave, are you arguing in favour or against?
By mike bayer at 2:09 p.m. on Nov. 11, 2008
I can't speak for MySQL and Postgres as authoritatively, but TEXT columns in Oracle and MSSQL have any number of these limitations:
1. can't compare them
2. can't use LIKE on them
3. can't fetch or insert large ones without using a separate cursor
4. can't ORDER BY them
5. client libraries often have limitations being able to stream them efficiently/easily
Even if MySQL/PG is in fact able to index them, you'll still get poorer performance by indexing a column with potentially huge strings in it versus one which is guaranteed to have strings no larger than a fixed size in them. It's also very reasonable to use size-constrained columns in any case for any type of data that is in fact meant to be a short string, like email addresses. Constraints are basically something you should be using as often as possible in SQL.
By Eric Florenzano at 3 p.m. on Nov. 11, 2008
Those are some good reasons for using varchar in Oracle and MSSQL. (I really don't have much experience with them, so I can't say much more there).
I do still think that in PostgreSQL (and probably MySQL as well) there's not much you gain by using VARCHAR.
I agree that constraints are a good thing in general, and for something like a country code or a zip code or something which has a guaranteed fixed length, it makes sense. But what length do you choose for a first name? For an occupation? These limits end up being arbitrary and usually end up being either too long or too short. If PostgreSQL and MySQL give no performance reason to invent that arbitrary limit, why do so?
By barbara at 2:56 p.m. on Nov. 11, 2008
I think you're just trying to stir up controversy. :) Performance shouldn't be the only benchmark (I mean benchmark in the philosophical sense). Ease of use in building/maintaining an application never seems to count for as much, but if you're writing core business applications, then that should also be a practical consideration. My point is that I can imagine a lot of cases where the data would become sloppy and unmanageable in a hurry.
By Eric Florenzano at 3:09 p.m. on Nov. 11, 2008
I'm not trying to stir up controversy. You mention ease of use in building/maintaining an application:
It's easier to build an app if you don't have to invent arbitrary limits on fields.
It's easier to maintain an app if you're not having to alter your table when you realize that those limits are not matching real-world data.
By Dave K at 3:12 p.m. on Nov. 11, 2008
Eric, isn't there a way to specify the use of an <input>rather than <textarea> for certain text fields in the admin now? Or am I dreaming…
By DZ at 4:19 p.m. on Nov. 11, 2008
re: MySQL, see:
http://forums.mysql.com/read.php?24,105964,105984#msg-105984
By Scott Klarr at 4:42 p.m. on Nov. 11, 2008
There is in fact a performance difference in mysql when you are dealing with indexed fields. I have had to optimize the DB structures for sites with hundreds of thousands of registered users with millions of daily hits - one of the biggest performance tweaks was using the correct data type and proper indexing.
If you are recording an email address for example, there is absolutely no reason why you need to use text when the maximum characters for a valid email address is 320
By jos3ph at 4:52 p.m. on Nov. 11, 2008
Wow, what a coincidence!
I just ran into what seems to be an arbitrary 50 character limit on the site name field in Django's Site contrib app.
Why only 50 characters? :(
If it where TEXT, I would not have this problem to worry about.
TEXT, FTW!
By Timothy Fitz at 5:15 p.m. on Nov. 11, 2008
There is a dramatic difference between VARCHAR and TEXT in MySQL (both INNODB and MyISAM). On disk VARCHAR(16) looks like this
(other row data)0x00000000000000000000000000000000(other row data).
On disk, TEXT looks like this
(other row data)0x0383C4BF(other row data)
Where 0x0383C4BF is a made up pointer to a text heap stored alongside the table.
The implications are:
TEXT costs are O(len(text)), while VARCHAR is O(1) (your predefined size)
Reading a VARCHAR is fast, on InnoDB it's interleaved with the primary keys index, and is probably in cache.
Reading a TEXT can be slow, it's a random read into the TEXT heap, which could suck.
Reading a TEXT can be fast, if the data is rarely there but often accessed then the TEXT heap is small, and probably in cache.
On small datasets (less than your memory) or where throughput isn't an issue (small website), the perf differences come out a wash. Either way, do whichever is easier first and worry about perf later!
By Daniel Dornhardt at 5:59 p.m. on Nov. 11, 2008
@Timothy Fitz:
You wrote that a VARCHAR(16) looks like this on the disk / in a row:
0x00000000000000000000000000000000
a) Strings are probably not padded by nulls, if anything they'd be padded by spaces
b) you're describing the CHAR type, which has fixed length
c) VARCHARs in MySQL are variable length strings, so you don't waste much space most of the time... for VARCHARs < 255 in length, it's one byte for the length and then the size of the actual string after that.
By Daniel Dornhardt at 5:59 p.m. on Nov. 11, 2008
@ Everybody:
This is my conclusion: It doesn't matter if you pick a VARCHAR(4) or a VARCHAR(255), as the resulting used space will be the same. So I always use VARCHAR(255) wherever I can choose an arbritrary size > 4. Maybe not if it's clear that there can only be a specific length, as not to confuse my coworkers, but if its a name or an email-address or something similar, I go with VARCHAR(255).
For very short string columns (< 3 in length, I think) it's more efficient to chose a CHAR, because you won't need the length byte of the VARCHAR.
I still won't pick TEXT - Fields for anything I'll have to sort, group, order by or may be forced to use functions on someday, even though maybe I need to do some reading on indexes for text fields. But I generally feel more flexible if I can run functions on the fields inside of the table directly than if the database has to go and search for the data by the pointer in the TEXT column.
By Daniel Dornhardt at 6:01 p.m. on Nov. 11, 2008
Quote from the MySQL Manual ( http://dev.mysql.com/doc/refman/5.0/en/char.html ):
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. ( ... )
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
By Adam Żochowski at 7:40 p.m. on Nov. 11, 2008
All dbs mentioned, MySQL, Postgresql and MSSQL, have a limit of 8kb rowsize. If you sum up how much data your row can handle, without indirection pointers (blob/text), then the limit is 8kb.
Try :
create table test ( a varchar(8000) , b varchar(8000) ) ;
You will get a warning when creating this table, and an error when len(a)+len(b)>8000.
Even more tricky with joins:
create table test_a ( id_a int, a varchar(8000) ) ;
create table test_b ( id_b int , b varchar(8000) ) ;
All ok. Fill up tables, now do:
select * from test_a inner join test_b on id_a = id_b;
Error because:
len(test_a.a)+len(test_b.b)>8000
Use text if you want to avoid rowsize issues.
And on another note, I see people comment on indexing of varchars vs text. Varchar has stupid indexes, Text columns have full text indexing.
Similarily, ordering on text is nicer if you sort by full text search weight.
ps: Ironicly, this comment can only be 1000chars
By Anonymous at 9:31 p.m. on Nov. 11, 2008
You really need to take a database class if you truly believe that VARCHAR and TEXT are equal. Or maybe even an algorithms class to learn why knowing the size can change how you search/order/sort things.
Just because one or two implementations show some similarities in a few test cases is hardly conclusive.
By me at 9:48 p.m. on Nov. 11, 2008
why does one declaring a size make the actual size that you'd optimize for?
Why not just let the datbase figure out lengths from the actual data and use that for optimizations?
By Matt at 10:37 p.m. on Nov. 11, 2008
I think the reason for this was that older versions of mysql, etc. assumed a single-byte character set. So a varchar(32) needed 32 bytes. But now, for example, a UTF-8 encoded varchar(32) could need 128 bytes if all of the chars are 4-byte chars. So varchars actually end up using the same processing as TEXT.
By alan at 12:26 a.m. on Nov. 12, 2008
A couple things: 1) use varchar for name type attributes. 2) use text for possibly long, unknown length input; you should protect against an unreasonably long length text insertion either with an insert check constraint, or at your application level.
By default, in Postgres, varchar is unlimited length, max row size is 1.6 TB, and max field size is 1 GB. With that said, refrain from using varchar for non label type fields. There are internal implicit type casts that take place in the database, and you should use the proper field types instead of attempting catch-alls. You might want to index a varchar, but you'll never want to index a text field. You should use a text search on a text field, like TSearch2.
This blog post is horrible. Please consider adding an "edit" to state you were completely misinformed.
By Al at 7:51 p.m. on Nov. 12, 2008
Database aside for a moment, it'll also mean that your application code (Python, Ruby, C#, Java, ..) will need to use a String style type everywhere when a simple char would have been sufficient. If you're working with larger sets of information, you may be in for a surprise about the change in memory consumption.
By Tommy Cutter at 12:41 p.m. on Nov. 13, 2008
One benefit to using varchar over text in MySQL is that if you set your varchar field to be binary, it's values are case-sensitive, which is helpful for password checking. Also, if your passwords are encrypted and are always the same length, a character limit is a good idea to avoid program errors.
By Jordan at 5:11 p.m. on April 21, 2009
First, I agree that if you are dealing with a known limited length, varchars are best. I generally find the longest possible string and double it just to be safe unless I am certain it has a maximum length.
However:
Mike: In MSSQL, you can actually use the LIKE operator in TEXT fields. So this is not a huge disadvantage. Furthermore, although there is a slight performance hit you can order on a text string by using a CAST (You could make a safe bet that the first 200 characters are sufficient for sorting).
AI: Python does not, as far as I know, have a character type. In Ruby, everything is already an object so I have a hard time believing that you'd experience a performance increase simply by using a varchar. Finally, as some have noted above, in many databases varchar is really just a truncated implementation of the TEXT datatype, simply to easy implementation of things like Unicode.
Personally, I think that the main reason to use varchar over text is to rein in your data. If you use TEXT for every field, there is a possibility that an impossibly long string will get into your database and that might mess up things like output and searching. I have a feeling that performances improvements for most off-the-shelf database implementations are going to be minor and in any case there are other areas where the improvement would be much greater.
By ben 10 oyunları at 3:10 a.m. on May 25, 2009
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
By WoW Power Leveling at 4:39 a.m. on June 13, 2009
Thx for this great news for us, it is really cool.
By Nursing degree at 3:40 a.m. on June 22, 2009
The length prefix indicates the number of bytes in the value.
By Online learning degrees at 3:40 a.m. on June 22, 2009
Similarily, ordering on text is nicer if you sort by full text search weight.
By Special education online at 3:41 a.m. on June 22, 2009
I need to do some reading on indexes for text fields.
By Political science degree at 3:42 a.m. on June 22, 2009
I couldn't find any resources about MyISAM other than that TEXT is stored externally.
By Online psychology degrees at 3:43 a.m. on June 22, 2009
Or maybe even an algorithms class to learn why knowing the size can change how you search/order/sort things.
By lingerie at 8:13 p.m. on June 22, 2009
This was my first PyCon. I also attended 3 tutorials on Thursday. I was a bit disappointed in the tutorials since you really expect to get much out of a 3 hour tutorial
By jordan shoes at 1:21 a.m. on June 25, 2009
I need to do some reading on indexes for text fields.
By jordan shoes at 1:22 a.m. on June 25, 2009
I couldn't find any resources about MyISAM other than that TEXT is stored externally.
By ugg boots at 1:23 a.m. on June 25, 2009
Or maybe even an algorithms class to learn why knowing the size can change how you search/order/sort things.
By nike shoes at 1:24 a.m. on June 25, 2009
Similarily, ordering on text is nicer if you sort by full text search weight.
By tiffany jewellery at 1:24 a.m. on June 25, 2009
The length prefix indicates the number of bytes in the value.