Saturday, November 14, 2015

Suggesting a corrected column name/spelling in the event of a column misspelling


One small PostgreSQL 9.5 feature I worked on is the new hinting mechanism feature, which sometimes hints, based on a score, what you might have meant to type following misspelling a column name in an SQL query. The score heavily weighs Levenshtein distance. A HINT message is sent to the client, which psql and other client tools will display by default.

It's common to not quite recall offhand if a column name is pluralized, or where underscores are used to break up words that make up the name of a column. This feature is targeted at that problem, providing guidance that allows the user to quickly adjust their query without mental context switching. For example:

postgres=# select * from orders where order_id = 5;
ERROR:  42703: column "order_id" does not exist
LINE 1: select * from orders where order_id = 5;
                                   ^
HINT:  Perhaps you meant to reference the column "orders"."orderid".



You may also see a hint in the case of two possible matches, provided both matches have the same score, and the score crosses a certain threshold of assumed usefulness:

postgres=# select * from orders o join orderlines ol on o.orderid = ol.orderid where order_id = 5;
ERROR:  42703: column "order_id" does not exist
LINE 1: ...oin orderlines ol on o.orderid = ol.orderid where order_id =...
                                                             ^
HINT:  Perhaps you meant to reference the column "o"."orderid" or the column "ol"."orderid".

If an alias was used here (which this query must have anyway), the hint becomes more specific:

postgres=# select * from orders o join orderlines ol on o.orderid = ol.orderid where o.order_id = 5;
ERROR:  42703: column o.order_id does not exist
LINE 1: ...oin orderlines ol on o.orderid = ol.orderid where o.order_id...
                                                             ^
HINT:  Perhaps you meant to reference the column "o"."orderid".

This feature should make writing queries interactively in psql a bit more pleasant. Mental context switching to figure these incidental details out has a tendency to slow things down.

2 comments:

  1. This sounds great. Also a feature that competing databases do not have!Does it use fuzzystrmatch? i.e. will fuzzystrmatch still have to be installed as an extension to get Levenshtein distance?

    ReplyDelete
  2. The Levenshtein distance is just an implementation detail -- it won't actually be displayed here, of course. You don't have to have fuzzystrmatch installed to see these hints.

    Part of the work behind this patch was moving the guts of the fuzzystrmatch Levenshtein distance implementation to the core code. There still isn't a user-visible interface to that, though, except for the same old SQL-callable functions within the fuzzystrmatch contrib extension.

    ReplyDelete