Just Another Law-Abiding Citizen: Beware of Nulls

Tuesday, April 26, 2011

Beware of Nulls

I ran across what I think was an interesting issue recently.
We've been trying to implement a text search component using Oracle 11.2.0.2, featuring an 11g enhancement called SDATA (structured data) sections. It helps solve one of Oracle Text's biggest issues -- that of "mixed" queries (a query that contains a basic text search with structured data qualifiers -like date limits- that are supposed to make the query faster).
In past versions, this type of query would never perform as fast as either a straight text query or a structured-only query; the optimizer would pick one way or the other to generate the list of matching rowids for one part of the query, then run the other part to generate the "final" list of matching rows. So the query takes longer, when the user intuitively feels that the additional information provided should help make the query faster.
In this case, our testing was giving us some unexpected results when we were filtering SDATA negation-- chiefly, the resultset count of matches was lower than it should have been. So, for example, a query for something like 'Ben Franklin and SDATA(language != "FRENCH")' did not return the right number of rows based on the tester's description of the test data.
Investigation revealed that the core issue was that the language field was null in some documents (even though the XML document described the element as mandatory).
Here's an example:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Oracle Label Security and OLAP options

SQL> create table test_tab
  2  (text        varchar2 (30),
  3  language  varchar2 (10));

Table created.

SQL> insert all
  2    into test_tab values ('Bon Homme Richard', 'ENGLISH')
  3    into test_tab values ('Bon Homme Richard', 'FRENCH')
  4    into test_tab values ('Bon Homme Richard', 'CANADIAN')
  5    into test_tab values ('Bon Homme Richard', '')
  6    select * from dual;

4 rows created.

SQL> insert into test_tab select object_name, null from all_objects;

54562 rows created.

SQL> create index test_idx on test_tab (text)
  2    indextype is ctxsys.context filter by language;

Index created.

SQL> select score (1), text, language from test_tab
  2    where contains(text,'(Bon Homme Richard)', 1) > 0;

  SCORE(1) TEXT                           LANGUAGE
---------- ------------------------------ ----------
        15 Bon Homme Richard              ENGLISH
        15 Bon Homme Richard              FRENCH
        15 Bon Homme Richard              CANADIAN
        15 Bon Homme Richard


SQL> select score (1), text, language from test_tab
  2    where  contains(text,'(Bon Homme Richard)
  3    AND sdata (language != "FRENCH")', 1) > 0;

  SCORE(1) TEXT                           LANGUAGE
---------- ------------------------------ ----------
        15 Bon Homme Richard              ENGLISH
        15 Bon Homme Richard              CANADIAN

In this case, the fix was to correct the source data to set the language field properly. In a case where null values are permitted, the searcher [or developer] needs to be aware of these comparison rules:

select score (1), text, language from test_tab
where  contains(text,'(Bon Homme Richard)
AND (sdata (language != "FRENCH") OR sdata(language is null))', 1) > 0;

  SCORE(1) TEXT                           LANGUAGE
---------- ------------------------------ ----------
        15 Bon Homme Richard              ENGLISH
        15 Bon Homme Richard              CANADIAN
        15 Bon Homme Richard

No comments:

Post a Comment

Pondering...

I hope I shall possess firmness and virtue enough to maintain what I consider the most enviable of all titles, the character of an honest man. - George Washington

Differences in political opinions are as unavoidable as, to a certain point, they may perhaps be necessary; but it is exceedingly to be regretted that subjects cannot be discussed with temper on the one hand, or decisions submitted to without having the motives, which led to them, improperly implicated on the other; and this regret borders on chagrin when we find that men of abilities, zealous patriots, having the same general objects in view, and the same upright intentions to prosecute them, will not exercise more charity in deciding on the opinions and actions of one another. - George Washington

A sound mind in a sound body is a short but full description of a happy state in this world. - John Locke

However [political parties] may now and then answer popular ends, they are likely in the course of time and things, to become potent engines, by which cunning, ambitious, and unprincipled men will be enabled to subvert the power of the people and to usurp for themselves the reins of government, destroying afterwards the very engines which have lifted them to unjust dominion. - George Washington, Farewell Address

A state that dwarfs its men in order that they may be more docile instruments in its hands even for beneficial purposes, will find that with small men no great things can really be accomplished. -- John Stuart Mill

Fine art and pizza delivery, what we do falls neatly in between! - David Letterman

Wait a tic ... blimey, this redistribution of wealth is trickier than I thought.
- Dennis Moore

Rabbit: Luckiest of all signs, you are also talented and articulate. Affectionate, yet shy, you seek peace throughout your life. - Chinese Zodiac

In other words, to disagree well you must first understand well. You have to read deeply, listen carefully, watch closely. You need to grant your adversary moral respect; give him the intellectual benefit of doubt; have sympathy for his motives and participate empathically with his line of reasoning. And you need to allow for the possibility that you might yet be persuaded of what he has to say. - Bret Stephens

Just Another Law-Abiding Citizen

Tuesday, April 26, 2011

Beware of Nulls

No comments:

About Me

Search This Blog

Pondering...

Blog Archive