Tuesday, April 26, 2011

Beware of Nulls

I ran across what I think was an interesting issue recently.
We've been trying to implement a text search component using Oracle 11.2.0.2, featuring an 11g enhancement called SDATA (structured data) sections. It helps solve one of Oracle Text's biggest issues -- that of "mixed" queries (a query that contains a basic text search with structured data qualifiers -like date limits- that are supposed to make the query faster).
In past versions, this type of query would never perform as fast as either a straight text query or a structured-only query; the optimizer would pick one way or the other to generate the list of matching rowids for one part of the query, then run the other part to generate the "final" list of matching rows. So the query takes longer, when the user intuitively feels that the additional information provided should help make the query faster.
In this case, our testing was giving us some unexpected results when we were filtering SDATA negation-- chiefly, the resultset count of matches was lower than it should have been. So, for example, a query for something like 'Ben Franklin and SDATA(language != "FRENCH")' did not return the right number of rows based on the tester's description of the test data.
Investigation revealed that the core issue was that the language field was null in some documents (even though the XML document described the element as mandatory).
Here's an example:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Oracle Label Security and OLAP options

SQL> create table test_tab
  2  (text        varchar2 (30),
  3  language  varchar2 (10));

Table created.

SQL> insert all
  2    into test_tab values ('Bon Homme Richard', 'ENGLISH')
  3    into test_tab values ('Bon Homme Richard', 'FRENCH')
  4    into test_tab values ('Bon Homme Richard', 'CANADIAN')
  5    into test_tab values ('Bon Homme Richard', '')
  6    select * from dual;

4 rows created.

SQL> insert into test_tab select object_name, null from all_objects;

54562 rows created.

SQL> create index test_idx on test_tab (text)
  2    indextype is ctxsys.context filter by language;

Index created.

SQL> select score (1), text, language from test_tab
  2    where contains(text,'(Bon Homme Richard)', 1) > 0;

  SCORE(1) TEXT                           LANGUAGE
---------- ------------------------------ ----------
        15 Bon Homme Richard              ENGLISH
        15 Bon Homme Richard              FRENCH
        15 Bon Homme Richard              CANADIAN
        15 Bon Homme Richard


SQL> select score (1), text, language from test_tab
  2    where  contains(text,'(Bon Homme Richard)
  3    AND sdata (language != "FRENCH")', 1) > 0;

  SCORE(1) TEXT                           LANGUAGE
---------- ------------------------------ ----------
        15 Bon Homme Richard              ENGLISH
        15 Bon Homme Richard              CANADIAN
In this case, the fix was to correct the source data to set the language field properly. In a case where null values are permitted, the searcher [or developer] needs to be aware of these comparison rules:
select score (1), text, language from test_tab
where  contains(text,'(Bon Homme Richard)
AND (sdata (language != "FRENCH") OR sdata(language is null))', 1) > 0;

  SCORE(1) TEXT                           LANGUAGE
---------- ------------------------------ ----------
        15 Bon Homme Richard              ENGLISH
        15 Bon Homme Richard              CANADIAN
        15 Bon Homme Richard              

Fine-Grained Auditing

Someone recently asked about Oracle fine-grained auditing (FGA) and how to apply it to a situation in which "everyone except Joe in HR gets audited".
Here is the sample solution I provided (where the table owner FGATEST is the user that won't get audited):
conn / as sysdba
create user fgatest identified by fgatest;
grant connect, resource, dba to fgatest;

connect fgatest/fgatest
create table foobar (fooid number, bar varchar2(30));
grant select, insert, update, delete on foobar to PUBLIC;


BEGIN
  dbms_fga.add_policy(
  object_schema => 'FGATEST',
  object_name => 'FOOBAR',
  policy_name => 'FOOBAR_AUD',
  audit_condition => 'sys_context(''userenv'',''session_user'') != ''FGATEST''' ,
  statement_types => 'SELECT,INSERT,UPDATE,DELETE');
end;
/

insert into foobar select 1, sys_context('userenv','current_user') from dual;
insert into foobar select 2, sys_context('userenv','session_user') from dual;
select * from foobar;

select username from dba_users where account_status='OPEN';
conn tst/tst
insert into foobar select 3, sys_context('userenv','current_user') from dual;
insert into foobar select 4, sys_context('userenv','current_user') from dual;
The script creates the FGATEST user, table, and FGA policy, then adds rows as both FGATEST and TST. Our expectation is that the inserts and selects performed by FGATEST won't be audited, but the ones performed by TST will. Let's check it out:
conn / as sysdba
select db_user, os_user, object_schema, object_name, sql_text from dba_fga_audit_trail;

DB_USER              OS_USER              OBJECT_SCHEMA        OBJECT_NAME          SQL_TEXT
-------------------- -------------------- -------------------- -------------------- ----------------------------------------
TST                  BARCLAY\kengel       FGATEST              FOOBAR               insert into foobar select 3,
                                                                                    sys_context('userenv','current_user')
                                                                                    from dual

TST                  BARCLAY\kengel       FGATEST              FOOBAR               insert into foobar select 4,
                                                                                    sys_context('userenv','current_user')
                                                                                    from dual

TST                  BARCLAY\kengel       FGATEST              FOOBAR               select * from foobar