Thursday, March 22, 2012

3 XML Bulk Load questions

1) Is it possible to pass the schema file as a stream, rather than a
pathname to a file on disk, as the first parameter of the Bulk Load Execute
method?
2) I've built a .Net wrapper for the Bulk Load application and, while I'm
considering expanding on its functionality, I'm pretty confident that
Microsoft will eventually expand their own native .Net versions of the
SQLXML suite to include Bulk Load. Any ideas if this is coming soon, or are
we likely looking at a SQL 2005 timeframe?
3) I am encountering the following error: "No data was provided for column
'COLUMN1' on table 'TABLE1', and this column cannot contain NULL values."
According to the documentation, the KeepNulls property, "Specifies what
value to use for a column that is missing a corresponding attribute or
subelement in the XML document. This is a Boolean property. When the
property is set to TRUE, XML Bulk Load assigns a null value to the column.
It does not assign the column's default value, if any, as set on the
server." I have tried setting the KeepNulls property to both True and
False and the same error message is returned.
It seems that if the schema file defines a table element but the data file
doesn't include a value for that element, then Bulk Load attempts to insert
a NULL value for this field. This obviously becomes a problem in tables
which have fields defined as NOT NULL. That said, I believe you can define
a default value (via attributes) for the case where no value is supplied;
however, this design decision seems somewhat flawed to me from a
functionality and performance standpoint.
For example, you might have a table with 50 fields but in some cases you
might only be inserting 2 field values. Therefore, if you create a schema
file with all 50 fields and their corresponding defaults, I imagine Bulk
Load will have to dynamically build and run the following query when only 2
field values are supplied in a data file for a record...
INSERT INTO TABLE1 (Field1,Field2...Field50)
VALUES (Value1,Value2,...Value50)
rather than dynamically building a shorter INSERT statement and running the
following more efficient query...
INSERT INTO TABLE1 (Field1,Field2)
VALUES (Value1,Value2)
Is there a particular reason this design approach was taken or am I missing
a different approach to this particular problem?
Thank you.
P.S. I'm using the latest version from the website - SQLXML 3.0 SP2.
Since there have been no replies to my original post...
1) It looks like there isn't an overloaded method to pass a stream as the
first parameter, so you must pass the path to a file. That said, if
Microsoft is listening, this would be nice for those cases where you are
dynamically building the schema in memory and would like to avoid forcing
a round trip to disk.
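In the meantime, the usual workaround for an in-memory schema is to spill it to a temporary file and hand that path to Execute. A minimal sketch (the helper name is mine, and the commented usage assumes `bulkload` is the SQLXMLBulkLoad COM object):

```python
import os
import tempfile

def schema_to_temp_file(schema_xml: str) -> str:
    """Write an in-memory mapping schema to a temporary .xsd file and
    return its path, suitable for passing as Execute's first parameter."""
    fd, path = tempfile.mkstemp(suffix=".xsd")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(schema_xml)
    return path

# Hypothetical usage against the COM object:
#   schema_path = schema_to_temp_file(schema_xml)
#   try:
#       bulkload.Execute(schema_path, "data.xml")
#   finally:
#       os.remove(schema_path)  # clean up the temp file afterwards
```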
2) I've read somewhere that Microsoft will be supplying a fully .Net enabled
version - probably sometime during the Longhorn/SQL 2005 timeframe.
3) While it would be nice to get some feedback on the design decision I
hinted at in my third question, I was more concerned with the implementation
of Bulk Load given a specific scenario. Here are my findings given three
different approaches, with my original question reflected in the third
scenario...
Example 1: Mapped field with value:
Schema file - contains field DocumentID
Data file - contains data for field DocumentID
Result - Successfully integrates the value supplied in the data file into
the DocumentID field.
Example 2: Unmapped field, no value:
Schema file - doesn't contain field DocumentID
Data file - no data for field DocumentID
Result - If a default value has been set up for this field in the destination
table, then that value is used; otherwise, if no default is defined, a NULL
value is inserted. In addition, you can force a NULL to be inserted for
unmapped columns by setting the KeepNulls property to TRUE. In both
situations where a NULL might be inserted, if the field in the destination
table does not allow NULLs, then an error is returned.
Example 3: Mapped field, no value:
Schema file - contains field DocumentID
Data file - no data for field DocumentID
Result - Integration fails and an error message is returned - "No data was
provided for column 'DocumentID' on table 'Table1', and this column cannot
contain NULL values." (The KeepNulls property has no bearing on the results
in this case, as it only relates to unmapped columns.) You can get around
this by defining a default for the column using attributes; however, in my
opinion this is a flawed solution, as it just increases the size of your
schema file and requires you to keep your destination table and schema file
default values always in sync.
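For reference, the attribute-based default mentioned above looks roughly like this in an annotated mapping schema (a sketch only - table, column, and default values are illustrative, and this is my reading of how the default attribute is picked up when a mapped value is absent):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="Table1" sql:relation="Table1">
    <xsd:complexType>
      <!-- If the data file omits this attribute, the schema-level
           default is what gets inserted - which must be kept in
           sync with the table's own default by hand. -->
      <xsd:attribute name="DocumentID" type="xsd:int"
                     sql:field="DocumentID" default="0"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```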
So why doesn't Bulk Load use the field default instead of trying to insert a
NULL value in scenario 3? I guess the Bulk Load program is creating the
column list for its INSERT statement based on ALL the values from the
schema file, and where there is no corresponding data value, it inserts a
NULL value.
INSERT INTO TABLE1 (Field1,DocumentID...Field50)
VALUES (Value1,NULL,...NULL)
From a performance perspective I can understand the decision not to go out
and retrieve the default field values from the table, but wouldn't it make
more sense to just ignore missing data values that have been mapped and let
SQL Server handle the defaults? And in the situation where the intended
value is NULL, either have the data file contain an explicit NULL value or
add another Boolean property to the Bulk Load interface. I believe the
following makes more sense and would increase performance given the
conditions outlined in scenario 3:
INSERT INTO TABLE1 (Field1,DocumentID)
VALUES (Value1,Value2)
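The shorter statement above could be produced by building the column list from only the fields actually supplied for the record. A sketch of the idea (this is my suggested approach, not what Bulk Load actually does; I've used ? placeholders rather than splicing values into the SQL):

```python
def build_insert(table: str, record: dict) -> str:
    """Build an INSERT naming only the columns actually supplied,
    so SQL Server fills in its own defaults for the omitted ones."""
    # Treat None as "not supplied"; only present fields make the list.
    columns = [col for col, val in record.items() if val is not None]
    col_list = ",".join(columns)
    placeholders = ",".join("?" for _ in columns)
    return f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
```

Given a 50-field table where only Field1 and DocumentID arrive in the data file, this yields a two-column INSERT and leaves the other 48 columns to their server-side defaults.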
Anyway, if anyone has insight into this or upcoming changes in functionality,
I would be extremely grateful, as we are building a fairly large program
around the Bulk Load interface and ideally we would like to avoid having
to rewrite portions of our solution as new versions of SQLXML are released.
Thanks again.
"Cipher" <c@.c.com> wrote in message
news:OaSGJSFTEHA.1732@.TK2MSFTNGP09.phx.gbl...
> 1) Is it possible to pass the schema file as a stream, rather than a
> pathname to a file on disk, as the first parameter of the Bulk Load
Execute
> method?
> 2) I've built a .Net wrapper for the Bulk Load application and while I'm
> considering expanding on it's functionality I'm pretty confident that
> Microsoft will eventually expand their own native .Net versions of the
> SQLXML suite to include Bulk Load. Any ideas if this is coming soon or
are
> we likely looking at a SQL 2005 timeframe.
> 3) I am encountering the following error: "No data was provided for
column
> 'COLUMN1' on table 'TABLE1', and this column cannot contain NULL values."
> According to the documentation, the KeepNulls property, "Specifies what
> value to use for a column that is missing a corresponding attribute or
> subelement in the XML document. This is a Boolean property. When the
> property is set to TRUE, XML Bulk Load assigns a null value to the column.
> It does not assign the column's default value, if any, as set on the
> server." I have tried setting the KeepNulls property to both True and
> False and the same error message is returned.
> It seems that if the schema file defines a table element but the data file
> doesn't include a value for that element, then bulk load attempts to
insert
> a NULL value for this field . This obviously becomes a problem in tables
> which have fields defined as NOT NULL. That said, I believe you can
define
> a default value (via attributes) in the case where no value is supplied,
> however this design decision seems somewhat flawed to me from a
> functionality and performance standpoint.
> For example, you might have a table with 50 fields but in some cases you
> might only be inserting 2 field values. Therefore, if you create a schema
> file with all 50 fields and their corresponding defaults, I imagine Bulk
> Load will have to dynamically build and run the following query when only
2
> fields values are supplied in a data file for a record...
> INSERT INTO TABLE1 (Field1,Field2...Field50)
> VALUES (Value1,Value2,...Value50)
> rather than dynamically building a shorter Insert statement and running
the
> following more efficient query...
> INSERT INTO TABLE1 (Field1,Field2)
> VALUES (Value1,Value2)
> Is there a particular reason this design approach was taken or am I
missing
> a different approach to this particular problem?
> Thank you.
> ps . I'm using the latest version from the website - SQLXML 3.0 SP2.
>
Sorry for the delay in answering.
1) You are correct. It's something we're committed to doing as part of the
work described in #2.
2) We have been working on one that was to be delivered in Whidbey, but that
has since been pushed back. We're now looking at it for the Longhorn
timeframe, with the .Net Framework.
3) If the column is NOT NULL, you need something to put in there. Have you
declared a default value in your database? If so, setting KeepNulls=False
should mean that that value is generated and stored in the table.
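Declaring such a server-side default looks like this in T-SQL (table, constraint, and column names are illustrative, matching the DocumentID example earlier in the thread):

```sql
-- Give the NOT NULL column a server-side default so rows that
-- omit it can still be inserted.
ALTER TABLE Table1
    ADD CONSTRAINT DF_Table1_DocumentID DEFAULT (0) FOR DocumentID;
```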
Irwin Dolobowsky
Program Manager - SqlXml
http://blogs.msdn.com/irwando
This posting is provided "AS IS" with no warranties, and confers no rights.
"Cipher" <c@.c.com> wrote in message
news:%23BRINDzTEHA.3976@.TK2MSFTNGP09.phx.gbl...
> Since there have been no replies to my original post...
> 1) It looks like there isn't an overloaded method to pass a stream as the
> first parameter so you must pass the path to a file. That said, if
> Microsoft is listening, this would be nice for those cases where you are
> dynamically building the schema in memory and would like to prevent having
> to force a physical read from the disk.
> 2) I've read somewhere that Microsoft will be supplying a fully .Net
> enabled
> version - probably sometime during the Longhorn/SQL 2005 timeframe.
> 3) While it would be nice to get some feedback on the design decision I
> hinted at in my third question I was more concerned with the
> implementation
> of Bulk Load given a specific scenario. Here's my findings given three
> different approaches with my original question reflected in the third
> scenario...
>
> Example 1: Mapped field with value:
> Schema file - contains field DocumentID
> Data file - contains data for field DocumentID
> Result - Successfully integrates value supplied in the data file into the
> DocumentID field
>
> Example 2: Unmapped field no value:
> Schema file - doesn't contain field DocumentID
> Data file - no data for field DocumentID
> Result - If a default value has been setup for this field in the
> destination
> table then that value is used, otherwise if no default is defined, a NULL
> value is inserted. In addition, you can force a NULL to be inserted for
> unmapped columns by setting the KeepNulls property to TRUE. In both
> situations where a NULL might be inserted, if no nulls are defined for the
> field in the destination table then an error is returned.
>
> Example 3: Mapped field no value:
> Schema file - contains field DocumentID
> Data file - no data for field DocumentID
> Result - Integration fails and error message is returned - "No data was
> provided for column 'DocumentID' on table 'Table1', and this column cannot
> contain NULL values (the KeepNulls property has no bearing on the results
> in
> this case as it only relates to unmapped columns). You can get around
> this
> by defining a default for the column using attributes, however, in my
> opinion this is a flawed solution as it just increases the size of your
> schema file and requires you to keep your destination table and schema
> file
> default values always in sync.
>
> So why doesn't Bulk Load use the field default instead of trying to insert
> a
> NULL value in scenario 3? I guess the Bulk Load program is creating the
> column_list for its INSERT statement based on ALL the values from the
> schema file. And where there is no corresponding data value, it inserts a
> NULL value.
> INSERT INTO TABLE1 (Field1,DocumentID...Field50)
> VALUES (Value1,NULL,...NULL)
> From a performance perspective I can understand the decision not to go out
> and retrieve the default field values from the table, but wouldn't it make
> more sense to just ignore missing data values that have been mapped and
> let
> SQL Server handle the defaults? And in the situation where the intended
> value is NULL, either have the data file contain an explicit NULL value or
> add another Boolean property to the Bulk Load interface. As I listed
> below, I believe the following make more sense and would increase
> performance given the conditions outlined in scenario 3:
> INSERT INTO TABLE1 (Field1,DocumentID)
> VALUES (Value1,Value2)
> Anyway, if anyone has insight into this or upcoming changes in
> functionality
> I would be extremely grateful as we are building a fairly large program
> around the Bulk Load interface and ideally we would like to prevent having
> to rewrite portions of our solution as new versions of XMLSQL are
> released.
> Thanks again.
>
> "Cipher" <c@.c.com> wrote in message
> news:OaSGJSFTEHA.1732@.TK2MSFTNGP09.phx.gbl...
> Execute
> are
> column
> insert
> define
> 2
> the
> missing
>
