Archive for July, 2008

Newt Screencast 2

July 17, 2008
Advertisements

32 Join Limit in SQLite

July 17, 2008

# The Problem

SQLite allows a maximum of 32 joins at any given moment (for good reason), and it is one of the databases we have to support. But when we do a ticket query, the ticket fields need to be joined with the `Field` table, which is then joined with either the `PrimitiveField` or the `ComplexField` table, which is then joined with the implementing class table (e.g. `MilestoneField`), which then *may* be joined with the target table (e.g. `Milestone`).

So for every query constraint (including ANDs and ORs) that the user specifies on the query page, there may be up to 4 joins in worst case. Thus if the user specifies 8 constraints or more, in worst case an `OperationalError` will occur.

# What we’ve tried

## Eager Joins

Do all the joins eagerly. I.e. Take all the fields that will be constrained, sieve them to leave only the unique tables, then do joins. Only after all these joins do we apply filters and ordering. This will do a join on the 3 tables mentioned above (i.e. the ticket field, `Field`, `PrimitiveField`/`ComplexField`) only once, then one join for each target table.

Disadvantages:

* This solution will increase the number of filters that can be applied only multiplicatively. I have reasoned mathematically that this multiplication constant is bound by 2. So the maximum number (in worst case) of filters increases to 16.

## Split Queries

Split a query into sub-queries, one for each field. Each sub-query will then have only 3 joins plus one if it’s a complex field. This will be done for each field, but the sub-query will be executed before we move on to the next field and so the maximum number of concurrent joins will be capped at 4.

Disadvantages:

* We will need to take the results of each sub-query and reconcile them manually. This can be done by taking the intersection of the resulting sets of tickets. This, however, circumvents database query optimizations and it will become very slow.
* We will need to do some extra joins at the end in order to do the ordering and grouping. This will require some work on our side when we would rather have the database do everything for us.

## Change the database schema

We may be able to change the database schema so that the queries don’t need to do as many joins.

Disadvantages:

* The current schema works very, very well. We really don’t want to change it. Nor do we want to change the way in which the tables are stored by Alchemy since that will lead to problems as far as the polymorphism is concerned.

## Change the #define in SQLite since it is open source

We need to explore this. Before we do, we should spend more time thinking whether (since we have more than 32 joins) we’re doing something wrong or not.

NEWT screencast

July 15, 2008

Things to do before NEWT can be merged into trunk

July 14, 2008

Luke is going to test newt today (?) and I have done some very extensive functional testing on Sunday. So there may be bugs that come up based on Luke’s work, but here’s what needs to be done before newt can go live:

  • Jeff needs to add the ability to add/remove fields in the front-end. Also the ability to change the default value of a field and its verbosity setting for queries. (ET: ask Jeff)
  • When viewing a ticket, if the value of any field is too long it will expand the width of the table; the table needs to maintain its size and the text should be wrapped. (ET: 1 hr)
  • On the new ticket page the way the fields are laid out is probably not the best way. Also, the tags do not seem to work. (ET: 1 day)
  • When changing the email notification in project preferences, submitting the form doesn’t actually change the value. (ET: 0.5 days or more)
  • Query filters cannot be loaded based on the URL string. This affects links from places like milestones. (ET: 1 day or more)
  • Sorting by a field in ticket queries removes all the filters. (ET: 0.5 days)
  • SQLite allows a 32 join maximum which translates to a maximum of 8 filters on the query page in worst case. This needs to be enhanced. (ET: no idea).

WordPress is better than LiveJournal

July 13, 2008

The title is an uinderstatement, I wish I knew this before and had the time to discover it.  But anyways, here is my previous blog.  I would like to port over all my postings from there to here for search hit purposes, but sadly I don’t know how to do that efficiently.

Hello world!

July 13, 2008

Welcome to WordPress.com. This is your first post. Edit or delete it and start blogging!

My New Blog

July 13, 2008

https://drpnewt.wordpress.com

Join Limit

July 11, 2008

Regarding the problem where we have a limit of 32 joins, this is the change I recently made to ticket 1516:

************************************************************************************************

 Jeff and I have have been able to reason that any attempt to reduce the
 number of joins must be done in the database side of things; i.e. you
 cannot split the queries, execute the sub-queries then and use set
 theoretic operations to build the resulting set yourself.  [This is because
you need to be able to do orderings and you will need the target table to
be joined, but you don’t know what the target table is so all the joins need
to be present. You could probably do some work to figure this out, but
it’s really hacky and not what good programmers do.]

 We have also reasoned mathematically that, with the current database
 schema, it is not possible to reduce the number of joins by more than a
 factor of two.  So the worst case filter limit would be 16 filters.

 Options:

  * Change the database schema to have fewer tables and joins
  * Locate the join limit and change it

 The second option is clearly retarded.  Moreover, the limit is set by
 SQLite, not Alchemy, so it “can’t” be done.

 We tried changing the schema and discovered that the one we have is quite
 elegant and we run into much greater problems if we use anything
 different.  Besides, changing the schema will most likely only give a
 constant multiplicative benefit; i.e. we can’t do infinitely many joins
 without changing the query mechanism.

************************************************************************************************

In high school I was naive enough to think that programmers didn’t need to know much math. Boy was I wrong. But Jeff says that the problem might not occur in Postgres… but we know there are portals using SQLite.

What to do?

I’ve made this a low priority problem and will move on to test the system starting next week. Luke is also testing, but since I know the ticket system like the back of my hand, I should also test. I will also create a screen cast.

There are some bugs that need to be fixed (e.g. tags, email notification preferences, etc), and some CSS that should be made pretty, but I’m comfortable handing all of this off to anyone else.

Ringing the Death Knell

July 9, 2008

As of today, Wednesday, July 9th, all the large components of the new ticketing system have seen light. The ticketing system is ready to move from core development into pre-QA and QA.

What needs to be done:

  • There is a join cap set by Alchemy of 32 joins. Because of the very hefty new database schema, this can easily be reached with just 8 ticket queries. Therefore, the join mechanism in the TicketQuery needs to be upgraded. This is actually the highest priority item right now.
  • Add the Dojo font-end for creating, deleting, and changing fields. This will be done by Jeff and is probably a small step in the development process despite the fact that it is central to the system
  • Integrate the Dojo upgrade into newt
  • There needs to be extensive functional testing
  • The unit tests should be run to ensure that newt fails exactly when trunk fails and for the same reasons

Bug issues seen so far:

  • The field name and the field preview on the Ticket New and Ticket View pages should be wrapped and scrollable, not overflow the table
  • Tags do not work for new tickets
  • Users cannot change email notification defaults in project preferences
  • The ticket query sub-system needs major testing

Enhancements I foresee in the near future:

  • The kid templates make judicious use of CSS, but there are still a few tables used (namely when assignig field values); perhaps layout can be implemented with CSS, not tables.
  • There is currently no option to view the cross-project ticket query results in other formats (e.g. comma-delimited)
  • The tags box on the new ticket page is awkwardly placed and will probably be changed by someone
  • Many unexpected errors can be raised in the myriad of sub-components and only DrProject errors (and some others) are handled by the request object majestically. Perhaps the req object will be retrofitted to catch any unexpected errors (I anticipate mainly parse errors) and warn the user without crashing the portal
  • A brief code review would be useful
  • We use JSON to communicate between JavaScript and Python. This will likely be changed to something else until JSON is adopted. This is not so hard since the use of JSON is *highly* localized.

In the past half-a-week, I’ve made great strides in making the code much more modular and readable, and I’ve tried to anticipate security concerns as wells future extension. Security really doesn’t seem to be a big issue (or maybe I’m just not well versed enough in the field).

Enhancements I forsee in the distant future:

  • Cross-project fields. Fields that are, not only named the same across projects, but actually are the same in the database. Probably useful for cross-project ticket queries. I have made no effort to support this.
  • Cross-project ticket queries can be filtered by a very minute set of fields since fields cannot be assumed to be the same across multiple projects. This may be changed in the future by playing some database tricks. Changes here are either easy or they’re not; I can’t anticipate what will be done and so I haven’t made any effort to this end.
  • The ticket query sub-system and its associated controller permit somewhat narrow query formulations. The user can select a field, then select an operation (*is*, for example), then she can select any number of possible values. Notce that she cannot query a particular field in more than one mode (e.g. A *is* x _or_ A *is not* y). Moreover, there was never any need for AND in the old system (almost), but in the new system AND can be very useful (e.g. A *starts with* x _and_ A *ends with* y). This is currently not possible without boolean logic “tricks”. Changes of this nature should not be difficult given the new modularity of the query sub-system.
  • A user is currently notified about a ticket creation/change if she is mentioned in any of the fields of type user for that ticket. This may change so that a user can choose which field changes she would be liked to notified about. I’ve have made the system extensible to this end.
  • The fate of the ‘All’ project and the ‘Annon’ & ‘Nobody’ users are not known and the system may need to change to accomodate this. I’ve made the code extensible to this end.

I’ve also made a great effort to separate ‘model’, ‘view’ and ‘controller’. Recently I moved all the business logic out of the controllers and the databse schemas into the ticket and milestone APIs. “All the business logic” has a subjective conotation of course. I’ve removed all the mapper extensions and have moved user input validation into the controllers. I’ve moved GUI stuff into the kid templates and the CSS files. I feel the system is much better segmented now into logical divisions based on purpose.

Update

July 1, 2008

I haven’t posted in a while since I’ve been rushing to meet my July 18 deadline.

All I have to do is complete the ticket query; both in the project-specific mode and the cross-project mode. I have 2.5 weeks, but even that might not be enough.

After this is finished, most everything will be done except that project managers still need an interface to add, remove, rename, reorder, etc ticket fields. But this comes under the jurisdiction of Jeff’s form editor.

A lot of the DrP screens have fallen under my jurisdiction, but I’ve made an effort make a lot of them look the same in the new system as they id in the old. So even though my code touches a huge portion of DrP, the screens that are affected number about 5.

Here’s a look at all the screens that I’ve changed…

Creating a new ticket

View / Edit Existing Ticket

Ticket Notifications

Milestone Statistics



Project Statistics