18 Commits

Author SHA1 Message Date
Nick Burch
62f07a8661 Complete initial Tika-ification of the metadata extractor
The remaining extractors have now been converted to Tika, tests have
 been included for the image metadata extraction, and some extension points
 for future extractors have been created.


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@20669 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2010-06-16 16:19:38 +00:00
Nick Burch
0e19812dbc Tika for metadata extraction
Convert some more metadata extractors to using Tika, and enable the use of 
 the Tika auto-detection parser on any documents without an explicitly
 defined extractor.


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@20667 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2010-06-16 14:09:46 +00:00
Nick Burch
63b2f5983a Tika for metadata extraction
First pass of converting a few extractors to use Tika rather than 3rd party libraries directly, or to use the new-style Tika structure.


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@20640 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2010-06-14 19:02:37 +00:00
Paul Holmes-Higgin
cefda8c965 Updated header to LGPL
git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@18931 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2010-03-01 22:48:39 +00:00
Paul Holmes-Higgin
43e93f3c14 Updated header to LGPL
git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@18926 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2010-03-01 22:09:17 +00:00
Nick Burch
bd1e3edf76 Update metadata extractors - Outlook, MP3, Mail and PDF improvements, and increase test coverage
git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@18454 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2010-02-04 14:42:45 +00:00
Derek Hulley
f03f95325a Upgraded OpenDocumentMetadataExtracter to new infrastructure.
Added more OpenDocument test documents.


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@5690 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2007-05-16 10:27:36 +00:00
Derek Hulley
0e51d23b29 Fix AR-487: Extraction of raw metadata is now separate from the mapping to system properties.
Part fix AR-357: The OfficeMetadataExtracter has been ported, but needs a few more properties added to the raw set.


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@5677 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2007-05-15 08:48:07 +00:00
Derek Hulley
0c10d61a48 Merged V2.0 to HEAD
svn merge svn://svn.alfresco.com:3691/alfresco/BRANCHES/V2.0@5141 svn://svn.alfresco.com:3691/alfresco/BRANCHES/V2.0@51352 .
      - FLOSS
      - Some files will need a follow-up
         - root/projects/repository/source/java/org/alfresco/repo/avm/wf/AVMRemoveWFStoreHandler.java (not yet on HEAD: 5094)
         - root/projects/repository/source/java/org/alfresco/filesys/server/state/FileStateLockManager.java (not yet on HEAD: 5093)
         - onContentUpdateRecord (not on HEAD)


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@5167 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2007-02-16 06:44:46 +00:00
Paul Holmes-Higgin
31c250682b Changed licence headers
git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@5081 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2007-02-08 18:59:58 +00:00
Derek Hulley
595556f3c5 Merged V1.3 to HEAD(3161:3179)
svn merge svn://www.alfresco.org:3691/alfresco/BRANCHES/V1.3@3161 svn://www.alfresco.org:3691/alfresco/BRANCHES/V1.3@3179 .


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@3406 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2006-07-26 10:49:21 +00:00
Kevin Roast
5a513ea900 corrected copyright and author
git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@3394 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2006-07-25 09:23:36 +00:00
Kevin Roast
e31e027039 . Outlook email format meta-data extractor
- expects .msg files in native Outlook format
  - uses POI library for the parsing of the horrid OLE2 compound document format
  - extracts addressee(s), sent date and originator email address
  ...for the future - could be modified and used as a transformer to allow full-text indexing of Outlook format emails

. Add new aspect "emailed" to the contentmodel to support properties for above extractor

git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@3387 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2006-07-24 15:05:48 +00:00
Derek Hulley
349183a535 Beefed up unit tests for content metadata extracters
git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@2469 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2006-02-22 13:16:56 +00:00
Derek Hulley
31d9ef768b Inverted configuration of Metadata Extracters
- Adding an extracter no longer requires modification to the MetadataExtracterRegistry
Fixed lack of stream closures

git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@2465 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2006-02-22 11:11:53 +00:00
Derek Hulley
c7afc8286e Fixed closing of input stream
git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@2428 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2006-02-17 12:24:09 +00:00
Kevin Roast
cafe3fb51f . Fixes/improvements for handling of author/creator in the repository and the web-client:
 - added new aspect called "cm:author" with a single text property "cm:author"
 - fixed the content meta-data extractors to set the new cm:author property rather than the system cm:creator property (which was causing a couple of bugs spotted recently)
 - fixed the web-client to set the new cm:author property rather than the cm:creator property from user-entered data in the UI
 - fixed web-client config of document properties screen to display cm:author
 - fixed client to not allow editing of the cm:creator value

git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@2034 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2005-12-13 15:25:56 +00:00
Derek Hulley
e1e6508fec Moving to root below branch label
git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@2005 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2005-12-08 07:13:07 +00:00